Mar 6, 2023

Pre-BCW Teaser

After a few attempts at methodological improvements and a few rounds of deliberation, I've decided that a full review of and report on the SGLT will be near-impossible to complete in the two weeks before Bracket Crunch Week. Instead, I'll do a quick-hitter article that ties up a few loose ends while giving me the time and flexibility to get ahead of BCW. I'll start with a status check of the SGLT, followed by some teaser work for BCW.

Seed-Group Loss Table

This has been a pet project of mine for quite some time. I have kept the data for it on bracket worksheets since 2006, but I didn't put it together into a predictive model until 2019 (Part 1 and Part 2). Then 2020 happened, and 2021 wasn't any better for a loss-based model given how many games were cancelled that season. I spent that time converting it from pen and paper to a digital spreadsheet, but yours truly introduced a few typos in the process and also misapplied the tool. Nonetheless, it did well at identifying seed counts; win totals for seeds were a different story. So, where does it stand today?

First, I can say that all data up to the 2022 tournament is correct and in spreadsheet format, so everything from this point forward is methodological testing. Second, the model for 2023 is best used as a tie-breaker. In simple terms, if a prediction is uncertain or two reliable models predict opposing outcomes, using it is better than nothing or a coin flip. It is also better suited to predicting F4 and E8 targets than R32 and S16 targets; in 2022, I erroneously applied it to the latter and the results were disastrous. Finally, the matching method is currently the preferred method, as I've been unable to do any testing with the regression method. It requires a little bit of logic (and validity testing as well), but its results are the only ones I would even remotely consider reliable. All in all, if I post the results of the SGLT to the final article, it will be a Wed night or Thurs morning submission (plus I'd probably be waiting on the results of the play-in games again).
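For what it's worth, the tie-breaking role is easy to express in code. The sketch below is just the decision rule as I described it, nothing more; the function name, team labels, and the two "primary" models are hypothetical, not the SGLT's actual internals.

```python
# Minimal sketch of the tie-breaking role described above. The names and
# the matchup are hypothetical; this is the decision rule, not the SGLT.

def pick_advancer(model_a: str, model_b: str, sglt_pick: str) -> str:
    """Return the predicted advancing team for one matchup.

    model_a and model_b are picks from two reliable primary models;
    sglt_pick is consulted only when they split.
    """
    if model_a == model_b:
        return model_a       # the primaries agree; no tie-break needed
    return sglt_pick         # a split: better than nothing or a coin flip

# Example: two primary models disagree on a hypothetical E8 matchup.
print(pick_advancer("HOU", "UCLA", sglt_pick="HOU"))  # -> HOU
```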

BCW Teasers

First, let's start with an update to the Experienced Talent Model. At the beginning of every season, we use an experience estimator for the current season (every player gets +1 added to their previous season's score). This produces a value akin to a maximum ET score. As the season progresses, experience scores are reevaluated to reflect the true experience gained from the current season, a revision that almost guarantees lower ET scores across the board. Let's look at the changes.
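As a rough illustration of that two-stage scoring (the +1 preseason rule comes straight from the description above; the fractional revision rule and the player data are my own simplified assumptions):

```python
# Sketch of the two-stage ET scoring described above. The +1 preseason
# rule is from the text; the fractional revision and the sample data are
# simplified assumptions for illustration only.

def preseason_estimate(prev_scores: dict[str, float]) -> dict[str, float]:
    """Preseason ceiling: each player gets last season's score plus 1."""
    return {player: score + 1.0 for player, score in prev_scores.items()}

def midseason_revision(prev_scores: dict[str, float],
                       frac_played: dict[str, float]) -> dict[str, float]:
    """Revision: credit only the fraction of the season actually played,
    so injuries and lost minutes pull scores below the preseason ceiling."""
    return {p: s + frac_played.get(p, 0.0) for p, s in prev_scores.items()}

roster = {"Returner": 2.0, "Freshman": 0.0}   # hypothetical prior scores
ceiling = preseason_estimate(roster)          # {'Returner': 3.0, 'Freshman': 1.0}
revised = midseason_revision(roster, {"Returner": 1.0, "Freshman": 0.4})
print(ceiling, revised)  # the Freshman's lost time lowers the team's ET total
```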

First, there are a lot of swings in the rankings. For the teams that moved up, their ETM score probably didn't change; the movement had more to do with the ETM scores of neighboring teams falling. Falls can be the result of many things: injuries, lack of playing time, not playing at all, or -- in one case -- a player being assigned to the wrong team (Detroit is a good example). The only injury unaccounted for by the model is UCLA's Jaylen Clark's, one because it is new (it happened after I finished the revised scoring on Wednesday) and two because it won't send UCLA too far down the list. Second, my pre-season predictions weren't up to par, but then again, this season hasn't been up to par either (spoiler). In my analysis, I did say CONN, MIST and ORE are more likely to make it, and the rest of the question marks are likely to miss; those calls are looking highly likely to hit. All of my missed predictions are teams that fit the other profile: they should make it but now look likely to miss (UNC, NOVA, DAME, and FLST).

The real question: Why does this model matter for March? Let's take a look.

The ETM Top 25 has been a great tool for identifying sleepers.

  • 2016: 10-seed SYR ranked as the 4th overall ET team and ran to the F4.
  • 2017: 4-seed FLA to the E8 and 7-seed MICH to the S16, which is good for a low-upset year.
  • 2018: 7-seed TXAM to the S16.
  • 2019: 5-seed AUB to the F4 and 12-seed ORE to the S16, both good for a low-upset year.
  • 2021: 11-seed UCLA to the F4 as the 9th-ranked ET team, and 6-seed USC to the E8.
  • 2022: 8-seed UNC to the NR game, 10-seed MIA to the E8, and 11-seed MICH to the S16.

With the actual seeds for 2023 not yet revealed, I'll have to use the bracketology projections. This means our sleeper pool contains MIST, UK, DUKE, and USC (the first three as the most likely candidates to make at least an E8 run). ORE, AZST, TXTC and NCST could also join the sleeper pool if they get invited to the tourney.

That's all I'm going to spoil for the ETM, so let's move on to one more tool in our toolbox. The conference-based over-seed/under-seed (OS/US) model is in for a doozy this year, which is why I'm getting an early start on it. The B10 is the biggest problem for this model: 2nd place in the B10 has a 12-8 record while 12th place has a 9-11 record. With 11 teams separated by only 3 games, their seeds have to be within 1-2 seed lines of each other or else the model will predict a ton of OS/US possibilities. The ACC and the SEC aren't much better. It could be one of those years like 2016 and 2018 where the model produces too many contingencies and its results become self-contradictory. I call this situation an overload (OS/US models can do this), and it makes the model less accurate in those years.
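To make that logjam concrete, here's a toy version of how I picture the flagging (the one-seed-line tolerance and every number below are made up for illustration, not the model's actual parameters):

```python
# Toy illustration of OS/US flagging and why a logjam overloads it.
# The one-seed-line tolerance and all numbers here are assumptions.

def flag_os_us(standings: list[tuple[str, int]], tolerance: int = 1) -> list[str]:
    """standings: (team, projected seed) pairs, best-to-worst conference
    finish. Flag any pair where the higher finisher is seeded more than
    `tolerance` lines worse than a team it finished ahead of."""
    flags = []
    for i, (team_hi, seed_hi) in enumerate(standings):
        for team_lo, seed_lo in standings[i + 1:]:
            if seed_hi - seed_lo > tolerance:
                flags.append(f"{team_hi} under-seeded relative to {team_lo}")
    return flags

# Hypothetical logjam: four teams separated by ~3 conference games but
# spread across several seed lines.
print(flag_os_us([("A", 8), ("B", 5), ("C", 4), ("D", 6)]))
# -> three flags from just four teams; eleven bunched teams produce far more
```

With eleven B10 teams packed into a three-game band, those pairwise contingencies multiply fast, and that's the self-contradiction I'm bracing for.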

Before I get any wordier for a quick-hitter article, this should hold us over for the next six days. I'm honestly thinking this year might be a nightmare. Until Selection Sunday, thanks for reading my work and I'll see you then.
