Feb 22, 2022

Another Overseed/Underseed Model

If you are a consistent reader of my work, you may have an inkling that I like Overseed/Underseed models. I believe many upsets are really just mis-seedings and mis-evaluations by the Selection Committee. If I can devise models that offer alternative methods of evaluating team rankings, or even a method that evaluates the "perception of team rankings," then I can find these upsets before they actually become upsets.

AP-based OS/US Model

This particular model may be construed as both an alternative method of evaluating team rankings and a method of evaluating the perception of team rankings. I take the AP Poll's weekly rankings, calculate each team's average rank across all pre-tourney weeks, seed teams based on that average rank, and use the difference between this seed and the actual seed as an over-seed/under-seed metric.

Methodology

  1. Any team that receives a ranking in the AP poll gets an average ranking in this model.
    1. Not every tournament team receives an AP-ranking during the season. Not every team that receives an AP-ranking makes the tourney.
    2. Some teams start the season ranked and fade out, other teams enter later and stay, and others come and go on a weekly basis. As a result, this usually generates 40+ teams for this model (40 teams covers the 1- through 10-seed lines).
  2. Average Rank = Sum of All Rankings / Number of Weeks with an AP-ranking
    1. Hypothetically, if a team gets ranked #20 in the opening week, loses a game and is removed from the next week's poll, they have one week at rank 20. 
    2. If this hypothetical team never re-enters the AP poll, the value of their Average Rank is 20.
  3. Once the Average Rank for all teams that received at least one AP-ranking is calculated, they are sorted in ascending order and seeds are assigned.
    1. The four teams with the lowest Average Rank values get 1-seeds, the next four get 2-seeds, the next four get 3-seeds, and so on until every team with an Average Rank value has a seed.
    2. No season under analysis in this article has produced more than 48 teams, which is equivalent to the 12-seed line. I am curious as to what that scenario would mean for this model.
  4. Diff = AVGRK - SEED
    1. A negative Diff-value means the team is under-seeded, while a positive Diff-value means the team is over-seeded.
    2. PPB's working assumption has always been: under-seeded teams are likely to pull off upsets while over-seeded teams are likely to fall victim to upsets. (A short code sketch of the full calculation follows this list.)
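
To make the mechanics concrete, here is a minimal sketch of the calculation in Python. The input format and all of the names (weekly_ranks, actual_seeds, build_osus_table) are mine, not part of the AP data, and I'm reading the AVGRK term in the Diff formula as the model-assigned seed from Step 3; if it's the raw average rank instead, the sign of Diff works out the same way, only the magnitude changes.

    # Minimal sketch of the AP-based OS/US methodology described above.
    # weekly_ranks maps each AP-ranked team to the list of ranks it held in
    # the weeks it was ranked (unranked weeks are simply absent).
    # actual_seeds maps tournament teams to their Selection Committee seed.

    def build_osus_table(weekly_ranks, actual_seeds):
        # Step 2: Average Rank = sum of all rankings / number of ranked weeks
        avg_rank = {t: sum(r) / len(r) for t, r in weekly_ranks.items()}

        # Step 3: sort ascending by Average Rank, hand out seeds in groups of four
        ordered = sorted(avg_rank, key=avg_rank.get)
        model_seed = {team: i // 4 + 1 for i, team in enumerate(ordered)}

        # Step 4: Diff = AVGRK seed - actual seed; negative means under-seeded
        table = []
        for team, seed in actual_seeds.items():
            if team in model_seed:  # only AP-ranked tournament teams get a Diff
                table.append({
                    "team": team,
                    "avg_rank": round(avg_rank[team], 2),
                    "avgrk_seed": model_seed[team],
                    "seed": seed,
                    "diff": model_seed[team] - seed,
                })
        return table

    # Hypothetical example: a team ranked #20 for one week has Average Rank 20.
    rows = build_osus_table(
        {"Team A": [20], "Team B": [3, 4, 2, 5], "Team C": [11, 9, 14]},
        {"Team A": 7, "Team B": 1, "Team C": 5},
    )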

Results

I ran two sets of results, which I will explain using the 2021 season as the example.

  1. Diff is color-coded in pairs. A green pair indicates the R64 match-up was correctly predicted by the OSUS model, and a red pair indicates the R64 match-up was incorrectly predicted by the OSUS model. The team with the lower "Diff" value should win because negative-Diff implies under-seeded and positive-Diff implies over-seeded.
  2. W = Actual Wins, E(W) = Expected Wins, Vs(E) = Performance Vs Expected Wins
    1. If the Vs(E) value is negative, it means the team failed to match/exceed the expected number of wins for its actual seed. If the Vs(E) value is positive, it means the team matched/exceeded the expected number of wins for its actual seed.
    2. The Vs(E) column is color-coded to illuminate the success or failure of the model's seed-expectation predictions (Green = Success, Red = Failure). If the "Diff" value is negative, then the Vs(E) column should be 0 or greater; if the "Diff" value is positive, then the Vs(E) column should be negative. The theory is that under-seeds (negative-Diff) should out-perform expectations and over-seeds (positive-Diff) should fall short of them. (A sketch of how both predictors are tallied follows this list.)
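
Here is a hedged sketch of how I would tally both predictors, assuming the table rows from the earlier sketch now carry a vs_e value (actual wins minus expected wins for the actual seed) and that r64_pairs holds the Round-of-64 match-ups where both teams have a Diff value. The tie case (bracket-forced wins) needs bracket context this sketch doesn't model, so it only counts clean successes and failures.

    # Sketch of tallying the two predictors. All names are illustrative.

    def score_r64(r64_pairs):
        # r64_pairs: (row_a, row_b, winning_team) for R64 match-ups where both
        # teams carry a Diff value; lower Diff (more under-seeded) should win.
        wins = losses = 0
        for row_a, row_b, winner in r64_pairs:
            predicted = row_a if row_a["diff"] < row_b["diff"] else row_b
            if predicted["team"] == winner:
                wins += 1
            else:
                losses += 1
        return wins, losses

    def score_vs_e(rows):
        # Negative Diff (under-seeded) should post Vs(E) >= 0;
        # positive Diff (over-seeded) should post Vs(E) < 0.
        success = failure = 0
        for row in rows:
            if row["diff"] == 0:
                continue  # seeded exactly where the model has them: no call either way
            hit = (row["vs_e"] >= 0) if row["diff"] < 0 else (row["vs_e"] < 0)
            if hit:
                success += 1
            else:
                failure += 1
        return success, failure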

Let's Look at 2021



As for the R64 match-up predictor, this model went 3-2 (60% success rate) for 2021. As for the Vs(E) predictor, this model went 15-11-1 (the yellow is the tie, which happens when the bracket forces one team to win when both or neither team should win).

Here's 2019



The R64 Match-up Predictor: 4-2

The Vs(E) Predictor: 11-17-1

Here's 2018



The R64 Match-up Predictor: 0-1

The Vs(E) Predictor: 10-14-1

I would like to point out one thing about the 2018 results. There are 20 teams with positive-Diff values compared to 5 teams with negative-Diff values, suggesting a lot of over-seeded teams. No other year had that imbalance, and it's probably why that year continued to dump on itself round after round after round.

Let's look at 2017



The R64 Match-up Predictor: 4-1

The Vs(E) Predictor: 18-9-2

Finally, we have 2016



The R64 Match-up Predictor: 5-1

The Vs(E) Predictor: 20-7

Practical Application

This is definitely the most difficult section of the article: when and how to properly use this model. As for the R64 match-up predictor, every year except 2018 saw a 60%+ predictive rate, and there is an adequate explanation as to why 2018 flipped the script. As for the Vs(E) predictor, the results were all over the place: 2016 and 2017 were moderately successful years (74.1% and 66.7% success rates), 2018 and 2019 were moderately unsuccessful (41.7% and 39.3%), and 2021 was barely above water at 57.7%.

Methodological Changes -- The 2021 results offer one possible insight: removal of weeks. I don't follow media polls at all. It wasn't until I did this project that I discovered 2021 had 17 weeks of rankings while the other four years had 19. If there was a scientific way to determine which weeks to remove (such as dropping each team's best and worst weekly rank), that could potentially eliminate extremes and outliers. Another possible change is extended rankings. The AP poll usually lists teams receiving votes, just not enough votes to make the Top 25. If these were assigned their appropriate AP-rankings of 26, 27, 28, ..., X, maybe that could pull down some of the AVGRK values while adding more teams to the entire list.
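
As a quick illustration of the week-removal idea, here is a sketch of a trimmed Average Rank that drops each team's single best and single worst weekly rank before averaging. The function name and the trim amount are my own choices, not part of the current model.

    # Trimmed Average Rank: drop each team's best and worst weekly rank before
    # averaging, to blunt extremes and outliers. Falls back to the plain average
    # when a team has too few ranked weeks for the trim to make sense.

    def trimmed_avg_rank(ranks, trim_each_side=1):
        if len(ranks) <= 2 * trim_each_side:
            return sum(ranks) / len(ranks)
        kept = sorted(ranks)[trim_each_side:-trim_each_side]
        return sum(kept) / len(kept)

    # e.g. one bad week (#25) no longer drags the average as far:
    print(trimmed_avg_rank([2, 3, 3, 4, 25]))  # 3.33... instead of 7.4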

Implementation -- I've always wanted a model that "paths the bracket". In simple terms, it picks the source of the result. For example, does the F4 team come from the upper octet [1,4,5,8,9,12,13,16] or the lower octet [2,3,6,7,10,11,14,15]? Likewise, does the S16 team come from the upper pair or the lower pair? For the most part, I've found the OSUS model to be well suited to attempt this. For example, look at the 2016 results: in each of the four regions, one octet seems to be mostly under-seeded and the other mostly over-seeded. By our OSUS theory, the under-seeded octet should be the source of the F4 winner.
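
Here is one hedged way to operationalize that pathing idea: sum the Diff values in each octet of a region and take the octet that is more under-seeded in aggregate as the source of that region's F4 team. The article only eyeballs this from the tables, so treat the summing rule and the names below as my own assumptions.

    # "Pathing the bracket" with OSUS Diff values, one region at a time.
    UPPER_OCTET = {1, 4, 5, 8, 9, 12, 13, 16}
    LOWER_OCTET = {2, 3, 6, 7, 10, 11, 14, 15}

    def pick_f4_octet(region_rows):
        # region_rows: OSUS table rows (seed, diff) for one region; seeds
        # without a Diff value simply contribute nothing to either sum.
        upper = sum(r["diff"] for r in region_rows if r["seed"] in UPPER_OCTET)
        lower = sum(r["diff"] for r in region_rows if r["seed"] in LOWER_OCTET)
        # the more negative (more under-seeded) octet is the predicted F4 source
        return "upper" if upper < lower else "lower"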

I'd probably put this model in Model Purgatory, if such a place existed. It's not perfect enough to go to Heaven, but it's neither entirely useless nor downright bad enough to go to hell. Until I have more time and patience to make predictive-based refinements to it, it will have to be a back-up/tie-breaker model.
