Dec 7, 2016

Investigating the Aggregation Model


You may have seen the term Aggregation Model (AM) used throughout PPB (Examples: Link #1 and Link #2). I was even going to do a full write-up about it in Mar 2016, but the article disappeared from my hard drive and I had to leave you with only the hard data. I have rewritten the whole article, and hopefully I didn't forget anything from the first one. Enjoy!

The Aggregation Model

The AM is a predictive bracket tool that displays the sum of all seeds for each and every round of the NCAA Tournament. Peter Tiernan from Bracket Science used a metric known as the Mad-o-meter® (not sure if it was trademarked, but giving credit to be safe), which is an aggregation of the tournament as a whole. The AM breaks down the M-o-M aggregation to the round level (R32, S16, E8, F4, CG, NC), and its predictive ability works better for some rounds than for others. Up to this point, PPB has focused primarily on the AM for the Elite 8 (E8AM) for this reason, but in this article, I will look at it for all rounds, with a greater emphasis on the rounds for which the AM works better.
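As a minimal sketch of the calculation (the function name `aggregate_value` is my own label for illustration, not something taken from Bracket Science), an AV is simply the sum of the seeds still alive in a given round:

```python
def aggregate_value(seeds):
    """Return the aggregate value (AV): the sum of the seeds in one round."""
    return sum(seeds)


# Final Four seeds from the 2015 tournament, as cited later in this article.
f4_2015 = [1, 1, 1, 7]
print(aggregate_value(f4_2015))  # 10

# The chalkiest possible Elite 8: a 1-seed and a 2-seed from every region.
print(aggregate_value([1, 2] * 4))  # 12
```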


Elite 8 Aggregation Model

We will start with what is most familiar: the E8AM from the modern era tournament (1985-2016) with the E8 Aggregate Value (AV) calculated in the second column. This chart provides some valuable information on the practical side of the E8AM, but before taking on the practical side of the E8AM, let's start with the theoretical.
  1. An E8AV can take on any value from 12 to 124, with 12 being the result of 1,2,1,2,1,2,1,2 and 124 being the result of 15,16,15,16,15,16,15,16. The chart above shows that the highest values on record are 40, 40, 37, 36, 36, and 35. All other tournaments have been 29 or less. If the range for the E8AV runs from 12-124, then we can throw out a large percentage of possible AVs simply on the basis of impracticality.
  2. We can simplify our theoretical model even more with the results of the chart. For instance, the lowest seed to ever reach the Elite 8 is a 12-seed, and it has happened only once. With data from these 32 tournaments (1985-2016) and 4 potential spots for a 12-seed in each Elite 8, one 12-seed in 128 tries equals about 0.78% of the time (not even 1%). If we declare the 12-seed as the upper limit for our theoretical model, then here are the results.
    • The range of values that an AV can take becomes 12 to 92, with 92 being 11,12,11,12,11,12,11,12. Of course, only twice (in 1990 and 2002) have two double-digit seeds made it to the Elite 8. It would be quite a stretch to see all 8 spots filled by double-digit seeds.
    • With only 12 seeds to choose from instead of 16, here is another chart showing the number of possibilities for each AV. Keep in mind, we do not differentiate between regions (no duplicate patterns). As a result, 1,2,1,2,1,2,1,3 and 1,3,1,2,1,2,1,2 do not get counted individually because one is just a reordering of the other.
  3. As we can see from the chart, there are a total of 15,876 different possible combinations for an Elite 8 AV. In this chart, the median AV is 52, and the counts on either side of the median mirror each other all the way out to the extreme high and extreme low. In statistics, this is a symmetric, bell-shaped distribution that resembles a Normal Distribution.
  4. If we follow the guidance from the Elite 8 AM, we can focus exclusively on the combinations that give us 40 or less (left-side of the chart), since the AM does not have an AV above 40 (yet). By ignoring AVs 41 or greater, we eliminate 13,237 possible combinations, leaving us with 2,639 possible combinations.
  5. This may be the most important aspect of the AM and its relationship to the distribution chart of AVs: More possible combinations for a specific AV does not imply a higher probability of occurrence.
    • For example, an AV of 52 has the highest number of possible combinations (536), but not once has it happened. On the other hand, AVs of 13 and 14 have only one possible combination each, and each has happened exactly once.
    • As another example, AVs from 30 to 40 account for 2,224 of the remaining 2,639 possible combinations (84.27%), yet AVs in this range have only occurred six times in reality. On the flip side, AVs from 12 to 19 account for 26 of the 2,639 possible combinations (0.99%), yet AVs in this range have also occurred six times. Six occurrences in 32 tournaments calculates to 18.75% of actual results. Both ranges have occurred in reality the same percentage of the time, but the 30-40 range has 84.27% of the theoretical combinations while the 12-19 range has only 0.99% of them.
  6. The table below shows the AVs from 12 to 40, with "Qty" being the number of different possible combinations that produce the corresponding AV and "Freq" being the number of times a tournament actually produced the corresponding AV. If more possible combinations implied a higher probability of occurrence, every combination would have to be equally likely, and Freq would rise along with Qty. The table shows that this pattern does not exist with the AM.
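For anyone who wants to rebuild the Qty counts, here is a rough sketch. It assumes the count treats the four top-half seeds and the four bottom-half seeds of the Elite 8 as two unordered groups, which is the reading that reproduces the 15,876 total; the names `TOP_HALF`, `BOTTOM_HALF`, `sum_counts` and `e8_av_counts` are mine and purely illustrative.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Seeds 1-12 that can reach the E8 from each half of a region.
TOP_HALF = (1, 4, 5, 8, 9, 12)      # the half of the region holding the 1-seed
BOTTOM_HALF = (2, 3, 6, 7, 10, 11)  # the half of the region holding the 2-seed


def sum_counts(seeds, picks=4):
    """Count unordered picks (one per region, regions not distinguished) by sum."""
    return Counter(sum(c) for c in combinations_with_replacement(seeds, picks))


def e8_av_counts():
    """Number of seed combinations producing each E8 AV under the 12-seed cap."""
    top, bottom = sum_counts(TOP_HALF), sum_counts(BOTTOM_HALF)
    counts = Counter()
    for t_sum, t_n in top.items():
        for b_sum, b_n in bottom.items():
            counts[t_sum + b_sum] += t_n * b_n
    return counts


counts = e8_av_counts()
total = sum(counts.values())
at_most_40 = sum(n for av, n in counts.items() if av <= 40)
low_range = sum(n for av, n in counts.items() if 12 <= av <= 19)

print("total combinations:", total)
print("combinations with AV <= 40:", at_most_40)
print("combinations with AV 12-19:", low_range)
print("share of the AV <= 40 pool: {:.2%}".format(low_range / at_most_40))
print("AV with the most combinations:", counts.most_common(1)[0])
```

Comparing these counts with the Freq column is what shows that Qty is a poor guide to how often an AV actually occurs.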


Sweet 16 Aggregation Model

Most of the time, when I reference the AM, I mean the E8AM because it is the most "effective" one to use. Yet at the beginning of the article, I stated that the AM could be calculated for any round of the tournament. Yes, it can be calculated for any round, but for some rounds it would be pointless:
  • The Champion round (NC) -- which simply counts the National Champion's seed.
  • The Championship Game round (CG) -- which counts the seeds of the two teams vying for the title.
The Sweet 16 AM (S16AM) is the 2nd most "effective" AM to use. The one attribute that makes it worth talking about is that it tracks overall tournament predictability more closely than any other round's AM -- the E8AM, the Final 4 AM (F4AM), or the Round of 32 AM (R32AM). In the chart to the left, I have calculated the AM for each of the four rounds (F4, E8, S16, and R32) in each of the modern-era tournaments. In the fifth column, labeled M-o-M®, I have calculated Peter Tiernan's Mad-o-meter® for each of those tournaments. Since the M-o-M® is the aggregation of each round's aggregation model, it should serve as a reliable benchmark against which to compare each of the individual rounds. To do this, I ran a simple correlation of each round's AV to its corresponding year's M-o-M® score. The results are in the table to the right. By a mere 1.6346 percentage points, the S16AM correlation to the M-o-M® outperforms the E8AM correlation to the M-o-M®.

The predictability of the S16AM doesn't stop there. When looking at inter-round predictability, the S16AM does a better job at predicting neighboring rounds than any other. In the chart (below and right), I have calculated the ratio of the AM in a specific round to the AM in a different round. For example, the column 8:4 is the E8AM divided by the F4AM. Listed directly above each column heading is the standard deviation of all values in that column. The inter-round ratios involving the S16AM (32:16 and 16:8) have the lowest standard deviations of all -- 0.4995 and 0.5051, respectively (to no surprise, the next-closest are the inter-round ratios involving the E8AM -- 0.5051 and 0.9589). With very little deviation in the ratios from R32 to S16 and from S16 to E8, bracket predictability should be at its highest when the S16 is accurately predicted. In the simplest of terms, if you can accurately predict the S16 based on the AM, then:
  • You have already correctly predicted 16 of the 32 games in the Round of 64 plus all 16 games in the R32, which adds up to 32 correct picks out of 63 total picks, and
  • Your odds of predicting a correct E8 are at the maximum because all 8 of the eventual E8 teams are still alive in your S16.
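The two comparisons described above (each round's AV correlated against the M-o-M®, and the spread of the inter-round ratios) can be sketched as follows. This is only an illustration: the helper names are mine, I am assuming a Pearson correlation and a sample standard deviation (the charts may have used a different variant), and the numbers in the demo are placeholders rather than real tournament values.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ratio_stdev(round_a, round_b):
    """Sample standard deviation of the year-by-year ratio round_a / round_b."""
    ratios = [a / b for a, b in zip(round_a, round_b)]
    return statistics.stdev(ratios)


# Placeholder values for illustration only; these are NOT real tournament data.
s16_av = [60.0, 72.0, 66.0, 80.0]
e8_av = [20.0, 26.0, 21.0, 30.0]
mom = [300.0, 365.0, 325.0, 410.0]

print("S16AM vs M-o-M correlation:", round(pearson(s16_av, mom), 4))
print("E8AM vs M-o-M correlation:", round(pearson(e8_av, mom), 4))
print("std dev of the 16:8 ratio:", round(ratio_stdev(s16_av, e8_av), 4))
```

With the real columns from the charts in place of the placeholders, a higher correlation and a lower ratio deviation are what would mark a round as the more predictable one.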
As rosy as the picture just painted seems, it is about to get a little messy. Yes, the S16AM gives you the best chance at total bracket predictability. However, the path to this end is a very wide one. In the E8AM theoretical analysis, we showed that there were 15,876 possible seed-combinations when looking at the Elite 8 (this excludes 13 thru 16 seeds), and after balancing theory with history, we reduced the total to 2,639 possible E8 seed-pairs. This can't be done with the S16AM. Let's walk through the numbers.
  1. Within one region, an S16 combination has four seeds, one each chosen from four mutually-exclusive and collectively-exhaustive pods. These pods are {1,8,9,16} {4,5,12,13} {3,6,11,14} and {2,7,10,15}.
  2. Picking one seed from each pod (so no seed repeats within a region) results in 4 x 4 x 4 x 4 = 256 different combinations. The lowest AV comes from 1,2,3,4 and the highest AV comes from 13,14,15,16. This would be the result for one region.
  3. The result of four regions, which gives us an S16, would be 256*256*256*256, which equals 4,294,967,296. Yes, that is large and that is correct.
  4. However, this process results in duplicated combinations. For example, I would count the following two combinations as the same:
    • 1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,5
    • 1,2,3,4,1,2,3,4,1,2,3,5,1,2,3,4
    • After removing all instances of these types of duplicates, we are left with 183,181,376 combinations for S16 seed-pairs.
  5. If we apply tournament history to these combinations, we can reduce them further.
    • No tournament has ever had more than 5 double-digit seeds in the S16. If we factor out all combinations that feature 6 or more double-digit seeds, we are left with a total of 40,621,730 combinations of S16 seed-pairs.
    • Every tournament has had at least one 1-seed and at least one 2-seed. If we factor out all combinations that do not have a 1-seed or a 2-seed, we are left with a total of 24,564,689 combinations of S16 seed-pairs.
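The first four steps above can be checked with a few lines of code. This is a sketch under the assumption that "removing duplicates" means treating the four regions as an unordered group of regional combinations (repeats allowed), which is the reading that reproduces the 183,181,376 figure; the variable names are my own.

```python
import math
from itertools import product

# The four pods that each feed one S16 slot in a region.
PODS = [(1, 8, 9, 16), (4, 5, 12, 13), (3, 6, 11, 14), (2, 7, 10, 15)]

regional = list(product(*PODS))            # one seed picked from each pod
per_region = len(regional)
ordered = per_region ** 4                  # regions treated as distinct
unordered = math.comb(per_region + 3, 4)   # regions not distinguished

print(per_region)   # 256
print(ordered)      # 4294967296
print(unordered)    # 183181376

# Lowest and highest AV for a single region: 1,2,3,4 and 13,14,15,16.
regional_sums = [sum(combo) for combo in regional]
print(min(regional_sums), max(regional_sums))  # 10 58
```

The historical filters in step 5 are not reproduced here; they would need an extra pass that counts double-digit seeds, 1-seeds and 2-seeds per combination.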
I could continue this process further and further, but I like to avoid such Sisyphean tasks, especially during the stringent week of Bracket Crunch Time. As you can see, this is why I rarely use any AM other than the E8AM. Even after two practical reductions, we have over 24 million possibilities. Once you get to this point, it would be easier to attempt a prediction of the S16AV and then generate all possible combinations that add up to that specific AV.

One thing to keep in mind: the S16AM follows the same symmetric, bell-shaped distribution pattern that the E8AM followed. For example, the median AV of the E8AM was 52 and it had 536 possible combinations (3.38% of all). The median AV of the S16AM is 136, and if it has 3.38% of the 183 million figure cited above, it has a little more than 6 million possible combinations (remember: this includes both practical and impractical combinations). Even if we looked at the practical range of S16AVs (49 to 89), each AV in that range would generate somewhere between 150,000 combinations at the lower-end AVs and 1.3 million combinations at the higher-end AVs of the range. I think I speak for everyone when I say there are too many options to consider, especially when we are trying to find the exact one that gives us our coveted perfect bracket.
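If you do go the route of predicting an S16AV first, a count of the combinations behind each AV is the natural next step. Below is a rough sketch that counts them with a small dynamic program instead of listing them, using the same convention that regions are not distinguished. The function name and the approach are mine, and because it produces exact counts, its output need not match the proportional estimates given above.

```python
from itertools import product

PODS = [(1, 8, 9, 16), (4, 5, 12, 13), (3, 6, 11, 14), (2, 7, 10, 15)]
REGION_SUMS = [sum(combo) for combo in product(*PODS)]  # 256 regional S16 sums
MAX_AV = 4 * max(REGION_SUMS)


def s16_av_counts():
    """Count unordered four-region S16 combinations for every possible AV."""
    # ways[k][s] = number of multisets of k regional combinations summing to s
    ways = [[0] * (MAX_AV + 1) for _ in range(5)]
    ways[0][0] = 1
    for r in REGION_SUMS:
        # Ascending k lets one regional combination appear in several regions.
        for k in range(1, 5):
            for s in range(r, MAX_AV + 1):
                ways[k][s] += ways[k - 1][s - r]
    return ways[4]


s16_counts = s16_av_counts()
print("all S16 combinations:", sum(s16_counts))
print("combinations at the median AV of 136:", s16_counts[136])
for av in (49, 69, 89):
    print("AV", av, "->", s16_counts[av], "combinations")
```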

Quick run-down of the R32AM and the F4AM
  1. The R32AM
    • Has a minimum AV of 144 and a maximum AV of 400. Realistically, the lowest was 155 in 2000 and the highest was 215 in 2016 (and I blame that on the Selection Committee).
    • Has far more combinations than the S16AM, which makes it less effective.
    • Better tools exist for predicting this round of games, such as the seed-guide, Vegas betting odds, the S16AM, and the 538 Blog's Interactive Bracket.
  2. The F4AM
    • Has a minimum AV of 4 and a maximum AV of 64. Realistically, it has been as high as 26 in 2011 and the minimum was achieved in 2008.
    • Has a slightly lower predictability rate (61.94%) than the S16 or E8 AMs.
    • While tournament quality can point you in the direction of a potentially high AV (13 or higher) for the F4, it cannot predict (within a narrow window) what the exact AV will be. For example, will it be 15-18 like 2013, 2014 and 2016, or will it be 20+ like 2000, 2006 and 2011?
    • Not very reliable when a specific AV is determined. For example, if tournament team quality suggests that the AV is 10 (which is hard to do given the previous point), this could produce a strong+surprise Final Four (1,1,1,7 like 2015) or it could produce a balanced Final Four (2,2,3,3 hypothetical). Even with the strong+surprise result, you can have 1,1,2,6 like 1987 and 1988. There's no way to be certain until you see all of the combinations and compare them to the field.
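The bounds quoted in this run-down, and the "see all of the combinations" step for a given F4AV, can be sketched like this. The names `FIRST_ROUND` and `f4_combinations` are mine, and the listing ignores which region each seed comes from.

```python
from itertools import combinations_with_replacement

# First-round pairings within one region; the winner of each reaches the R32.
FIRST_ROUND = [(1, 16), (8, 9), (5, 12), (4, 13), (6, 11), (3, 14), (7, 10), (2, 15)]

r32_min = 4 * sum(min(pair) for pair in FIRST_ROUND)  # every favorite wins
r32_max = 4 * sum(max(pair) for pair in FIRST_ROUND)  # every underdog wins
f4_min, f4_max = 4 * 1, 4 * 16

print(r32_min, r32_max)  # 144 400
print(f4_min, f4_max)    # 4 64


def f4_combinations(target_av):
    """All unordered sets of four seeds (1-16) whose sum equals target_av."""
    return [
        combo
        for combo in combinations_with_replacement(range(1, 17), 4)
        if sum(combo) == target_av
    ]


# Every Final Four seed line that produces an AV of 10.
for combo in f4_combinations(10):
    print(combo)
```

The list for an AV of 10 includes 1,1,1,7 and 1,1,2,6 alongside the balanced 2,2,3,3, which is exactly why a predicted AV alone does not pin down the Final Four.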

Postulate of the Aggregation Model

Essentially, the question we are asking is why does a particular AV occur in a given year? Although there may be many factors at work, I believe the two most important factors, in order, are 1) the quality of teams in the tournament and 2) the ability of the Selection Committee to properly seed these teams (or discern team quality).

Starting with the second factor first, I'm not sure if there is any sure-fire statistical measure of the Committee's ability to discern team quality. For instance, I used KenPom Efficiency Ratings to construct a Seed Curve (Link), which showed under-seeded and over-seeded teams according to their season-long efficiency. In essence, I was using a measure of team quality to grade the Committee's seeding ability. Even less analytical methods could give insight into the mind of the committee.
  • How has the committee's top overall 1-seed performed? (I believe the top overall seed was first recognized by the committee in the 2003 tournament, but I can only confirm as far back as 2005). Since 2005, three have failed to reach the Elite 8 (2006 Duke, 2010 Kansas, and 2011 Ohio State). Keep in mind, this can also be due to under-seeded 8s and 9s that proved to be formidable match-ups against the 1-seed. Improper seeding works both ways.
  • Are there any patterns to seeding? I stated in the previous article that it appeared as if the committee used conference strength (maybe conference RPI) to seed teams in the 2016 tournament. It seemed as if the Big 12 and the Pac-12 received favorable seeding (higher than deserved) and it could have been an influential factor in why many of these teams received early exits from the tournament. (My thoughts/criticisms on the ability of the committee to properly seed the field are detailed in full in that article, if you are interested.)
As for team quality, there exists a plethora of information, both quantitative and qualitative, in the sporting world. From this blog,
  • The Quality Curve has been a reliable measure of team quality in the tournament. It has been especially reliable when comparing across different years, as shown in Part 2 of What A Perfect Bracket Looks Like (Link #2 at top of page) and as I did for the 2016 tournament in this article.
  • In my three-part series called "The Time Line" (Part 1, Part 2, and Part 3), I showed how team quality slowly deteriorated over the modern era as more and more talented basketball players left earlier and earlier for the NBA.
The patterns to a tournament's team quality and the resulting AV produced by that tournament are pretty evident. Through the discernment of both team quality and the selection committee's performance, a projection of a given tournament's AM can reduce tens of thousands of bracket combinations to about 100 or fewer. This is the end-goal of the AM.

I hope you have enjoyed this foray into one of my favorite bracket tools, and as always, thanks for reading my work.
