Dec 18, 2017

Upsets in the Making (Part 2) - Historical Analysis of Upsets

It's time to upset my readers even more with the next article in this 3-part series on Upsets. In Part 1, a theoretical framework was built for analyzing upsets based entirely on probability. It was constructed using all possible combinations of upset-potential match-ups (UPMs) for each of the six rounds of the bracket. It also examined the mathematical underpinnings of seed differentials for each of the six rounds. While Part 1 covered the vast range of what could happen in the tournament (the theoretical perspective), this article will cover what has actually happened in the tournament: The Historical Perspective. This article will examine the historical perspective through two viewpoints: the big picture (the overall counts on upsets and UPMs) and the individual games (a breakdown of the seed-by-seed match-ups). Furthermore, this article will be more descriptive than explanatory -- recording and summarizing the results rather than providing explanations and insights for the results. Let's get to it.


The History of Upsets: The Big Picture

As in the previous article, a picture is worth a thousand words, so let's start with the table to the right. This is an annual count of upsets-by-round for each of the 33 NCAA tournaments since the 1985 expansion to 64 teams. This table may be very familiar to my readers, as I have used it in other articles. Interestingly enough, it shows a few patterns (more like givens) that should catch anyone's eye.
  • First, the total number of upsets per tournament ranges from 5 to 13, with two outliers in 3 (2007) and 15 (2014), both of which happen to be our models of tournament sanity and tournament chaos, respectively. 
  • Second, the R64 is the only round to have had at least one upset for each and every year of the tournament. All other rounds have had at least one year that produced zero upsets in that particular round. 
  • Third, no year in the tournament has had at least one upset in each and every round. Three years in specific -- 1985, 1988, and 2014 -- have had five out of six rounds with an upset, and those three years saw an 8-seed, 6-seed, and 7-seed win the title for those years, respectively. 
Let's dig a little deeper and point out some not-so-obvious patterns. 
  • First, only four times in tournament history have there been no upsets in the R32 games, and coincidentally, those four tourneys had total upsets of 8, 7, 5, and 5, respectively. This should be obvious from the probability work done in the first article, as the natural mathematical structure of the bracket produces fewer UPMs from higher-seed advancement. 
  • Second, only five times in tournament history have there been more upsets in the R32 than in the R64. This stands in contrast to the natural mathematical structure of the bracket, since earlier rounds will naturally have more UPMs than later rounds because earlier rounds have a higher total number of games being played. Strangely enough, these inverted totals seem to occur in proximity to one another rather than just randomly in time, as two occurred in 1986 and 1990 and the other three occurred in 1999, 2000, and 2004. 
  • Finally, the final three columns (upsets in the E8, F4, and NC) seem to have a block-group pattern to them. From 1985-1992, an upset in at least one of these rounds was probable. From 1993-2005, an upset in at least one of these rounds was non-existent. Then from 2006-2017, an upset in at least one of these rounds was probable again. I could have done the groupings slightly differently (such as 1985-1988, 1989-2010, and 2011-2017) and it would still present the same idea, but for the sake of argument (and consistency with other articles), the former arrangement will be our focus.
It is one thing to pick the right upsets when predicting a bracket. It's another thing to know how many games in a given round could qualify as an upset, whether or not the upset actually happens. In Part 1, we established the concept of UPMs, so let's study it in the same fashion as we did for the upsets. The table to the left is an annual count of the round-by-round UPMs. Before we identify the patterns, I first want to refresh the principles we established in Part 1. First, the R64 is not listed because there are always 32 games in the round and always 24 UPMs in that round (the games involving 1-6 seeds). Second, each round has a maximum limit on the number of UPMs that can occur, which is equal to the number of games being played in that round (R32 = 16 games, S16 = 8 games, E8 = 4 games, F4 = 2 games, and NC = 1 game). Simply put, the number of UPMs can never exceed the number of games being played in a given round. Third, Part 1 showed that no matter which combination of teams wins in the 1-seed and 2-seed pods (a pod is a group of four), the result will always be a UPM in the R32 for the winning teams. Therefore, the R32 has to have at least 8 UPMs (4 from the four 1-seed pods and 4 from the four 2-seed pods). All other rounds have a minimum limit of 0.
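The limits in the refresher above can be written down as a small lookup. This is only a minimal sketch of the stated rules; the names `GAMES_PER_ROUND` and `upm_bounds` are made up for illustration and do not come from Part 1.

```python
# Games played in each round of the 64-team bracket (figures quoted in the text).
GAMES_PER_ROUND = {"R64": 32, "R32": 16, "S16": 8, "E8": 4, "F4": 2, "NC": 1}


def upm_bounds(round_name):
    """Return the (minimum, maximum) number of UPMs possible in a round."""
    games = GAMES_PER_ROUND[round_name]
    if round_name == "R64":
        # Always exactly 24 UPMs: the games involving 1-6 seeds.
        return (24, 24)
    if round_name == "R32":
        # The four 1-seed pods and four 2-seed pods each guarantee one UPM.
        return (8, games)
    # Every later round can have anywhere from zero UPMs up to all of its games.
    return (0, games)


for name in GAMES_PER_ROUND:
    low, high = upm_bounds(name)
    print(f"{name}: between {low} and {high} UPMs")
```

Any yearly UPM count in the table should fall inside these bounds, which makes the function a handy check when entering new seasons.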

With the Part 1 refresher out of the way, let's move on to some pattern identification. To point out the obvious first, some rounds have never had a specific outcome in any year. The R32 has never seen either of its extremes: eight UPMs (all eight from the 1-seed and 2-seed pods) or sixteen UPMs (all 16 games being UPMs). The S16 round has never seen a year with exactly 0 or 1 UPMs. The E8 round has never seen a year with exactly 3 UPMs (out of four total games in that round). The other obvious thing in this chart has to do with the distribution of counts. The S16 has a pattern almost resembling a normal distribution with a mean of 5. However, the R32 and E8 look more like positively skewed distributions. Based entirely on the theoretical perspective in Part 1, I would have assumed all three rounds would have followed the same distribution, either normally distributed about a common mean or skewed in one direction. The fact that the R32 is positively skewed, then the S16 is normal, and then the E8 goes back to positively skewed is puzzling.

As it was done with the upset table, let's now look at some other patterns (less reliable for predictive purposes) in the UPM table. First, let's look at the modes (most common counts) for R32 and S16.
  • 12 is the most common count for UPMs in the R32 with ten occurrences over 33 years, and three occurrences in the last five years. The next closest two counts are 11 and 9, both of which have occurred six times each in 33 years, but neither has occurred in the last five years. 
  • 5 is the most common count for UPMs in the S16 with fourteen occurrences over 33 years, but only one in the last eight years. The next most common counts are 4 with six occurrences and 6 with four occurrences over 33 years. Oddly enough, 4 UPMs in the S16 has happened four times in the last seven years, yet the last time 6 UPMs in the S16 happened was 2000. In that time, the years have seen 4, 5, and 7 UPMs, but not a single 6 count.
Another pattern occurs in the F4 and NC rounds, and it looks like a bunching effect. UPMs occur in these rounds as a bunch or group over a given period of time: Either as an "on-pattern" (consecutive years of at least 1 UPM), a "completely off-pattern" (consecutive years of 0 UPMs), or an "on/off-pattern" (an alternating pattern of on and off).
  • In the F4 round, there were four straight years of on (at least 1 UPM), then eleven years of off (zero UPMs), then seven years of on/off, then six years of off, and currently five straight years of on. 
  • The NC round behaves the same way: eight years of on/off, then seven years of off, three years of on/off, seven years of off, two years of on, then finally six years of off. 
  • You can see why I consider these bunching patterns to be unreliable when it comes to predictive purposes: There is no way to tell when it switches between the bunches, and when it is in the on/off pattern, there is no way to tell if it is on or off for a specific year. Nonetheless, this pattern is interesting, especially considering the most current five year pattern is "on" for the F4 round but "off" for the NC round.
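As a quick sanity check on the bunching patterns above, the phase lengths quoted for each round should add up to the 33 tournaments in the sample. The sketch below only re-adds the numbers from the bullets; the variable and function names are hypothetical.

```python
# (pattern, number of years) for each phase, in chronological order,
# exactly as quoted in the bullets above.
F4_PHASES = [("on", 4), ("off", 11), ("on/off", 7), ("off", 6), ("on", 5)]
NC_PHASES = [("on/off", 8), ("off", 7), ("on/off", 3), ("off", 7), ("on", 2), ("off", 6)]


def total_years(phases):
    """Add up the lengths of all phases."""
    return sum(length for _, length in phases)


def current_phase(phases):
    """Return the most recent (pattern, length) pair."""
    return phases[-1]


print("F4 years covered:", total_years(F4_PHASES), "- current:", current_phase(F4_PHASES))
print("NC years covered:", total_years(NC_PHASES), "- current:", current_phase(NC_PHASES))
```

Both rounds account for all 33 years, and the last entries show the contrast noted above: the F4 is currently "on" while the NC is currently "off".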
Before I move onto the individual match-ups viewpoint, I want to make one more final chart that consolidates both the upset chart and the UPM chart. I honestly think it speaks for itself, so I will spare you the long-winded paragraphs of details, but I'm pretty sure I'll do more work with this chart in Part 3 of this series on Upsets. [Note: Since it is close to Christmas time, I made an attempt at cuteness by using Christmas colors to light up this chart.]


The History of Upsets: The Individual Match-Ups

Instead of looking at the overall counts of upsets and UPMs that happen in a given year, we are going to look at upsets and UPMs by the seed match-ups. One important note should be discussed when examining upsets and UPMs from the viewpoint of match-ups: The individual match-up viewpoint is time-independent. In the previous section using the big-picture viewpoint, the examination was time-dependent: Upsets and UPMs were scaled alongside of time. By doing this, we could see upsets and UPMs for each given year, and this allows us to formulate hypotheses as to why one specific year saw more/less/equal upsets than/to another year. For the most part, this is typical of the PPB blog because we are always trying to compare the current season to past seasons in order to improve predictability of the current season. By looking only at the seed match-ups from 33 years of tournaments, we are eliminating time from our scale (time-independence) and making assertions about the likelihood of the specific outcome (a specific seed match-up) based on its relative occurrence to all other possible outcomes.

In hopes that I didn't confuse anyone in the last paragraph (I know I had to re-word it three times before I finally liked its current wording), let's dive right into the data and I can show you what I mean by seed match-ups and time independence. The chart to the right shows the seed match-ups for the R64 and the upsets in these seed match-ups over 33 years' worth of tournaments (starting with 1985 when the tournament field was first set to 64 teams). Since each one of these seed match-ups happens four times in a given year, it makes a total of 132 occurrences for each seed match-up in the R64. Essentially, this table shows us that 2-seeds lose approximately one time for every three times a 3-seed loses (8 upsets compared to 21 upsets) or one time for every six times a 5-seed loses (8 upsets compared to 47 upsets). This chart confirms our expectations of seed strength: Higher seeds are typically better than lower seeds and should be upset less often than lower seeds because they have better match-ups (15-seeds should be weaker than 14-seeds and 12-seeds). Even though this chart shows (time-independent) relative strength between seeds, we can scale these values against time to see how often these upsets occur. By looking at the percentages, we can see that 1-seeds never lose, 2-seeds lose 6% of the time (about one upset every four years), 3-seeds lose 16% of the time (about three upsets every five years), 4-seeds lose 20% of the time (four upsets every five years), and 5-seeds and 6-seeds lose about 35-37% of the time (at least one upset every year). Keep in mind, these percentages do not guarantee an upset in a given span of time; they merely imply the frequency at which upsets happen.
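The two scalings in the paragraph above (per game and per year) are simple divisions. Here is a minimal sketch using only the three upset counts quoted in the text; the match-up labels and function names are my own shorthand, and the 4-seed and 6-seed rows are left out because their raw counts are not quoted here.

```python
TOURNAMENTS = 33                      # 1985 through 2017
GAMES_PER_MATCHUP = 4 * TOURNAMENTS   # each R64 match-up is played four times a year

# R64 upset counts quoted in the text, keyed by match-up.
R64_UPSETS = {"2v15": 8, "3v14": 21, "5v12": 47}


def upset_rate(upsets, games=GAMES_PER_MATCHUP):
    """Time-independent view: share of games that ended in an upset."""
    return upsets / games


def upsets_per_year(upsets, years=TOURNAMENTS):
    """Time-scaled view: average number of upsets per tournament."""
    return upsets / years


for matchup, upsets in R64_UPSETS.items():
    print(f"{matchup}: {upset_rate(upsets):.1%} of games, "
          f"{upsets_per_year(upsets):.2f} upsets per tournament")
```

The per-tournament figure is an average, so it describes frequency over the long run rather than promising an upset in any particular year.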

Since the R64 is a rather simple round to investigate and understand, let's move onto the R32, which has greater variation in its upsets and its UPMs. Again, we will start with a chart (to the left) and see what it has to tell. There are several different ways we can approach this chart, but I think the safest approach would be groups based on frequency of the UPM (column P). Column P% is the occurrence percentage of this UPM seed match-up against all UPM occurrences. It is certainly possible to group these results by either Upset Percentage (three possible groupings) or seed differential (three possible groupings -- 5, 7, and 8). However, I think UPM frequency is the safest approach because the results will have more meaning.
  • The most frequently occurring UPMs are 1v8 (17% of all R32 UPMs), 1v9 (17% of all R32 UPMs) and 2v7 (21% of all R32 UPMs), and by being the most frequently occurring UPMs, their results should be a very reliable insight into the natural (or expected) outcome of these UPMs. First of all, we can see the 1v8 happens as often as 1v9, which means the 8v9 match-up in the R64 is truly a coin-flip. Second of all, the upset percentage has an inverse relationship to the seed differential, which should be expected. The higher the seed differential (which should imply a greater difference in the quality of the two teams), then the lower the probability of an upset occurring, which is the case with these three UPMs. 1v9 has the highest seed differential of 8, and it has the lowest upset percentage. 1v8 has the next highest seed differential of 7, and it has the next lowest upset percentage. 2v7 has the lowest seed differential of 5 (teams more equal to each other in quality), and it has the highest upset percentage of the group. 
  • The next group, which includes the UPMs 2v10, 3v11, and 4v12, has moderate frequency in occurrence. Their results are somewhat reliable, but probably not as reliable as the first group due to a smaller sample size. The ironic aspect of this group comes from the fact that all three members have a seed differential of 8, and all three have a similar upset percentage in the 33.3-40% range. In a strange twist, the 2v10 match-up is the most prone to upset out of the three match-ups in this group, whereas one would logically assume it should be the 4v12 simply because 4-seeds should be weaker than 2-seeds. After thinking more in-depth about it, I can rationalize one explanation as to why the 2v10 match-up is more prone to upsets: 2-seeds tend to be over-seeded and 10-seeds tend to be under-seeded. More often than not, 1-seeds are your four best all-around teams in the field, which means the remaining teams can be awarded a 2-seed for a variety of other reasons than team quality, such as Strength of Schedule or Quantity of Quality Wins (which doesn't differentiate between home and road). As a result, you get pretenders on the 2-seed line. Under-seeded 10-seeds seem like a good explanation since 7-seeds have played 34 more times than 10-seeds against 2-seeds, yet 7-seeds only have five more wins than 10-seeds against 2-seeds.
  • Finally, the last group is the group of infrequent occurrence. This group contains 15v7, 15v10, 6v14, and 5v13. With so few occurrences (a combined 10% of all R32 UPMs), it is rather difficult to determine if their results are reliable for use in any predictive capacity. For example, the 15v7 match-up (3 occurrences), which has a seed differential of 8, has an upset percentage in the same range as the moderate frequency group. However, the 5v13 and 6v14 match-ups (15 and 16 occurrences respectively), which also have a seed differential of 8, have upset percentages approximately half of the moderate frequency group. Since the 5v13 and 6v14 match-ups have 5 times as many occurrences as the 15v7 match-up, I would believe their larger sample size makes their upset percentage more reliable. Likewise, if you know the team responsible for the only 15v7 upset, you will know it has more to do with a poor seeding by the Selection Committee rather than a surprise upset. For the most part, upsets in the R32 happen at the rates one would logically expect them to happen (on the basis of seed differential), just like the R64.
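The seed-differential grouping mentioned as an alternative to grouping by frequency can be reproduced from the match-up labels alone. This is a rough sketch: the list holds the ten R32 UPMs named in the bullets above, written the way the text writes them, and the helper names are invented for illustration.

```python
# The ten R32 UPMs named in the three frequency groups above.
R32_UPMS = ["1v8", "1v9", "2v7", "2v10", "3v11", "4v12",
            "15v7", "15v10", "6v14", "5v13"]


def seed_differential(matchup):
    """Absolute gap between the two seeds in a label such as '2v10'."""
    first, second = (int(seed) for seed in matchup.split("v"))
    return abs(first - second)


def group_by_differential(matchups):
    """Map each seed differential to the match-ups that share it."""
    groups = {}
    for matchup in matchups:
        groups.setdefault(seed_differential(matchup), []).append(matchup)
    return groups


for differential, members in sorted(group_by_differential(R32_UPMS).items()):
    print(f"differential {differential}: {', '.join(members)}")
```

Only the differentials 5, 7, and 8 appear, and most match-ups land in the 8 bucket, which is one reason grouping by frequency separates the results more usefully.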
Conclusion

At this point, you are probably expecting me to carry on the same seed match-up examination into the S16 and E8 rounds. In fact, I probably should do so just for the sake of completion. However, I've been closely following the 2017-18 stats, and I have a hunch that saving the S16/E8 examination will make for a great article later (probably during Crunch Time). If I'm right about this hunch, you will get a pretty juicy article, and if I'm wrong, you will get a patch to this article after the tournament starts. Either way, you will get the information, but I sure hope my gamble pays off. As always, thanks for reading. My next article will be the January Quality Curve, and the final part of this three-part series will follow two weeks after it. I hope to see you then.

Update: S16 and E8 Individual Seed Match-ups

I already know what you are thinking: "This update is long overdue!" First, it was going to be a Crunch Week article that never materialized (more on that later). Second, it was going to be added during the tournament (which was overridden by my IRL matters). Third, it was going to be added in October before this season started, and I decided at the last second that I could tie it in with a new article for the 2018-19 season. Nonetheless, it is long overdue, so I will avoid the catchy intros and long-winded excuses so that we can finally see the S16 and E8 historical seed match-ups. I do have to mention one important detail. In keeping with time-consistency of the article, I have decided to post the S16 and E8 data as it would have appeared last year, which means it will not include 2018's impact on the UPMs and upset percentages.

I will begin with the S16 seed match-ups first, and as usual, I will start with a nice handy table (to the left). Like the R32 seed match-ups, the easiest way to analyze the S16 seed match-up statistics is group-based by frequency of occurrence.

The first group will be the UPMs with the highest frequency of occurrence: 1v5, 1v12 and 2v6. For starters, this group has 87 total occurrences with only 13 total upsets (an upset-percentage of 14.94%). Combined, these three UPMs constitute more than 54% of all the S16 UPMs. Being the group with the most occurrences (another way of saying 'the group with the most data points') and the individual seed match-ups with the most occurrences, the results of this group should be the most reliable for predictive purposes. When we dig deeper into this group (analysis by seed and seed differential), we can see other patterns emerge. All UPMs in this group feature a top-seeded team, and I point this out because top-seeded teams (1- and 2-seeds) typically get geographical- or site-advantage in the regional rounds (S16 and E8) like they do in the pod rounds (R64 and R32). This could be one explanation for the low upset-percentage of the group. Looking by seed differential (and this group only has two differentials: 4 and 11), we see another important pattern. Of 19 attempts, the seed differential of 11 has not produced a single upset, yet out of 68 attempts, the seed differential of 4 has produced all of the thirteen upsets from this group. This result should not come as a surprise since we have postulated that the closer two teams are in team quality (low seed differential), the more likely an upset will happen. All in all, I would speculate that the reliable explanatory factors for this group are seed strength (I'll clarify this term in the next paragraph) and site-advantage.
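To make the seed-differential split in this group concrete, the sketch below recomputes the percentages from the counts quoted in the paragraph above. Only the aggregate figures per differential are used; the dictionary layout and function name are assumptions for illustration.

```python
# Seed differential -> (UPM occurrences, upsets) for the high-frequency
# S16 group (1v5, 1v12, 2v6), using the totals quoted in the text.
GROUP_ONE_BY_DIFFERENTIAL = {
    4: (68, 13),   # 1v5 and 2v6 combined
    11: (19, 0),   # 1v12
}


def upset_percentage(occurrences, upsets):
    """Upsets as a percentage of UPM occurrences."""
    return 100 * upsets / occurrences


total_occurrences = sum(occ for occ, _ in GROUP_ONE_BY_DIFFERENTIAL.values())
total_upsets = sum(ups for _, ups in GROUP_ONE_BY_DIFFERENTIAL.values())

print(f"whole group: {total_upsets}/{total_occurrences} = "
      f"{upset_percentage(total_occurrences, total_upsets):.2f}%")
for differential, (occ, ups) in GROUP_ONE_BY_DIFFERENTIAL.items():
    print(f"differential {differential}: {ups}/{occ} = "
          f"{upset_percentage(occ, ups):.2f}%")
```

Splitting by differential shows that the group-wide figure blends two very different rates: every upset sits in the differential-4 bucket, so that bucket's rate is higher than the group's overall rate.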

The second group will be the S16 UPMs of moderate frequency: 2v11, 3v7, 3v10, 4v8, and 6v10. In total, this group boasts 56 occurrences producing 19 upsets (an upset-percentage of 33.93%). Individually, the seed match-ups from this group constitute as many as 14 occurrences and as few as 6 occurrences throughout the history of the tournament. To be honest, I could have easily included the 1v12 match-up from the first group (19 occurrences) and the 1v13 match-up from the following group (4 occurrences) because there is very little difference in their frequency of occurrence from that of this group. However, the reason I did not include them in this group has to do with their other attributes. The first attribute is seed differential. This group features three seed differentials: 4, 7, and 9. Like our postulate states, the lower the seed differential, the higher the upset-percentage, and this group follows the pattern. The three individual match-ups with the seed differential of 4 have the three highest upset-percentages (55.6%, 42.9% and 33.3%) whereas the individual seed match-up with the highest seed differential (9) has the lowest upset-percentage (14.3%). When looking at seed strength, another unique pattern emerges. Where seed differentials are the same, individual match-ups with a stronger seed tend to have a lower upset-percentage. As the higher seed in the match-up gets weaker, the upset-percentage tends to increase. Look at all of the aforementioned UPMs with a seed differential of 4. Those with seed strength (1- and 2-seeds from the first group) have a lower upset-percentage (17.95% and 20.69%) than their weaker counterparts (3- through 6-seeds), which have higher upset-percentages (42.86%, 55.56% and 33.33%). 
This pattern should seem intuitive since 1- and 2-seeds should have site-advantage as well as resume-advantage (they were consistently better throughout the season than their weaker counterparts), and as a result, the higher seeds should be less susceptible to upset. As a reminder though, this group does not have the voluminous frequency of the first group; therefore, their results may not be as reliable in a predictive capacity as the results of the first group.

The third group will be the S16 UPMs of low frequency: 1v13, 3v15, 4v9, 5v9, 7v11, 7v14, 8v12, 8v13, 9v13, and 10v14. In total, this group has 17 occurrences producing 5 upsets. In addition, those 5 upsets come from only three of the ten UPMs, meaning the other seven UPMs are sporting an upset-percentage of 0.00%. Interestingly enough, the upset-percentages also follow our postulate, with the only upsets coming from UPMs with a seed differential of 4. All other seed differentials have not produced an upset, but it is important to note that two UPMs with a seed differential of 4 (9v13 and 10v14) have not produced an upset. Although each of those two UPMs has only one occurrence (not reliable for predictive purposes), seed strength can be one possible explanation. 9- and 10-seeds are usually populated with Power conference teams whereas 13- and 14-seeds are populated with small conference teams.
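Putting the three S16 groups side by side makes the sample-size caveat easier to see. The sketch below uses only the occurrence and upset totals quoted for each group, and it assumes the three groups together cover every S16 UPM from 1985 to 2017; the names are illustrative.

```python
# Frequency group -> (UPM occurrences, upsets), as quoted in the text.
S16_GROUPS = {
    "high frequency": (87, 13),
    "moderate frequency": (56, 19),
    "low frequency": (17, 5),
}

# Assumption: these three groups account for all S16 UPMs in the sample.
TOTAL_S16_UPMS = sum(occ for occ, _ in S16_GROUPS.values())


def summarize(occurrences, upsets, total=TOTAL_S16_UPMS):
    """Return (share of all S16 UPMs, upset percentage), both in percent."""
    return 100 * occurrences / total, 100 * upsets / occurrences


for name, (occ, ups) in S16_GROUPS.items():
    share, pct = summarize(occ, ups)
    print(f"{name}: {occ} UPMs ({share:.1f}% of total), upset rate {pct:.2f}%")
```

Under that assumption the high-frequency group supplies a little over half of the data, while the low-frequency group's upset rate rests on just 17 games.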

Now, let's move on to the E8 match-ups, and from the table on the right, you can see that this is going to be a tough one. From 1985 to 2017, only ten upsets have occurred in the E8 round. The most interesting aspect of these ten upsets is their timing. One happened in each of the first three years of the tournament, five happened in the six tournaments from 2011-2016 (two of these five in 2011 alone), and the final two outliers happened in 1992 and 2006. The patterns in timing of E8 upsets lead me to believe that structural factors (tournament expansion, NBA draft patterns, bracketing principles, etc.) are more responsible for the results than qualitative factors (seed strength and seed differential). In all, the E8 has a total of 37 UPMs producing 10 upsets. Among the eleven different individual seed match-ups, only five have produced the 10 upsets. When we dig deeper, we see that seed strength and seed differential show very little consistency (and this may be due to a lack of frequency in occurrence). Of the six individual seed match-ups with at least four occurrences, the one with the largest seed differential (10, in the 1v11) has one of the highest upset-percentages in the table (42.86%). According to our postulate, this seed match-up should have one of the lowest upset-percentages. In fact, this entire round defies our postulate. It also defies our concept of seed strength as all of the upsets have happened to top-seeds (1- and 2-seeds), but we should keep in mind that there is an extremely low sample size for upsets of 3-, 4- and 5-seeds, with only 5 total UPMs involving one of these three seeds. Nonetheless, I don't think I can give a suitable explanation in terms of qualitative factors for the bizarre patterns of the E8 round.
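The clustering in the timing of E8 upsets can be expressed as upsets per tournament within each window. This is a minimal sketch built from the counts in the paragraph above; the window labels and the "all other years" bucket are my own framing.

```python
TOTAL_TOURNAMENTS = 33   # 1985 through 2017
TOTAL_E8_UPSETS = 10
TOTAL_E8_UPMS = 37

# Window -> (tournaments in window, E8 upsets in window), per the text.
WINDOWS = {
    "1985-1987": (3, 3),
    "2011-2016": (6, 5),
}

# Everything outside the two clusters; this bucket holds the 1992 and 2006 upsets.
other_years = TOTAL_TOURNAMENTS - sum(years for years, _ in WINDOWS.values())
other_upsets = TOTAL_E8_UPSETS - sum(ups for _, ups in WINDOWS.values())
WINDOWS["all other years"] = (other_years, other_upsets)


def upsets_per_tournament(tournaments, upsets):
    """Average number of E8 upsets per tournament in a window."""
    return upsets / tournaments


for label, (years, ups) in WINDOWS.items():
    print(f"{label}: {ups} upsets in {years} tournaments "
          f"({upsets_per_tournament(years, ups):.2f} per tournament)")
print(f"overall E8 upset rate: {100 * TOTAL_E8_UPSETS / TOTAL_E8_UPMS:.2f}% of UPMs")
```

The two clusters run at close to one upset per tournament, while the remaining 24 tournaments produced only two, which is the uneven timing the paragraph describes.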

As I said at the conclusion of the original article, I thought that saving the S16/E8 data for an article during Crunch Week would make me look like a one-of-a-kind genius prognosticator. I've noticed in many of my models that crazy tournaments follow one of two results: Everything blows up in the early rounds and reverts to the mean by the S16 and E8 (e.g., 2016) or predictable upsets in the early rounds become surprise upsets in the S16 and E8 (e.g., 2011 and 2014). The 2018 tournament seemed to follow the latter path, maybe with the added exception of "surprising upsets in the early rounds" rather than "predictable upsets in the early rounds." Either way, the S16 and E8 tend to act like counter-weights to the tournament craziness. In hindsight, I think withholding the article was the smartest decision I could have made. I was extremely hesitant to make any bold predictions (especially bold predictions for later rounds) following the discovery of the tight QC/SC overlay. I was pretty certain that the tightness of the QC/SC match-up would result in fewer-than-expected upsets in the R64 and R32 (and yes, I was expecting $#!T to hit the fan in those two rounds), but I was uncertain about two things. First, would it equal or even undershoot 2017's R64 and R32 upset counts? For as strong as 2017 was, it still produced four upsets in each of the R64 and R32, and if a 16-seed had not toppled a 1-seed, 2018 in all of its weakness would have matched 2017's R64 upset count of four. Second, how would the fewer-than-expected upsets in the early rounds (R64 and R32) translate to upsets and UPMs in the later rounds (S16, E8, and F4)? I remember tournaments like 2000 very vividly, when the R64 produced one upset and then everything got shattered in the R32 with two 1-seeds falling to two 8-seeds (both 8-seeds would make the F4) and three 2-seeds falling. 
I thought something similar was a possibility (a small one, but still possible) in 2018, but for the sake of my predictions, I'm relieved it didn't happen that way. In the end, my mid-season article-gamble didn't pay off like I wanted, but at least I didn't turn one mistake into two. As usual, thanks for reading my work and putting up with my shenanigans.
