Jan 15, 2018

Upsets in the Making (Part 3) - Statistical Analysis of Upsets

As I hinted in the January Edition of the QC Analysis, I do think the 2018 tournament will be filled with upsets, which is why I have focused on a three-part series about upsets. I also turned it into a three-part series (instead of one single article) due to the sheer volume of information. Since it has been one month since the last article on upsets, a recap of the series's content is in order:
  • Theoretical Approach - Examining how upsets happen and could potentially happen (UPMs) given the structure of the 64-team/16-seed bracket.
  • Historical Approach - Examining how upsets and UPMs have happened in the 64-team/16-seed bracket, both on a yearly basis and on a seed match-up basis.
In both of those articles, as well as this one, the objective is to create a generic view of upsets, one that ignores bias-inducing elements like team name, W-L records, match-up statistics (efficiency ratings, win percentages, coach PASE), etc. While other tools provide secondary insights on the expected number of upsets (QC Analysis) or on which match-ups seem likely to produce an upset (round-by-round seed guide or OS/US analysis), these three articles aim to produce a primary framework for understanding how upsets arise. Now, let's see what statistics can show us about upsets.







Testing the Natural Rate of Upsets

In the Historical Approach article, I presented the chart above and stated I would do more work with it. As a refresher, P stands for the number of UPMs in that round, A stands for the number of Actual Upsets that occurred in that round, % = A/P, and P* stands for the UPMs in the R32 created by upsets in the R64 involving 3- to 6-seeds. As we identified in the Theoretical Approach, the R32 automatically has 8 UPMs because the 1-seed pods (1,8,9,16) and the 2-seed pods (2,7,10,15) always produce UPMs in the R32 regardless of which seeds win. Thus, every P* value equals P-8, because the automatic 8 (four 1-seed pods and four 2-seed pods) are removed from the R32's total.
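The "automatic 8" can be checked by brute force. The sketch below is a hedged illustration and not the method from the Theoretical Approach article. It assumes that a game counts as a UPM when the seed gap is at least 4. That threshold is inferred from examples in this article: a 7-seed drawing a 3-seed in the S16 counts as a UPM, and only the 1- and 2-seed pods are automatic, which rules out a 3v6 game. The names `UPM_GAP`, `POD_PAIRS`, `is_upm`, and `guaranteed_upm_pods` are hypothetical.

```python
from itertools import product

# ASSUMPTION: a UPM is any game whose seed gap is at least 4. This threshold is
# inferred from examples in the article, not from a formal definition.
UPM_GAP = 4

# Each pod: the two R64 games whose winners meet in the R32.
POD_PAIRS = {
    1: ((1, 16), (8, 9)),
    2: ((2, 15), (7, 10)),
    3: ((3, 14), (6, 11)),
    4: ((4, 13), (5, 12)),
}


def is_upm(seed_a: int, seed_b: int, gap: int = UPM_GAP) -> bool:
    """True if the seed differential is large enough to count as a UPM."""
    return abs(seed_a - seed_b) >= gap


def guaranteed_upm_pods(gap: int = UPM_GAP) -> list[int]:
    """Pods whose R32 game is a UPM no matter which seeds win in the R64."""
    return [
        top_seed
        for top_seed, (game_a, game_b) in POD_PAIRS.items()
        if all(is_upm(a, b, gap) for a, b in product(game_a, game_b))
    ]


pods = guaranteed_upm_pods()
print(pods)           # [1, 2]
print(len(pods) * 4)  # 8 automatic R32 UPMs across four regions
```

Under this assumed threshold, the 3-seed pod fails because a 3v6 game is only a 3-seed gap, and the 4-seed pod fails because a 4v5 game is a 1-seed gap.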

If the goal is a generic framework for upsets, the first and obvious idea would be a natural rate of upsets. What do I mean by a "Natural Rate of Upsets (NRU)"? Simply put, "IF" there is an NRU (an upset happens x percent of the time that a UPM presents itself), then more UPMs should result in more actual upsets. For example, if the NRU exists and equals 33%, then 9 UPMs in the R32 should produce about 3 upsets (9*0.33) and 12 UPMs should produce about 4 upsets (12*0.33). The first path to explore was a line of best fit, or y=mx+b. Using a statistical method known as least-squares regression, I came up with y = 0.2337*x + 0.1322 for the R32 NRU, where x is the number of UPMs, y is the expected number of upsets, and 0.1322 is the y-intercept (b in the equation), which acts as a small correction term. More importantly, I obtained the value for m (0.2337), which would represent the derived value for the NRU. This means an upset would happen roughly once every four UPMs, but this seemed low to me, so I investigated further.
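The fit and the NRU arithmetic can be sketched in a few lines of Python. `least_squares` uses the textbook ordinary least-squares formula and is not necessarily the exact tool used here. `prob_exactly` goes one step beyond the article by assuming a hypothetical NRU model in which every UPM is an independent trial with the same upset probability, which is a binomial model. All function names are illustrative.

```python
from math import comb


def least_squares(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # slope: the candidate NRU
    b = mean_y - m * mean_x  # y-intercept
    return m, b


def expected_upsets(upms, nru):
    """Under an NRU, expected upsets grow linearly with the number of UPMs."""
    return upms * nru


def prob_exactly(k, upms, nru):
    """Binomial chance of exactly k upsets from `upms` independent UPMs (assumed model)."""
    return comb(upms, k) * nru**k * (1 - nru) ** (upms - k)


# Sanity check of the fitter on a toy line y = 2x + 1 (not tournament data).
print(least_squares([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)

# The article's fitted R32 line, y = 0.2337*x + 0.1322, evaluated at 12 UPMs.
m, b = 0.2337, 0.1322
print(round(m * 12 + b, 2))  # 2.94

# The article's NRU = 33% example: 9 UPMs -> about 3 upsets.
print(round(expected_upsets(9, 0.33), 2))  # 2.97

# 12 UPMs allow 13 possible outcomes (0..12); their probabilities sum to 1.
print(round(sum(prob_exactly(k, 12, m) for k in range(13)), 10))  # 1.0

# If the NRU really were 0.2337, 13 UPMs producing 0 upsets would be rare.
print(round(prob_exactly(0, 13, m), 3))  # 0.031
```

Under these assumptions, each 13-UPM, 0-upset season would have only about a 3% chance, which is one way to see why those data points sit so badly with a fixed rate.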

The next path to explore involves the correlation coefficient, which measures how closely the data points cluster around the regression line. Upon doing correlation analysis, I discovered this problem: the data is the result of a process that does not follow the typical linear functional relationship. A typical function means that each value of x produces one y-value. In this process, one x can produce x+1 possible y-values. For example, 12 UPMs (the x-value) can produce 13 possible outcomes (any number from 0 to 12). This is why statistical tests involving linear concepts were producing funky results. (Long story short: after concluding that the typical version of the correlation coefficient, the Pearson Product-Moment CC, was not valid for this data set, I sought out other correlation coefficients suited to atypical data sets: Spearman's Rank-Order CC, Kendall's Tau-b CC, and Somers' delta CC, all rank-based measures that do not assume a linear relationship.) These CC calculations told me that there is no relationship between the number of UPMs and the number of Actual Upsets. This is even evident just by looking at the raw data. For example, 13 is one of the higher values that UPMs can take, and on two occasions, 13 UPMs resulted in 0 Actual Upsets in the R32. Likewise, 9 is one of the lower values that UPMs can take, and on one occasion, 9 UPMs resulted in 5 Actual Upsets. These three data points run contrary to any concept of a natural rate of upsets, and since many of the outcomes exhibit this contrary behavior (although not as extreme), it is safe to conclude that the NRU does not exist.

Testing Inter-round Dependencies

We saw in the Theoretical Approach article that the structure of the bracket automatically generates 8 UPMs in the R32 regardless of the outcomes in the 1- and 2-seed pods in the R64. This raises the idea of inter-round dependencies: do outcomes in a particular round have any impact on the quantity of UPMs and Actual Upsets in later rounds?

Since there are multiple dependencies to examine, let's begin with the obvious one: Actual Upsets in one round and their impact on UPMs in the following round. Unlike the last idea, we will begin with correlation analysis. The table to the left shows the correlation coefficient of the prior round's actual upsets to the subsequent round's UPMs. First, correlation analysis works better for this test than it did for the NRU test. This is because the subsequent round's UPMs (the y-values) are not the direct results of a process driven by the prior round's upsets (the x-values), the way the upsets were in the NRU test. Second, the correlation coefficient quantifies the strength of the relationship, or, as I stated earlier, how closely the data points cluster around the line of best fit. If it is positive, the two variables move in the same direction (direct relationship), and if it is negative, they move in opposite directions (inverse relationship). Likewise, a value close to 0 indicates little to no relationship between the two variables. A value close to 1.0 (or -1.0) indicates a strong direct (inverse) relationship, and exactly 1.0 (or -1.0) indicates a perfect one. So what do these results tell us?
  1. There is moderate strength in the relationship between the prior round's actual upsets and the UPMs of the round that immediately follows it, with the exception of the F4 round's upsets. To be technical with my wording, if the number of upsets in a round is high, then "the probability of a high number of UPMs" in the very next round is greater, and if the number of upsets in a round is low, then "the probability of a low number of UPMs" in the very next round is greater. This is what is meant by a moderately strong direct relationship. 
  2. I do believe the drop in the correlation at "S16A-E8P" and the return to the 0.40 range at "E8A-F4P" have more to do with the intrinsic structure of upsets and UPMs discovered in the theoretical analysis (which was described by the table to the right) than with weakness in that particular relationship.

Let's move onto the next dependency -- Actual Upsets in one round and their impact on Actual Upsets in the following round. The table below shows these correlations. It shows two moderately strong direct relationships, two weak relationships, and one inverse relationship.
  1. The strength in the "F4A-NCA" relationship has a lot to do with the small range of values that F4A can take (0, 1, or 2) and the lack of variation in those results (0 has happened 30 times, 1 has happened twice, and 2 has happened once). If there are 0 upsets in the F4 round, then the NC game most likely (25 out of 30 times) features two teams whose seed differential does not qualify as a UPM. In that case, there cannot be an upset in the NC game because there is no UPM in it. Thus, 0 upsets in the F4 most likely means 0 upsets in the NC. The same logic applies to 2 upsets in the F4 round, which has happened only once. It means two lower-seeded teams face each other in the NC game, and their seed differential does not qualify as a UPM (in that one instance, it was a 7v8 NC game). Thus, 2 upsets in the F4 most likely means 0 upsets in the NC.
  2. The strength in the "R32A-S16A" relationship is also intuitive, in that tournaments of parity (upset-heavy) and tournaments of strength (calm and chalky) each show the same character across both of these rounds. A tournament of parity will have more teams of equal quality, and they will be playing against each other in these two rounds. Thus, if a 7-seed beats a 2-seed in the R32, it has good odds of drawing a 3-seed or an 11-seed in the S16, both of which are UPMs. If an 8-seed or 9-seed knocks off a 1-seed, it gets either a 4-seed or a 5-seed, and depending on which it receives, you are looking at another UPM. Likewise, a tournament of strength will have more teams of diverging quality, where top seeds are more likely to win in the R32. Because of this differential in strength, if all the top seeds advance, they either face other top seeds in the S16, which doesn't produce a UPM, or they face a much lower seed (1v12, 2v11, or 3v10) and prevail without an upset due to the sheer gap in strength.
  3. The inverse relationship in the "R64A-R32A" data, although a relatively weak one, is probably the most intriguing. Essentially, it claims that more upsets in the R64 (6, 7, or 8 upsets) will "probably" lead to fewer upsets in the R32 (0, 1, or 2 upsets), and fewer upsets in the R64 (2, 3, or 4 upsets) will "probably" lead to more upsets in the R32 (3, 4, or 5 upsets). If you want to see the R64 and R32 upsets side-by-side, follow this link. The answer may lie in the 5- and 6-seeds, but I'm not entirely certain about this. Since 5-seeds and 6-seeds are the most likely to be upset by their R64 counterparts (12-seeds and 11-seeds), these Cinderellas advance to the R32 and most likely face a 4-seed or a 3-seed. That opponent should be stronger than their R64 opponent and is likely to take them more seriously in an attempt to avoid being upset like the 5- or 6-seed. That only explains half of the inverse relationship. If there are fewer upsets in the R64, then there are fewer 11-seeds and 12-seeds to wreak havoc in the R32. In that case, the only way to accelerate the upset count in the R32 is for 1- and 2-seeds to lose to 8/9- and 7/10-seeds. Sadly, if this whole theory of mine is right, I don't see how any framework for generic bracket-picking (like the one we are trying to build in this series) could ever account for this erratic behavior (the inverse relationship, to be more technical). Though it is weak, the inverse relationship is present in the data.

I cannot discuss inter-round dependencies without looking beyond the immediately following round. The table to the right shows the correlation analysis for rounds two apart and three apart. This chart shows a lot of different results, so the easiest way to interpret it is to get the obvious results out of the way first. The correlation coefficients on the right show that there is little to no relationship between actual upsets in a particular round and actual upsets two or three rounds later. (I can actually explain why this lack of a relationship exists, but it is not central to the goal of a generic framework for upsets.) The correlation coefficients on the left show a mixed bag of results for the number of upsets in a particular round and the number of UPMs two or three rounds later. I honestly expected the 2- and 3-round correlations to be slightly weaker than the 1-round A-P correlations, but the results are far more divergent than I expected. However, I would say the only one of relevance is the "R32A-E8P" inter-dependency, yet I don't have a solid explanation for it. It could be an artifact of the numbers, where 4s, 5s, and 6s in R32A produce 2s and 4s in E8P, and 0s, 1s, and 2s in R32A produce 0s and 1s in E8P. Alternatively, it could be that more upsets in the R32 lead to more UPMs in the E8. I simply find it strange that this particular inter-dependency is stronger than either the R32A-S16P or the S16A-E8P inter-dependency (in the single-round analysis above). Again, I'm not sure of the right explanation, but it is a strong relationship.

On a final note, many of the correlations with R32A as the independent variable (written as "R32A-XXXX") are moderately strong. In fact, the weakest relationship is "R32A-F4A", with a correlation coefficient of 0.215815. Multi-round inter-dependencies are typically weaker than single-round ones. Upset-to-upset (XXXA-XXXA) inter-dependencies are also typically weaker than upset-to-UPM (XXXA-XXXP) ones. Since "R32A-F4A" is both, one would expect its value to be much closer to zero, so 0.215815 looks pretty good. I think the overall and comparative strength of the "R32A-XXXX" correlations speaks volumes about the importance of accurately predicting the quantity of upsets in the R32. If that quantity is predicted correctly, its relationships with the quantities of upsets and UPMs in later rounds ("-S16P", "-S16A", "-E8P", and "-F4P") should lead to better predictions for those rounds.

Conclusion

I hope this three-part series gave everyone a different look at upsets. Even more, I hope that dividing this information into three different articles did not obscure my goals for the study. I wanted three different views showing how upsets arise in the bracket and how their frequency can be natural to the bracket itself (not just the result of strength vs parity). I just thought all three subjects together in one document would have been an overwhelming read. Anyway, thanks for reading; my next article will be out on Jan 29, which is the February Edition of the QC Analysis.
