Dec 4, 2017

Upsets in the Making (Part 1) - Theoretical Analysis of Upsets

I hope the two teasers that I left helped you figure out the subject of this article: Upsets. It doesn't take a rocket scientist (and I certainly am not one) to know that upsets are the biggest part of the tournament experience. In fact, a well-rounded understanding of upsets can make a difference of 3-15 picks in your bracket compared to your competitors' brackets. To gain that top-level insight into upsets, this three-part series will examine upsets from three different perspectives, and this article covers the first: The Theoretical Perspective.



Before we jump right into the sandbox, we must first lay down the groundwork to get everyone on the same page. Keeping with prior work, a match-up only qualifies as an upset if the lower-seeded team (the one with the larger seed number) wins the game and the difference between the seeds of the two teams is four or greater. Also, this is the second time I have done any writing on the subject of upsets (the first is here under the heading "Upset City"), but this article (and the other two parts) will be far more in-depth than the few paragraphs I dedicated to upsets in the first article. With all of that covered, let's get into the most upsetting article I have ever written.
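To make the definition concrete, here is a minimal Python sketch of it. The names MIN_SEED_GAP and is_upset are my own placeholders for illustration, and the sketch assumes "lower-seed" means the team with the larger seed number.

```python
MIN_SEED_GAP = 4  # threshold taken from the upset definition above


def is_upset(winner_seed: int, loser_seed: int) -> bool:
    """True when the winner's seed number exceeds the loser's by at least MIN_SEED_GAP."""
    return winner_seed - loser_seed >= MIN_SEED_GAP


print(is_upset(12, 5))  # True: a 12-seed beating a 5-seed (differential of 7)
print(is_upset(9, 8))   # False: differential of only 1
print(is_upset(1, 16))  # False: the favorite won
```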

Theoretical Analysis

The entire purpose of the theoretical perspective is to build upon the upset definition using a probabilistic model. It will most likely be the least useful of the three perspectives when it comes to making picks in your bracket, but this perspective will provide a firm foundation for our understanding of upsets and will complement the other two perspectives.

Since a picture is worth a thousand words, let's start with the chart to the right. This is a round-by-round description of games with upset-potential in the tournament, where G is the number of tournament games in the specific round, C is the total number of match-up combinations that can occur in that round, P is the number of upset-potential match-ups (UPMs) where the seed differential is four or greater, and % is the percentage of the combinations that have upset-potential (P divided by C). The first four rounds (R64 through E8) are fairly common sense. As the number of games is cut in half with each successive round, the number of match-up combinations doubles. Since teams never see the same seed-number in their region (R64 through E8 games), these values are all powers of two (2^n). In the F4 and NC games, where a team can face another of the same seed-number, the math gets a little different since duplicates can occur (upper-region #1 versus lower-region #2 is counted the same as upper-region #2 versus lower-region #1). The count becomes the sum of the numbers 1 through 16, and in the case of the F4 round, this sum is doubled since two games occur in this round. The other interesting insight from this chart involves the %-column, where the first three rounds have 75% of their combinations qualifying as UPMs and the final three rounds have 56-57%. It is rather unusual that the E8 round follows the games/combination inverse-relationship of R64, R32 and S16, yet follows the % pattern of the F4 and NC. I think we will understand why when we dig a little deeper.
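The combination counts described above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic (doubling for the regional rounds, a 1-through-16 sum for the national rounds); the variable names are my own.

```python
REGIONS = 4
FIRST_ROUND_PAIRINGS = 8  # 1v16, 2v15, ..., 8v9 in each region

# Regional rounds: the combination count doubles with each round.
regional_rounds = ["R64", "R32", "S16", "E8"]
combos = {
    name: FIRST_ROUND_PAIRINGS * REGIONS * 2**i
    for i, name in enumerate(regional_rounds)
}

# National rounds: any seed can meet any seed (same seed included),
# and mirrored pairings count once, so the count is 16 + 15 + ... + 1.
nc_combos = sum(range(1, 17))
combos["F4"] = 2 * nc_combos  # two F4 games
combos["NC"] = nc_combos      # one NC game

print(combos)
# {'R64': 32, 'R32': 64, 'S16': 128, 'E8': 256, 'F4': 272, 'NC': 136}
```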

Breaking down the math of the bracket

So, let's take that deeper look at each round, starting with the R64, which is the simplest. With 6 different match-ups (1v16, 2v15, 3v14, 4v13, 5v12, and 6v11) each occurring four times (once per region), there are 24 potential upsets (6x4) in the first round alone. Since the 7v10 and 8v9 match-ups do not have a seed differential of four or greater, there are 8 games (2x4) in the R64 that do not qualify as a UPM.
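As a quick check on the R64 numbers, the sketch below lists the eight first-round pairings and splits them by the four-seed threshold. The names are mine, chosen just for this illustration.

```python
REGIONS = 4
MIN_GAP = 4  # seed differential needed for a UPM

# First-round pairings always sum to 17: 1v16, 2v15, ..., 8v9.
r64_pairs = [(seed, 17 - seed) for seed in range(1, 9)]

upm_pairs = [(a, b) for a, b in r64_pairs if b - a >= MIN_GAP]
non_upm_pairs = [(a, b) for a, b in r64_pairs if b - a < MIN_GAP]

print(upm_pairs)                     # [(1, 16), (2, 15), (3, 14), (4, 13), (5, 12), (6, 11)]
print(len(upm_pairs) * REGIONS)      # 24
print(len(non_upm_pairs) * REGIONS)  # 8
```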

R32 Math
Moving on to the R32, it gets a little more complex, but nothing we can't handle. Each of the 4-team pods (1-16/8-9; 4-13/5-12; 3-14/6-11; and 2-15/7-10) has four different possible combinations. For example, the 1-seed pod can produce 1v8, 1v9, 8v16, and 9v16. With 4 pods each producing 4 different combinations in each of the 4 regions, we get 64 possible combinations (4x4x4) for the R32. Looking at the 1-seed and 2-seed pods, something mathematically interesting takes place: No matter which teams in those pods win in the R64, the match-up in the R32 will always be a UPM. From the example a few lines above, you can see that all four combinations in the 1-seed pod have a seed differential of 4 or greater, and this is true for the 2-seed pod as well (2v7, 2v10, 7v15, 10v15). In the 3-seed and 4-seed pods, only 2 of the 4 possible combinations for each pod will result in a UPM in the R32 (4v12 and 5v13; 3v11 and 6v14). Thus, for each region, there are two games that guarantee a UPM in the R32 and two games where half of the combinations are UPMs, which gives 12 UPMs per region (4+4+2+2) and 48 total UPMs (8x4 + 8x2) in the R32.
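The pod-by-pod counts can be verified by brute force. In this sketch, each pod is written as its two R64 games, and every possible pair of winners is checked against the threshold; the PODS layout and variable names are my own way of encoding the pods listed above.

```python
from itertools import product

REGIONS = 4
MIN_GAP = 4

# Each pod, keyed by its top seed, holds its two R64 games.
PODS = {
    1: ((1, 16), (8, 9)),
    4: ((4, 13), (5, 12)),
    3: ((3, 14), (6, 11)),
    2: ((2, 15), (7, 10)),
}

upm_per_pod = {}
for top_seed, (game_a, game_b) in PODS.items():
    # Any winner of game_a can meet any winner of game_b: 4 combinations.
    matchups = list(product(game_a, game_b))
    upm_per_pod[top_seed] = sum(abs(a - b) >= MIN_GAP for a, b in matchups)

total_combos = len(PODS) * 4 * REGIONS
total_upms = sum(upm_per_pod.values()) * REGIONS

print(upm_per_pod)               # {1: 4, 4: 2, 3: 2, 2: 4}
print(total_combos, total_upms)  # 64 48
```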

S16 Math
For the S16 round, the same patterns continue except it uses octets instead of pods. The 1-seed octet includes 1-8-9-16/4-5-12-13 and the 2-seed octet includes 2-7-10-15/3-6-11-14. Since each member of one half of the octet can be paired with each member of the other half, the result is 16 different combinations for each octet. With two octets per region (a 1-seed octet and a 2-seed octet) and four regions per bracket, the result is 128 different combinations (16x2x4) for the S16. Now, let's rearrange each octet numerically (1-4-5-8-9-12-13-16 and 2-3-6-7-10-11-14-15) to see another interesting mathematical phenomenon. In every set (a group of three) of sequential numbers, the middle number has one neighbor with a differential of 1 and another neighbor with a differential of 3. For example, the sets 1-4-5, 4-5-8 and 5-8-9 all follow this pattern, and the same is true for the 2-seed octet. Thus, the outer pair of every set will have a seed differential of 4 (our defining value for UPMs), which is the sum of the two seed differences between neighbors (3+1). Going back to the octet-split, this mathematical phenomenon really manifests itself. Every individual member of one octet-half, when paired with the members of the other octet-half, has exactly one pair with a seed differential below 4 (a differential of 3 in the 1-seed octet, such as 1v4, and a differential of 1 in the 2-seed octet, such as 2v3), and all other pairs have a seed differential of 4 or more. With 3 out of every 4 (75%) pairings meeting the requirements of the UPM, 12 of the 16 different combinations in an S16 octet are UPMs (or 96 out of the total 128, 75%).
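Here is a small sketch that checks both claims for the octets: the alternating neighbor gaps in the sorted octet, and the 12-of-16 UPM count. The OCTETS dictionary and the other names are my own encoding of the halves described above.

```python
from itertools import product

REGIONS = 4
MIN_GAP = 4

# Each octet is split into the two R32 pods that feed one S16 game.
OCTETS = {
    "1-seed": ((1, 8, 9, 16), (4, 5, 12, 13)),
    "2-seed": ((2, 7, 10, 15), (3, 6, 11, 14)),
}

neighbor_gaps = {}
short_pairs = {}
upm_counts = {}
for name, (half_a, half_b) in OCTETS.items():
    ordered = sorted(half_a + half_b)
    neighbor_gaps[name] = [hi - lo for lo, hi in zip(ordered, ordered[1:])]
    pairs = list(product(half_a, half_b))
    short_pairs[name] = [(a, b) for a, b in pairs if abs(a - b) < MIN_GAP]
    upm_counts[name] = len(pairs) - len(short_pairs[name])

print(neighbor_gaps["1-seed"])  # [3, 1, 3, 1, 3, 1, 3]
print(neighbor_gaps["2-seed"])  # [1, 3, 1, 3, 1, 3, 1]
print(short_pairs["1-seed"])    # [(1, 4), (8, 5), (9, 12), (16, 13)]
print(short_pairs["2-seed"])    # [(2, 3), (7, 6), (10, 11), (15, 14)]
print(sum(upm_counts.values()) * REGIONS)  # 96
```

Each seed appears in exactly one of the short pairs, which is why 12 of the 16 combinations in an octet clear the threshold.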

E8 Math
The E8 round essentially is a match-up that pairs the two S16 octets. The math for determining the combinations is still the same. With each and every member of the 1-seed octet being paired with each and every member of the 2-seed octet, the result is 64 combinations (8x8) per regional E8 game, and with 1 game per region and four regions, the result is 256 combinations (64x1x4) for the whole bracket. As you can see, the E8 still follows the same pattern as the R64, R32 and S16 for finding combinations, but as pointed out earlier, the E8 more closely follows the pattern of the F4 and NC for finding the UPMs, and here is the simple explanation. When looking at the pattern for S16 sets, every set followed the 3-&-1 neighbor pattern. Now in the E8, every member of that set is being paired with the numbers that fill the 3-gap between neighbors. For example, the 4 in the 1-4-5 set is now being paired with the 2 and the 3, both of which fill the gap (of 3) between the 1 and 4. Likewise, in the 4-5-8 set, the 4 is being paired with the 6 and the 7, both of which fill the gap (of 3) between the 5 and 8. Since we know the gap is 3, a seed paired with a number that fills an adjacent gap will have a seed differential of 3 or less, which doesn't qualify as a UPM. As we saw, 4 paired with the gap-fillers from the other octet gets seed differentials of 1 (4-3), 2 (4-2 and 6-4), and 3 (7-4), none of which qualify for the UPM. Thus, four of its eight pairs (50%) are not UPMs. Since 1 and 16 each belong to only a single 3-&-1 set (1-4-5 and 12-13-16, respectively), each has only one gap (of 3) with which it can be paired, meaning only two of its eight pairs (25%) are not UPMs. Since 6 of the 8 seeds in the 1-seed octet have only 50% of their pairs qualifying as UPMs, the share of UPMs for the E8 falls much closer to 50% than to 75%: 28 of the 64 pairs (6x4 + 2x2) are not UPMs, leaving 36 of 64 (56.25%) per region.
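To see the gap-filling effect seed by seed, the sketch below counts, for each member of the 1-seed octet, how many of its eight possible E8 opponents fall short of the threshold. The variable names are mine, for illustration only.

```python
from itertools import product

REGIONS = 4
MIN_GAP = 4

ONE_SEED_OCTET = (1, 4, 5, 8, 9, 12, 13, 16)
TWO_SEED_OCTET = (2, 3, 6, 7, 10, 11, 14, 15)

# Count the non-UPM opponents for each seed in the 1-seed octet.
non_upm_by_seed = {seed: 0 for seed in ONE_SEED_OCTET}
for a, b in product(ONE_SEED_OCTET, TWO_SEED_OCTET):
    if abs(a - b) < MIN_GAP:
        non_upm_by_seed[a] += 1

combos_per_region = len(ONE_SEED_OCTET) * len(TWO_SEED_OCTET)
upm_per_region = combos_per_region - sum(non_upm_by_seed.values())

print(non_upm_by_seed)
# {1: 2, 4: 4, 5: 4, 8: 4, 9: 4, 12: 4, 13: 4, 16: 2}
print(upm_per_region, combos_per_region)  # 36 64
print(upm_per_region * REGIONS)           # 144
print(100 * upm_per_region / combos_per_region)  # 56.25
```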

F4 & NC Math
The F4 and the NC rounds are calculated the same way, with the only difference being there are two F4 games and one NC game, so the F4 round totals are double those of the NC round. In these two rounds, any seed can be paired with any other seed (including the same seed from another region), with an occasional duplicate showing up. Let's start with the 1-seed pairings. It can be paired with any seed, so this results in 16 combinations. The 2-seed can be paired with any seed, but we do not count the 1-seed since it is already being counted in the 1-seed pairings (1v2), and this results in 15 combinations. Each new seed-pairing will result in one less combination, so 14 combinations for the 3-seed, 13 combinations for the 4-seed, and so forth until the 16-seed pairing, which has 1 combination (16v16). The sum of the numbers 1 through 16 equals 136, which is the total for the NC combinations, and if we double this value, we get 272, which is the total for the F4 combinations. The math for calculating the UPMs follows the same approach. For the 16 1-seed pairings, 12 qualify as UPMs: all 1-seed pairings with seeds 5 through 16. For the 15 2-seed pairings, 11 qualify as UPMs: all 2-seed pairings with seeds 6 through 16. It ends at the 12-seed pairings, where only 1 of the 5 qualifies as a UPM, which is the 12v16 pairing. None of the 13- through 16-seed pairings qualify as UPMs since all of their seed differentials are 3 or less. The total number of UPMs is simply the sum of the numbers 1 through 12 (78), which adds up the individual totals for each seed pairing (1-seed = 12, 2-seed = 11, ......, 12-seed = 1). Since there is only one NC game, 78 is the total UPMs for the NC round, and since there are two F4 games, 156 (78x2) is the total UPMs for the F4 round.
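The counting scheme above (each pairing counted once, same-seed pairings allowed) matches what Python's itertools.combinations_with_replacement produces, so the totals can be checked with this short sketch. The variable names are my own.

```python
from itertools import combinations_with_replacement

MIN_GAP = 4

# Unordered seed pairs, same-seed pairs included: (1,1), (1,2), ..., (16,16).
pairs = list(combinations_with_replacement(range(1, 17), 2))
upm_pairs = [(a, b) for a, b in pairs if b - a >= MIN_GAP]

# UPM count for each lower-numbered seed: 12 for the 1-seed down to 1 for the 12-seed.
upms_by_seed = {seed: sum(1 for a, _ in upm_pairs if a == seed) for seed in range(1, 17)}

nc_combos, nc_upms = len(pairs), len(upm_pairs)
f4_combos, f4_upms = 2 * nc_combos, 2 * nc_upms

print(nc_combos, nc_upms)  # 136 78
print(f4_combos, f4_upms)  # 272 156
print(round(100 * nc_upms / nc_combos, 2))  # 57.35
```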

Now that we understand the probabilities and combinations of bracket seed-pairings, this will conclude the theoretical model for upsets. I do appreciate you reading my work, and the second part of this three-part series on upsets will be published on Dec 18. I have taken the liberty to do one final task in this article (in the "Reference" section below). I thought it would be useful for this article to have a reference section for the seed differentials for UPMs for each round. Over the summer, I was developing an upset-picking tool that would use a seed-differential correction factor, so having all of the round-by-round seed-differentials in one place was convenient for that project. If an idea like that piques your interest, the reference section below is a very good starting point. Again, thanks for reading and I'll see you in two weeks.

REFERENCE

R64 Seed Differentials
  • 15 (16-1)
  • 13 (15-2)
  • 11 (14-3)
  • 9 (13-4)
  • 7 (12-5)
  • 5 (11-6)
R32 Seed Differentials
  • 8 (9-1, 16-8, 15-7, 10-2, 11-3, 14-6, 13-5, 12-4)
  • 7 (8-1, 16-9)
  • 5 (15-10, 7-2)
S16 Seed Differentials
  • 12 (13-1, 16-4, 14-2, 15-3)
  • 11 (12-1, 16-5)
  • 9 (11-2, 15-6)
  • 7 (10-3, 14-7)
  • 5 (9-4, 13-8)
  • 4 (5-1, 8-4, 9-5, 12-8, 13-9, 16-12, 6-2, 7-3, 10-6, 11-7, 14-10, 15-11)
E8 Seed Differentials
  • 14 (15-1, 16-2)
  • 13 (14-1, 16-3)
  • 11 (13-2, 15-4)
  • 10 (11-1, 12-2, 13-3, 14-4, 15-5, 16-6)
  • 9 (10-1, 12-3, 14-5, 16-7)
  • 7 (9-2, 11-4, 13-6, 15-8)
  • 6 (7-1, 8-2, 9-3, 10-4, 11-5, 12-6, 13-7, 14-8, 15-9, 16-10)
  • 5 (6-1, 8-3, 10-5, 12-7, 14-9, 16-11)
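
If you would rather generate these lists than type them, here is one possible sketch. It assumes the standard regional bracket order (the BRACKET list below is my own encoding of it, consistent with the pods and octets used in this article), and the function name differentials is hypothetical. For each regional round it splits the bracket into blocks, pairs the top half of each block against the bottom half, and groups the qualifying pairs by seed differential.

```python
from collections import defaultdict
from itertools import product

MIN_GAP = 4

# Seeds of one region in bracket order (assumed layout).
BRACKET = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]


def differentials(round_index):
    """Group one region's UPMs by seed differential.

    round_index: 1 = R64, 2 = R32, 3 = S16, 4 = E8.
    Returns {differential: [(higher_number, lower_number), ...]}.
    """
    size = 2 ** round_index
    table = defaultdict(list)
    for start in range(0, len(BRACKET), size):
        block = BRACKET[start:start + size]
        top, bottom = block[:size // 2], block[size // 2:]
        for a, b in product(top, bottom):
            if abs(a - b) >= MIN_GAP:
                table[abs(a - b)].append((max(a, b), min(a, b)))
    return dict(sorted(table.items(), reverse=True))


counts = {}
for name, idx in [("R64", 1), ("R32", 2), ("S16", 3), ("E8", 4)]:
    counts[name] = {gap: len(pairs) for gap, pairs in differentials(idx).items()}
    print(name, counts[name])
```

The printed counts per differential should line up with the number of pairs listed for each bullet above, and from there a per-differential weight (like the correction factor I mentioned) could be attached to each key.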
