Jan 21, 2019

Seed-Group Loss Table, Part 1

If the title of this article seems familiar, then you remember details of my blog way better than I do. I first introduced the idea of a Seed-Group Loss Table (SGLT) in the article on Unorthodox Bracket-Picking Methods. (NOTE: Re-reading that article is not required to understand concepts in this article. However, the instructions on how to construct an SGLT can be found in Step 1 of the Loss-Mapping Technique, and examples of what an SGLT looks like can be seen in the bracket images.) In the Loss-Mapping Technique (LMT), the SGLT did not serve any critical function other than providing a way to double-check my work (making sure the mapping adds up to the correct totals). Also in the LMT, the SGLT was presented in a tournament-dependent construct in which all of the information was relevant to the specific tournament year. In this article, I want to rearrange the SGLT into a seed-dependent construct in which all of the information is relevant to the seed. The goal of this rearrangement is to find seed-specific patterns from losses (quantity of losses, quality of losses, etc.). Thus, this article's sole purpose is to explore the SGLT as a predictive tool.


Introduction

If you took a quick peek at the Unorthodox Bracket-Picking Methods article (more specifically, the images of brackets showing the tournament-dependent SGLTs), you will have seen the SGLT for 2008-2012, and these were mostly filled with information for the 1- through 4-seeds. I've carried over the same information, combined it with all brackets from 2002 to 2018, organized it by seed, and added a few extra details. Again, the goal is to find seed-specific patterns, so this article is going to be exploratory in nature. Therefore, please do not assume that everything presented in this article is collectively exhaustive. I am certain there are patterns I have simply overlooked. Instead, treat it as a brainstorming endeavor with more avenues waiting to be explored.

SGLT: 1-seeds

I will begin with the SGLT for 1-seeds, and I'll explain what each column means. L is the total losses collectively for all 1-seeds. N is the total losses to non-tournament teams. T is the total losses to teams in the tournament for the given year. L% is the total losses for 1-seeds collectively as a percentage of total losses by all 1-12 seeds collectively (Total Losses for all seeds 1-12 will be presented in Part 2). N/L% is the total losses to non-tournament teams as a percentage of total losses by 1-seeds for the given year. R32#, S16#, E8# and F4# are the number of 1-seeds remaining in that particular round for the given year. WTot is the total number of wins for 1-seeds up to the F4 (e.g., if all four 1-seeds advance to the F4, the win total will be 16 wins: four 1-seeded teams each winning four games, 4x4=16).
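For readers who like to see the arithmetic, here is a minimal Python sketch of how the derived columns could be computed from the raw counts. The function names are hypothetical, and the sketch assumes that L = N + T (every loss is either to a tournament team or a non-tournament team). The 33 losses and the 367 total are the 2002 4-seed figures quoted later in this article; the N value of 11 is made up purely for illustration.

```python
def derived_columns(L, N, total_losses_1_to_12):
    """Return T, L% and N/L% for one seed-group in one year."""
    T = L - N  # assumption: losses to tournament teams are the remainder
    l_pct = 100.0 * L / total_losses_1_to_12   # share of all 1-12 seed losses
    nl_pct = 100.0 * N / L if L else 0.0       # share of losses to non-tournament teams
    return {"T": T, "L%": round(l_pct, 3), "N/L%": round(nl_pct, 3)}


def max_wtot(num_teams=4, wins_to_f4=4):
    """Highest possible WTot: every team in the seed-group reaches the F4."""
    return num_teams * wins_to_f4


# L and the total come from the article; N=11 is a hypothetical value.
row = derived_columns(L=33, N=11, total_losses_1_to_12=367)
print(row)
print(max_wtot())
```

The same function works for any seed-group, since every SGLT in this article uses the same column definitions.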


Identifiable Patterns from the 1-seed SGLT:
  • Single-digit losses in the L-column produce at least three 1-seeds in the F4.
  • One loss in the N-column produces zero 1-seeds in the F4 while two losses in the N-column produce one 1-seed in the F4 (maybe indicative of an untested team or soft strength of schedule).
  • 4-7 losses in the N-column produces two 1-seeds in the F4.
  • Nine losses in the N-column produces one 1-seed in the F4 (maybe indicative of weak one-seeds).
  • An N/L% below 20% produces either one or zero 1-seeds in the F4 (likely the same indication as the rule for one or two losses in the N-column).
  • More than 5% L% produces one 1-seed in the F4 (when all four 1-seeds combine for more than 5% of losses among 1-12 seeds, it seems indicative of weak 1-seeds or stronger lower seeds).
  • Here's the crazy one, so I'm going to break it into two parts. If the N/L% is more than 20% and less than 30% and
    • The losses in the T-column are high (13 or more, which indicates a battle-tested 1-seed), then two 1-seeds make the F4.
    • The losses in the T-column are low (9 or fewer, which indicates an untested 1-seed), then only one 1-seed makes the F4.
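
The 1-seed patterns above can be written down as a simple rule checker. This is only a sketch under my own assumptions: the function name and labels are hypothetical, the thresholds are copied from the bullets, and because several patterns can fire at once (or none at all), it returns every match instead of a single prediction. The input values in the example are invented, not taken from any tournament year.

```python
def one_seed_rule_hits(L, N, T, l_pct, nl_pct):
    """Return (pattern, expected number of 1-seeds in the F4) for each pattern that fires."""
    hits = []
    if L < 10:
        hits.append(("single-digit L", "3 or more"))
    if N == 1:
        hits.append(("one loss in N", "0"))
    if N == 2:
        hits.append(("two losses in N", "1"))
    if 4 <= N <= 7:
        hits.append(("4-7 losses in N", "2"))
    if N == 9:
        hits.append(("nine losses in N", "1"))
    if nl_pct < 20:
        hits.append(("N/L% below 20", "0 or 1"))
    if l_pct > 5:
        hits.append(("L% above 5", "1"))
    if 20 < nl_pct < 30:
        if T >= 13:
            hits.append(("N/L% 20-30, battle-tested T", "2"))
        elif T <= 9:
            hits.append(("N/L% 20-30, untested T", "1"))
    return hits


# Hypothetical year: 18 losses, 4 of them to non-tournament teams.
hits = one_seed_rule_hits(L=18, N=4, T=14, l_pct=4.5, nl_pct=22.2)
for pattern, expected in hits:
    print(pattern, "->", expected)
```

When the matching patterns disagree with each other, that is a sign the year does not fit the historical mold, and the output should be read as a list of hints, not a forecast.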

SGLT: 2-seeds

Below is the SGLT for 2-seeds, and the same columns have the same meanings for 2-seeds that they had for 1-seeds.


Identifiable Patterns from the 2-seed SGLT:
  • An L% above 6% produces either one or zero 2-seeds in the F4.
  • More than six losses in the N-column produces either one or zero 2-seeds in the F4.
  • Fewer than three losses in the N-column produces two 2-seeds in the F4.
  • Here's the crazy one, so follow me for a second. If the N/L% is less than 19% and the L% is greater than 5.9%, then exactly two 2-seeds make the S16. This is probably indicative of worn-down 2-seeds. Another way to look at this concept is losses in the T-column. Of the six years where T-losses were twenty or more, five of the six years result in exactly two 2-seeds making the S16 (All four years fitting the <19% N/L% and >5.9% L% have twenty or more T-losses).
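
The "crazy" 2-seed pattern has two ways of looking at it, so a small sketch may help keep them apart. The function name is hypothetical and the example inputs are invented. Keep in mind that the T-column version held in only five of the six qualifying years, so it is the looser of the two signals.

```python
def two_seed_s16_signals(l_pct, nl_pct, T):
    """Return (primary, alternate) flags for 'exactly two 2-seeds make the S16'."""
    primary = nl_pct < 19 and l_pct > 5.9   # the N/L% and L% version of the pattern
    alternate = T >= 20                     # the T-column version (five of six years)
    return primary, alternate


# Hypothetical inputs, not real tournament data.
print(two_seed_s16_signals(l_pct=6.2, nl_pct=15.0, T=22))
print(two_seed_s16_signals(l_pct=5.0, nl_pct=25.0, T=12))
```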

SGLT: 3-seeds

Below is the SGLT for 3-seeds, and the columns still have the same meaning.


Identifiable Patterns from the 3-seed SGLT:
  • An L% less than 6% produces two 3-seeds (indicative of strong 3-seeds).
  • Thirty or more losses in the L-column produces zero 3-seeds in the F4 (parity).
  • Ten or more losses in the N-column produces zero 3-seeds in the F4 (parity).
  • Here's the big one: An L% greater than 7% with an N/L% greater than or equal to 33.33% results in six or fewer tournament wins collectively among 3-seeds (maybe indicative of general seed-group weakness). These are also the same four years with nine or more losses in the N-column.

SGLT: 4-seeds

Below is the SGLT for 4-seeds, and the columns still have the same meaning.

Identifiable Patterns from the 4-seed SGLT:
  • An L% less than 7.6% and an N/L% less than 21% produces two 4-seeds in the F4.
  • An N/L% greater than 40% produces four or fewer tournament wins for 4-seeds collectively.
  • An N/L% no more than twice that of L% produces four 4-seeds in the R32 (all four 4-seeds defeat their 13-seed opponents).
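
The third 4-seed pattern compares two percentages against each other, which is easy to misread, so here is a hedged sketch of all three checks. The function and key names are hypothetical. The L% of 8.992 is the 2002 4-seed value quoted in the Addendum; the N/L% of 45.0 is an invented number for illustration.

```python
def four_seed_signals(l_pct, nl_pct):
    """Evaluate the three 4-seed patterns for one year."""
    return {
        "two_in_F4": l_pct < 7.6 and nl_pct < 21,
        "four_or_fewer_wins": nl_pct > 40,
        "all_four_reach_R32": nl_pct <= 2 * l_pct,  # N/L% no more than twice L%
    }


# L% from the article (2002 4-seeds); N/L% is hypothetical.
signals = four_seed_signals(l_pct=8.992, nl_pct=45.0)
print(signals)
```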

The Addendum

As I stated in the "To My Readers" section of the blog, I originally planned for this series to have three articles. I thought this section would make for an intriguing second article since it looks at the SGLT data from a macro-perspective rather than a seed-based perspective. After a few weeks of thinking and deliberating, I decided to add this section to this article since it is somewhat similar.

The table below is probably best described as the Seed-group Loss Macro-table because it presents all of the L-columns for all groups from 1-seeds to 12-seeds. In all honesty, I wanted to add more colors to it so that I could color-code my comments, but after seeing what it looked like with an array of colors, it was too confusing for me and default colors were quickly restored.


Before I identify patterns in the SGLM like I did in the seed-based SGLTs, I wanted to bring a few details to the forefront:
  • The split-color last column is the total losses for all twelve seed-groups. It is the value used in the calculation of the L%. For example, 2002 4-seeds had 33 total losses, and 33/367 equals 8.992%, which is found in the L% cell for 2002 in the 4-seed SGLT above. 
  • I am a little concerned about the difference between the loss totals for 2002-2006 and those for 2007-2018. 2005 was the year of the NCAA conference realignment (link here). I also think (but am not 100% certain) that during this time the NCAA cleared a rule allowing teams to play in multiple non-conference tournaments/events. As a result, the number of games played (and with it, the number of losses) increased as teams played more games in the non-conference schedule, the conference schedule, and potentially the conference tournament. As you can see, the ranges of the gray years are much different than the ranges in the orange years. 2006 could possibly go in either range since it happened after the 2005 realignment and it has much in common with 2010 in terms of upset counts and M-o-M rating. This is why I think L% is a much more valid and comparable statistic than the raw L stat by itself. Even then, the difference in loss totals is clear and it may have an adverse impact on results and model applicability in the SGLT Part 2 article.
With that being stated, let's move on to the pattern analysis.


Here are the noticeable patterns:
  • In three out of four years that two 2-seeds made the F4, 2-seeds had loss totals within three losses of the 1-seed loss totals of that same year (2009 actually produces a false positive).
  • In the only year (2003) that two 3-seeds made the F4, 3-seeds had a loss total within three losses of the 1-seed loss total for that year.
  • In the only year (2010) that two 5-seeds made the F4, the 5-seed loss total was less than the 2-seed, 3-seed and 4-seed loss totals of that year.
  • As a complement to the above three, the seed-group with the second-lowest loss total among 1- thru 11-seeds in a year typically results in an F4 appearance, and for the false positives:
    • 2008 and 2015 - Strength of the 1-seed loss totals (explained in 1-seed SGLT).
    • 2013, 2014, and 2018 - The flat-ascension rule (explained next).
  • If you can visually plot the loss totals on an xy-graph in your mind and see a flat-ascending curve, then you are probably looking at a weak-quality year in which very few rules/patterns will apply. If I had to put values on it, if loss totals in the 4- and 5-seeds are less than twice the 1-seed loss total OR if loss totals in the 8- and 9-seeds are less than thrice the 1-seed loss total, then that year is probably a weak-quality year.
  • Darwin Rule: Assuming all of the above rules hold, then give attention to a seed-group's competition. At the line-level, 5 and 12, 6 and 11, 7 and 10, and 8 and 9 are competitors for the same spot. Historically strong competition (lower than average or abnormally low loss totals) can threaten a seed-group's chances to advance since they are both competing for the same spot. Other competition, including pod-level competitors (1-8-9, 4-5-12, 3-6-11, and 2-7-10) and cubic-level competitors (1-4-5-8-9-12 and 2-3-6-7-10-11), can influence a seed-group's probability of advancing to the E8 or F4. This is a really complex rule in application, but here is an example of this rule's application on 8-seeds in 2011:
    • Loss totals for 8-seeds in 2011 are relatively low historically.
    • The spread between them and their line-enemy 9-seeds is the 3rd highest historically (2008 and 2015 are ruled out by the steep ascending curve rule).
    • Tripling the 1-seed's loss total is greater than the loss total of the 2011 8-seeds.
    • 4-seeds and 5-seeds are historically high, so again little resistance if an 8-seed squeezes past the 1-seed.
    • The other cubic is likely to produce a 2-seed or 3-seed, both of which are on-average quality historically and equally matched (should wear each other down).
    • Combine all of this, and you will most likely get the highest probability for an 8-seed to reach the F4, which is actually what happened. The "conditions" were right for an 8-seed run in 2011, and 2014's 8-seeds aren't far from this model either.
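
Two of the macro-table ideas above lend themselves to a short sketch: the flat-ascension check and the Darwin Rule's competitor groupings. Everything here is hypothetical naming on my part, and the loss totals in the example are invented. I am also assuming that "the 4- and 5-seeds" (and "the 8- and 9-seeds") means each seed-group individually must fall under the threshold, since the text does not say whether they should be combined.

```python
# Groupings as listed in the Darwin Rule (only seeds 1-12 appear in the table).
LINE_PAIRS = [(5, 12), (6, 11), (7, 10), (8, 9)]
PODS = [(1, 8, 9), (4, 5, 12), (3, 6, 11), (2, 7, 10)]
CUBICS = [(1, 4, 5, 8, 9, 12), (2, 3, 6, 7, 10, 11)]


def is_weak_quality_year(losses):
    """Flat-ascension check; `losses` maps seed number -> seed-group loss total."""
    one = losses[1]
    flat_middle = losses[4] < 2 * one and losses[5] < 2 * one
    flat_back = losses[8] < 3 * one and losses[9] < 3 * one
    return flat_middle or flat_back


def competitors(seed):
    """Return the line-, pod- and cubic-level competitors of a seed-group."""
    def others(groups):
        return [s for group in groups if seed in group for s in group if s != seed]
    return {"line": others(LINE_PAIRS), "pod": others(PODS), "cubic": others(CUBICS)}


# Invented loss totals for two imaginary years.
flat_year = {1: 20, 4: 30, 5: 35, 8: 45, 9: 50}
steep_year = {1: 10, 4: 25, 5: 28, 8: 35, 9: 40}
print(is_weak_quality_year(flat_year), is_weak_quality_year(steep_year))
print(competitors(8))
```

For the 2011 example, `competitors(8)` lists exactly the seed-groups whose loss totals were examined: the 9-seeds at the line level, the 1-seeds in the pod, and the 4-, 5- and 12-seeds in the rest of the cubic.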

Conclusion

From this bombardment of data, I think it's best to conclude with a quick summary of this article before casting away with my usual outro.
  1. The goal of this article is an experimental incursion into a potential bracket-picking tool. Nothing more, nothing less!
  2. The hypothesis behind this tool comes from an assumption of under-appreciation by the Selection Committee for the information provided by losses. The Committee usually invites and ranks teams according to whom they beat without regard to whom they lost to. I think these losses tell more about team quality than is being acknowledged.
  3. Theoretically, I would assume (but it is unclear if these assumptions are true) that 
    1. A low L% means a seed-group is relatively stronger than other seed-groups.
    2. A low N/L% implies good-quality teams within that seed-group.
Most important (and I probably should list this as the fourth bullet point), just know that I'm always trying something new and I'm not shy about sharing it with you. In Part 2 of this series (Link), I discuss applicable methods to apply this voluminous amount of data. As always, thanks for reading, and Feb 4 will be the February Edition of the Quality Curve Analysis.
