As promised, I want to look further into the January QC and see if it holds any insights into the tournament as well as clues to what the Feb and Mar QCs will do. I want to start with the big picture and work my way inward.
This is the Kenpom Curve, not to be confused with the Quality Curve (which is derived from the Top 50 teams in the Kenpom Ratings). It is a plot of the Kenpom Ratings of all 358 teams in college basketball. For the most part, every Kenpom Curve looks like this, with differences in steepness/flatness due to the parity or lack thereof in team quality. On the chart, I drew a vertical line at the 50th team, which marks the boundary of the Jan QC. In short, our QC is roughly the first 1/7th of the Kenpom Curve (50 of 358 teams). If you are immersed in numbers, statistics, and charts like I am, this graph should be setting off alarm bells in your head right now. Why, you might ask?
When I saw this chart, I immediately recognized the cumulative normal distribution function (except it is rotated 90 degrees counter-clockwise). That would mean the underlying variable being measured (team quality) follows a normal distribution. (Yes, it's quite possible that the Kenpom ratings methodology was designed to produce values that follow this structure, but it also seems inconceivable to me that a predictive system would ever want to produce such a rigid/inflexible outcome.) On closer inspection, though, the fit is not exact. For example, the normal distribution requires that the mean = the median. The Kenpom Curve produces no such result. If the ratings were normal around an average of zero, the median team (Team #179) should produce an Adjusted Efficiency Margin value of zero, but on the chart, you can see by the smaller blue line in the center that Team #179 produces an AEM value below zero. Thus, the Kenpom Curve produces a skewed distribution instead of a normal one: the bulk of the teams sits to the left of zero, with a longer tail of high-rated teams stretching to the right (in textbook terms, a right skew). When I break the curve into ranges, you'll see this result even more clearly.
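For anyone who wants to run the mean-versus-median check on their own copy of the ratings, here is a minimal sketch. The function name `skew_summary` and the ten sample values are made up for illustration and are not real Kenpom data; the skewness is the plain third standardized moment, which is only one of several ways to measure skew.

```python
from statistics import mean, median

def skew_summary(ratings):
    """Return the mean, median, and moment-based skewness of a list of ratings."""
    n = len(ratings)
    mu = mean(ratings)
    med = median(ratings)
    # Population variance and standard deviation.
    var = sum((x - mu) ** 2 for x in ratings) / n
    sd = var ** 0.5
    # Third standardized moment; zero for a perfectly symmetric sample.
    skew = sum((x - mu) ** 3 for x in ratings) / n / sd ** 3 if sd > 0 else 0.0
    return {"mean": mu, "median": med, "skewness": skew}

# Hypothetical ratings (NOT real Kenpom values), chosen to average exactly zero.
sample = [25.0, 12.0, 4.0, -1.0, -2.0, -3.0, -5.0, -6.0, -8.0, -16.0]
summary = skew_summary(sample)
print(summary)
```

In this toy sample the mean is zero while the median is negative, and the skewness comes out positive, meaning the longer tail is on the high-rated side.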
So I've broken the Kenpom Curve into ranges of 3 points, and the number of teams in each range is shown atop its bar. As you can see, the highest bar is the "-0 to -3" range of AEM values. If you add up the quantities of all teams in ranges with "positive AEM values," you get 160 teams. This is still 19 teams short of the median (Team #179), so you have to move 19 teams into the negative to identify the median team. This is one value that I follow very closely. For the most part, a negative AEM value for the median team is better for overall team quality because it implies a steeper curve (more below-average teams weighing down the total average team quality). A word of caution though: What happens at the median is never fully reflective of what is happening at the Top 50 (the QC); it is just another piece of information to validate/invalidate the hypothesis.
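The bucketing and the "19 teams short" arithmetic can be sketched as follows. The helper names `bin_counts` and `teams_short_of_median` are hypothetical, and the placeholder ratings (+1.0 and -1.0) only reproduce the counts quoted above (160 positive teams out of 358); they are not actual AEM values.

```python
import math
from collections import Counter

def bin_counts(ratings, width=3.0):
    """Count ratings per bin; bin k covers the range [k*width, (k+1)*width)."""
    return Counter(math.floor(r / width) for r in ratings)

def teams_short_of_median(ratings):
    """Number of non-positive teams that must be counted to reach the median rank."""
    positives = sum(1 for r in ratings if r > 0)
    median_rank = len(ratings) // 2  # 358 teams -> rank 179, as used in the text
    return max(0, median_rank - positives)

# Placeholder ratings: 160 positive teams and 198 negative teams (358 in total).
ratings = [1.0] * 160 + [-1.0] * 198
counts = bin_counts(ratings)
shortfall = teams_short_of_median(ratings)
print(counts, shortfall)
```

With real ratings, the bin with the largest count is the tallest bar of the histogram, and a positive shortfall means the median team has a negative AEM.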
Inflection Points on the Quality Curve
Another important piece of information about the QC (as well as the Kenpom Curve) is inflection points. In my first year of PPB, I identified an inflection point in the Jan QC Analysis, and as I thought would happen, the Feb QC rotated/pivoted along that inflection point. I've spent the last five years searching for and studying them, mostly to no avail: the 2016 QC is still the only time in QC analysis that I have seen and predicted a phenomenon based on inflection points. In mathematics, inflection points are where curves change their concavity (the median would be an example of this definition). However, I will use the term to describe abrupt changes of quality. In some parts of the QC, it will scroll past ten to fifteen teams and only experience a 1pt decline in Adjusted Efficiency Margin. Then in other parts, it will scroll past one or two teams and experience a 1.0-1.5pt decline in AEM. The latter example portrays inflection points: sudden or abrupt deviations in team quality. The chart below shows the locations on the Kenpom Curve where there are sharp changes in quality. In case it isn't discernible on your device, the locations of the inflection points are 2, 20, 45, 97, and 150 (this should cover all of the potential ranges for tournament teams).
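One simple way to flag these abrupt changes is to look at the drop between adjacent teams and mark every drop above a cutoff. This is only a sketch of that idea, not necessarily how the points on the chart were chosen: the function name `find_breaks`, the 1.0-point threshold, and the sample ratings are all assumptions for illustration.

```python
def find_breaks(ratings, threshold=1.0):
    """Return the 1-based ranks after which the rating drops by at least `threshold`."""
    ordered = sorted(ratings, reverse=True)
    breaks = []
    for i in range(len(ordered) - 1):
        # Gap between the team at rank i+1 and the team at rank i+2.
        if ordered[i] - ordered[i + 1] >= threshold:
            breaks.append(i + 1)
    return breaks

# Hypothetical AEM values (NOT real Kenpom data).
sample = [30.0, 29.8, 27.9, 27.8, 27.7, 27.5, 26.0]
print(find_breaks(sample))
```

On this toy list the breaks fall after ranks 2 and 6. A single-gap rule like this only catches a drop between two neighboring teams; catching a 1.0-1.5pt decline spread across two teams would need a small rolling window instead.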
It is my theory that these inflection points provide insights into upsets and later-round outcomes. In the 2016 prediction, the rotation/pivot occurred along the four-seed range. If not for some glaring mis-seedings by the Committee (MTSU as a 15-seed and STFA as a 14-seed, plus over-seeds of 2-seed XAV and 3-seed UTAH) and a last-second injury bug (4-seed CAL), the 1- through 4-seeds would have been money in the bank for the first two rounds that year. In the 2017 tournament, the front of the curve stayed elevated until the 7-seed range, and then a precipitous drop-off occurred thereafter. Two 7-seeds pulled off upsets of 2-seeds, with one of them advancing to the F4, and another team with 6-seed quality was given an 8-seed in the tournament and pulled off the only 8-over-1 upset. Again, the inflection point identified the separation in quality and the Selection Committee botched the seedings (and the 2017 site locations), both producing upsets and later-round outcomes that really shouldn't have been surprises at all.
As a thought experiment, I've always tried to create a hierarchical ranking structure using inflection points, but it hasn't been as fruitful as I imagined it could be. This theoretical ranking hierarchy would imply that teams in the same group are equivalent, teams in the next group are equivalent to each other but probabilistically inferior to the previous group, and so on for each inflection-point grouping. To visualize this hierarchy, imagine the Kenpom Curve from the previous section, except it is subdivided by these inflection points instead of the (arbitrarily/non-scientifically chosen) 3.0-point equidistant ranges. Last year featured a runaway group at the very top (GONZ, HOU, BAY) that achieved F4 status, followed by a narrow-banded group (MICH, ILL, IOWA, OHST, and ALA) that failed to meet seed-expectations, followed by a large-banded group (#9-#37 teams) whose results followed the patterns of over-seeds and under-seeds. Predicting how each subdivision is going to perform in the tournament is a monumental task in and of itself. However, this ranking hierarchy with its generalized and simplistic structure produced a high-probability model of the 2021 tournament (yes, there were some egregious misses too: the 4-seed group as a whole plus the 7v10 CONN/UMD match-up). To say the least, the 2022 subdivisions are far more spread out than the 2021 subdivisions, so we'll have to wait and see how that impacts the potential upsets and later-round AVs.
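As a rough sketch of how such a hierarchy could be represented, the snippet below splits a ranked list into consecutive groups at a set of break ranks. The function name `tiers_from_breaks` is hypothetical. The breaks 3, 8, and 37 come from the 2021 groupings described above (ranks 1-3, 4-8, and 9-37); the final group, ranks 38-50, is simply whatever is left of the Top 50 and is my assumption.

```python
def tiers_from_breaks(ordered, breaks):
    """Split a ranked list into consecutive groups, each ending at a 1-based break rank."""
    tiers, start = [], 0
    for b in sorted(breaks):
        tiers.append(ordered[start:b])
        start = b
    tiers.append(ordered[start:])  # everything after the last break
    return tiers

# Ranks 1..50 stand in for the Top 50 teams; swap in team names as needed.
top50 = list(range(1, 51))
tiers = tiers_from_breaks(top50, [3, 8, 37])
print([len(t) for t in tiers])
```

Each inner list is one group of "equivalent" teams, and the order of the outer list is the hierarchy itself, so a team's tier is just the index of the group it lands in.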
Adjusted Efficiency Margin Components
Since the QC is derived from the Adjusted Efficiency Margin, let's look into its components: Adjusted Offense and Adjusted Defense. I'll start with a table, and then explain it.
The table shows how many of the top AdjO teams and the top AdjD teams are in the QC (Top 50 Kenpom teams). The totals in light blue are for that respective QC. For the current Jan QC, 36 of the Top 50 AdjO are in the Jan QC and 38 of the Top 50 AdjD are in the Jan QC.
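Each cell in a table like this is an overlap count between two rankings, which can be sketched as below. The function name `top_n_overlap` and the five-team example are made up for illustration; with real data you would pass team-to-rank mappings for the overall Kenpom rank and for the AdjO (or AdjD) rank and leave `n` at 50.

```python
def top_n_overlap(overall_rank, component_rank, n=50):
    """Count teams ranked in the top n of both rankings (dicts of team -> rank)."""
    top_overall = {team for team, rank in overall_rank.items() if rank <= n}
    top_component = {team for team, rank in component_rank.items() if rank <= n}
    return len(top_overall & top_component)

# Hypothetical five-team example, using a top-3 cutoff instead of top-50.
overall = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
adj_offense = {"A": 2, "B": 5, "C": 1, "D": 3, "E": 4}
overlap = top_n_overlap(overall, adj_offense, n=3)
print(overlap)
```

Here teams A and C are in the top 3 of both lists, so the overlap is 2.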
The really interesting tell of this table (and probably the only tell) is the progression from Jan QC to Final QC. In the insane years (2018 and 2021), the quantity of Top 50 AdjO and AdjD teams either fell or stayed the same. In the saner years (2017 and 2019), the quantity of Top 50 AdjO and AdjD teams increased. As for the 2022 tournament, starting the Jan QC at 36 Top-50 AdjO and 38 Top-50 AdjD teams in the QC means there is a good chance that these quantities increase instead of decreasing or staying the same. The only year to produce a lower starting point was 2019 with 35 and 37, respectively. That portends the potential for tournament sanity.
I hope this deeper dive into the Jan QC provided a more expansive and comprehensive insight. As I've said from the start of the season, I think we will have a saner tournament than 2021 (mainly due to mean reversion), but it helps to have some of the data start to back this up. If my prediction is going to come to fruition, the QC has to greatly improve first. Even though the Jan QC is showing signs of steepness, it is also shifted downwards from the saner curves of 2017 and 2019, which leads me to believe something like 2006 could happen (three 1-seeds in the E8 and all get taken out without reaching the F4). Right now, this is my best estimate, but we still have two months and one week until Selection Sunday. Until then, thanks for reading my work, and I guarantee at least one article (with hopes of two) between now and the Feb QC Analysis.