I received a lot of feedback on the meta-analysis article, and since I had some spare time this week with work being slow, I did some further digging/torturing into the data. The first attempt looked at meta through raw data (or more specifically, raw averages). It showed how the national averages of 2016-2019 are mostly similar to one another but different to other years because rule changes created different metas between all the years. What if there was a way to create cross-meta comparisons so that the data in the years from 2008 to 2015 becomes more relevant to the data of 2016-2019?
Instead of using raw data, I think cross-meta comparisons could be made using relative data. Keep in mind, this is one heroic assumption, and you know what happens when you assume. However, I want you to follow my logic, and then I'll show you the results. According to the concept of meta, a team that shoots 33.3% from three in 2019 is better than a team that shoots 33.3% from three in 2008 because the 3-point arc was further back in 2019 than in 2008 (two different metas). Likewise, a team that shoots 33.3% from three in 2021 is better than a team that shoots 33.3% from three in 2019 because the current 3-point arc is further back than it was in 2019 (again, two different metas). Under this assumption, relative data should allow for cross-meta comparison because if a team is the 50th-best 3P% team in 2019, they should be "close enough to par" with the 50th-best 3P% team in 2021 to make apples-to-apples comparisons. In short, they are both 50th-best in their metas. Let's look at meta through relative terms.
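To make the "relative data" idea concrete, here is a minimal sketch of turning raw values into within-season ranks. The team names and 3P% values are made up for illustration, and the helper name rank_within_season is hypothetical; it is not the spreadsheet process behind the chart.

```python
def rank_within_season(stat_by_team, higher_is_better=True):
    """Return {team: rank}, where rank 1 is the best value in that season."""
    ordered = sorted(stat_by_team, key=stat_by_team.get, reverse=higher_is_better)
    return {team: position for position, team in enumerate(ordered, start=1)}

# Made-up 3P% values (illustration only, not real team data).
season_a = {"Team A": 0.333, "Team B": 0.310, "Team C": 0.300}
season_b = {"Team X": 0.360, "Team Y": 0.333, "Team Z": 0.350}

ranks_a = rank_within_season(season_a)
ranks_b = rank_within_season(season_b)

# The same raw 33.3% lands at a different rank in each season.
print(ranks_a["Team A"], ranks_b["Team Y"])
```

The same raw number can sit at the top of one season and the bottom of another, which is why the comparison here is rank against rank instead of percentage against percentage. For stats where lower is better (2P%D, for example), pass higher_is_better=False.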
Even though I want to make meta-analysis from 2008-2019, I'm sticking with the years from 2016-2019 for consistency across articles. The chart shows how many teams in each percentile are participating in the R64. For example, in the 2019 tourney, there were seventeen teams in the field that had a 2P%D in the Top 20, ten teams that had a 2P%D ranked from 21st to 40th, five teams that had a 2P%D ranked 41st to 60th, and so forth for the other percentiles and the other stat categories. As for the color patterns, the green-filled boxes represent the best year of the four years for the Top 20 percentile and the red-filled boxes represent the worst year of the four years for the Top 20 percentile. The green text represents the lowest year of the four years for the 101+ percentile (fewer teams with 101+ rankings means more teams with 1-100 rankings -- a.k.a. a defining feature of that meta) and the red text represents the highest year of the four years for the 101+ percentile (more teams with 101+ rankings means fewer teams with 1-100 rankings -- a.k.a. a feature that could be an X-factor of that meta).
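The counting behind the chart can be sketched as follows. This assumes groups of 20 up to rank 100 plus a single 101+ group; the 61-80 and 81-100 labels are my reading of "and so forth", and the list of ranks is invented for illustration.

```python
from collections import Counter

# Assumed group labels: five groups of 20, then everything ranked 101 or worse.
BUCKET_LABELS = ["1-20", "21-40", "41-60", "61-80", "81-100", "101+"]

def bucket_for_rank(rank):
    """Map a national rank (1 = best) to its group label."""
    if rank > 100:
        return "101+"
    return BUCKET_LABELS[(rank - 1) // 20]

def count_field_by_bucket(ranks):
    """Count how many tournament teams fall into each group for one stat."""
    counts = Counter(bucket_for_rank(rank) for rank in ranks)
    return {label: counts.get(label, 0) for label in BUCKET_LABELS}

# Invented ranks for a handful of teams in one stat category.
field_ranks = [3, 18, 22, 40, 41, 75, 100, 101, 250]
counts = count_field_by_bucket(field_ranks)
print(counts)
```

Running the same count for each stat category and each year would give one row of the chart per year, which is what the color coding then compares.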
YEAR-BY-YEAR BREAKDOWN
Before I look at each year, I do want to state for the record that the percentiles being used are completely arbitrary (there is no scientific/mathematical reason for choosing groups of 20). It would have made more logical sense to make the percentiles Top 8, Top 16, Top 32, and Top 64. In some of the analysis below, I'll combine percentiles because the cumulative is just as defining of the meta as an individual percentile. Also, for each year's breakdown, I will have a reminder of the meta-analysis from the first article.
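As a small worked example of combining percentiles, the 2019 2P%D counts quoted above (seventeen, ten, and five) can be accumulated into Top 40 and Top 60 totals. The variable names are only for illustration.

```python
from itertools import accumulate

# 2019 2P%D counts for the Top 20, 21-40, and 41-60 groups, as quoted above.
counts_2019_2pd = [17, 10, 5]
cumulative_labels = ["Top 20", "Top 40", "Top 60"]

# Running totals: each entry adds the groups ranked at or above that cutoff.
cumulative = dict(zip(cumulative_labels, accumulate(counts_2019_2pd)))
print(cumulative)
```

The Top 40 total of 27 is the figure used in the 2019 breakdown.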
2019
Raw Data: "Look for teams with high 3P% and low 2P%D"
Relative Data: Twenty-seven teams in the Top 40 for 2P%D is the defining feature of this meta (the other three years have 24, 21, and 17 in the Top 40). Being able to score from other areas on the floor (high 3P%, maybe even elite FTR) is important. With the lowest Top 20 TOR count among the four years and the most Top 20 teams at forcing TOs, elite-level turnover generation looks like it can be an X-factor.
Results: Five teams in the E8 have a Top 20 2P%D, with the eventual NC UVA knocking on the door at the #22 rank. Two F4 teams possess a Top 20 3P% (UVA and MIST) with a third (AUB) knocking on the door at the #27 rank. AUB also possesses the best turnover generation in the field, with another Top 20 team in TXTC also reaching the F4.
2018
Raw Data: "Look for teams with high 2P% and low TOR"
Relative Data: 2018 has no clear meta other than tempo (and it's the most unreliable factor in predictive terms). Where this year stands out is the cumulative Top 60 and Top 100 compared to other years. From the colored text, it has the most Top 100 3P%D teams and the most Top 100 TOR teams. As for the Top 60, it has the most 2P% teams, the most 3P% teams, and the most ORB teams. With no definable meta, look for above-average performance in most of these categories with something elite as an X-Factor.
Results: Seven teams in E8 have Top 40 2P%. One team is the best-ranked ORB (DUKE) and two more teams are elite turnover-generators (TXTC and KNST). The interesting X-Factor on this list is FTRD with four teams (three of them F4 teams) with Top 22 rankings. In a meta where points are easy to come across, you don't want to give your opponents even more chances at points from the FT line.
2017
Raw Data: "Look for teams with low 3P%D or high OR%"
Relative Data: 2017 was the highest quality of the four years, with green-filled boxes and green text almost everywhere. Where 2017 comes up short is elite 2P%D, elite turnover generation, and elite defensive rebounding, so look for teams that can exploit this shortcoming.
Results: If it's easy to score from two-point land, then meta teams should be able to control the perimeter (I personally believe this is more due to luck than skill), especially considering the high quantity of elite 3P% teams. Five teams boast a Top 21 ranking in 3P%D. Turnovers and offensive rebounds should also generate more easy shots from two-point land, and two F4 teams feature these elite rankings for X-Factors (SCAR is 4th in TORD and UNC is 1st in ORB).
2016
Raw Data: "Look for high 2P% and high 3P% or low TOR and high ORB"
Relative Data: With the fewest elite teams at 2P%, 3P% and 3P%D and the fewest Top 100 teams at 3P% and 3P%D, any team that can boast elite-level play in 2P% and 2P%D should be a title contender. Anything that prevents easy shots should also help (elite TOR or elite FTRD). In other words, don't make it easier for your opponents to score when they can't help themselves.
Results: Two teams possess Top 21 ranks in both 2P% and 2P%D (NOVA and UNC) and both played in the title game. KU and OU possessed Top-27 ranks in one of these categories plus Top-3 ranks in another shooting category (3P%). For the preventive X-Factor stats, three teams possessed elite TOR ranks (UVA, UNC, and DAME) and two teams possessed elite FTRD (OU and DAME).
Conclusion
The relative data tends to show the same targets for meta identification as the raw data (which is a good thing for my assumption), but there are still variations between the two. I may try to post the 2008-2015 relative data later in the week, but I don't want to get too caught up in an un-tested model, especially one in its first year of a new meta. I most likely will use any spare time next week to get ready for BCW, but meta-analysis is definitely scratching an itch. As always, thanks for reading my work, and I'm ready for Selection Sunday in eight days and the tournament on the following Friday.
Fascinating work. I must say, it's totally clear to me that you definitely have off-the-charts patience and focus to dig into the data to such an extreme. I can only dream of cultivating a work ethic even close to 50% of yours. Perhaps your youth (at least compared to this senior citizen) is a big asset. Thanks for sharing your knowledge!
I'm honored to have that said about me. I've been compiling data for years though, and I've followed in the footsteps of smarter people. But what I imagine and what I produce are worlds apart from each other, so I always feel like I'm not doing enough. Picking a perfect bracket is my way of being one-of-a-kind and that's what motivates me.