The Evolution of the 2017 Quality Curve
We have to begin with an analysis of how the QC has progressed over the three time points of analysis: January, February, and now March. I say a picture is worth a thousand words, so let's see what we have.
JAN to FEB: Let's start with the elephant in the room: the progression from January to February. As the two lines for January and February clearly show, every single team in the QC improved from Jan to Feb. In fact, all but five teams in the Top 30 of the QC (remember the QC is the Top 50 ranked teams in the KenPom Ratings) improved their rating by at least 1 point.
FEB to MAR: The move from the February QC to the March QC is where it gets interesting. For starters, only 24 out of 50 teams (fewer than half) improved from February to March. The majority of the declines are concentrated in two ranges of the QC: the 1-6 ranked teams and the 25-45 ranked teams. To put that into bracket language, those are mostly your 1-seeds and your 7- to 11-seeds (if the tournament were seeded according to efficiency ratings, which it is not).
2017 vs 2016: If you remember back to the 2016 March Edition of the QC Analysis, I addressed the same evolution of the 2016 QC (under the section 2016: Then and Now). The 2016 curve went through two shifts:
- From JAN to FEB, the 2016 curve rotated along a pivot as top teams began playing against other quality teams on a night-in, night-out basis.
- From FEB to MAR, the 2016 curve descended among every major seed group except 1s, 2s, and 8s. Since our Final 4 ended up being a 1-seed and two 2-seeds, maybe that was a prophetic movement.
JAN to MAR: If the evolution of the 2017 QC is not a result of the new calculation method (meaning it is due to the difference in team quality between the two years), then I do believe it could be prophetic on how the tournament plays out (like the 2016 evolution predicted the strength in the 1s and 2s).
- Comparing the January QC to the March QC, only five ranks are worse now than they were in January: 3, 4, 30, 31, and 32. Keep in mind, this does not imply anything about the teams currently ranked in those exact spots. It simply means the teams ranked in those spots in January were playing slightly more efficiently than their counterparts in March. Considering that the January QC shows efficiency versus the non-conference schedule while the March QC shows efficiency versus both the non-conference and conference schedules, it signifies relative strength (compared to the 2016 teams) when teams play against better competition and their efficiency ratings fall only slightly (compared to the drastic falls in 2016).
- The other important implication this evolution could be foretelling (assuming the evolution is due to team quality and not calculation method) is the prediction I have been making all year long: there is strength at the top. If you have read the articles January QC Analysis and Warming Up The Crystal Ball, you will already know what I mean by "strength at the top." I do not think that teams at the very top are more dominant than teams with lower efficiency ratings. In other words, I don't see this year resembling a 2008 or a 2015 when at least three 1-seeds made the Final Four. Instead, the teams ranked 10th and 20th in the 2017 KenPom ratings are rated significantly higher than their counterparts in all other years. In fact, the tourney years that come closest to matching these two 2017 ranks are 2013 (which comes closest to matching the 10th ranked team) and 2014 (which comes closest to matching the 20th ranked team). If this is foretelling, those two years had Final Fours of 1,4,4,9 and 1,2,7,8. I am aware that choosing the 10th and 20th ranked teams is arbitrary, but seeing how strong those ranks are this year compared to all other years says to me that we could see 4- and 5-seeds toppling 1-seeds (this claim assumes teams are seeded according to efficiency rankings rather than records, which they never are).
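The rank-by-rank comparison used throughout this section (how many ranks improved or declined between two snapshots of the QC) can be sketched in a few lines of Python. This is only a rough illustration, not the spreadsheet behind the article: the function name compare_snapshots is hypothetical, and the ratings below are made-up placeholders rather than real KenPom numbers.

```python
def compare_snapshots(earlier, later):
    """Compare two Quality Curve snapshots rank by rank.

    Each argument is a list of ratings ordered from rank 1 downward.
    Returns two lists of 1-indexed ranks: those whose rating rose and
    those whose rating fell. It says nothing about which team holds a rank.
    """
    if len(earlier) != len(later):
        raise ValueError("snapshots must cover the same number of ranks")
    improved, declined = [], []
    for rank, (old, new) in enumerate(zip(earlier, later), start=1):
        if new > old:
            improved.append(rank)
        elif new < old:
            declined.append(rank)
    return improved, declined


# Made-up ratings for five ranks (placeholders, NOT real KenPom data).
january = [30.0, 28.0, 27.5, 26.0, 24.0]
march = [31.0, 28.5, 27.0, 25.5, 24.5]

improved, declined = compare_snapshots(january, march)
print("improved ranks:", improved)
print("declined ranks:", declined)
```

Note that the comparison is between ranks, not teams, which is the same caveat made above about the ranks that are worse now than in January.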
2017 QC versus the World
As I was searching for the QCs of previous years to find a parallel to the 2017 QC, I discovered something that I think may be quite relevant for the 2017 tournament. The first table shows the differentials between 2017 and the corresponding year. Differentials in black mean 2017 was better than its counterpart in that particular year and differentials in red mean 2017 was worse than its counterpart in that particular year.
In a number of years, I noticed that 2017 had better ratings at every 10th interval (not just the 10th and 20th intervals) and that the difference between that particular year and 2017 was roughly the same. This table shows this observation. Looking at the years 2003-2006 from the 1st rank to the 50th rank, the differentials follow the same pattern, with variations in the magnitude of those differentials due to team quality in those particular years. In other words, if you took any of those four QCs and shifted them upwards by a few points, you would approximate the 2017 QC. I believe there is a valid explanation for this, and I will get to it following one more observation. If you exclude the 2006 season (and I have a legitimate reason for doing this), there is another noticeable pattern for teams ranked 5-25. Starting from 2002 (and excluding 2006), the differentials tend to shrink each subsequent year up to 2008, then for the next three years (2009-2011) they rise and stay elevated, before they eventually start shrinking again year over year.
The best explanation for these two observations is the three-point shot. Efficiency ratings give a 1.5 modifier to three-point field goal percentage since 3-point shots are worth 1.5 times as much as 2-point shots. If teams are taking and making more three-point shots in 2017 than they were in 2003-2006 (and they are), I would expect the 2017 QC to be higher at every rank than its 2003-2006 counterparts, and the table clearly shows it. Likewise, the 3-point arc was moved back exactly 1 foot for the 2008-2009 season and has remained there since. It most likely took a few years to get used to this extra distance (both for players who were in college at the time and for those arriving from high school). Now that the 20-foot, 9-inch standard is in its ninth year, teams and players know what they need to do to make 3-pointers.
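To make the 1.5 weighting on three-point shots concrete, here is a small sketch using effective field goal percentage, a standard stat that counts each made three as 1.5 made twos. I am assuming this is the kind of modifier meant above; it is not a reproduction of how the efficiency ratings themselves are computed, and the two teams in the example are hypothetical.

```python
def effective_fg_pct(fgm, fg3m, fga):
    """Effective field goal percentage: (FGM + 0.5 * 3PM) / FGA.

    fgm  = all made field goals (twos and threes)
    fg3m = made three-pointers (a subset of fgm)
    fga  = all field goal attempts
    The extra 0.5 per made three is what makes a three count 1.5 times
    as much as a two.
    """
    if fga <= 0:
        raise ValueError("fga must be positive")
    return (fgm + 0.5 * fg3m) / fga


# Two hypothetical teams: same attempts and same makes,
# but a different share of those makes are three-pointers.
few_threes = effective_fg_pct(fgm=27, fg3m=4, fga=60)
many_threes = effective_fg_pct(fgm=27, fg3m=10, fga=60)

print(f"few threes:  {few_threes:.3f}")
print(f"many threes: {many_threes:.3f}")
```

With identical raw shooting, the team that makes more of its shots from behind the arc grades out higher, which is the direction of the shift described above for 2017 versus 2003-2006.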
- The reason I exclude 2006 is the one-and-done rule. It was put in place before the start of the 2005-2006 season to affect players starting college in the 2006-2007 season. Most players in the 2005 graduating class knew the choice was to go professional or go to college, and many chose not to risk college. It explains why 2006 was so upset-heavy (not a very talented year in college hoops) and why 2007 was so stable (the young talent had to play in college for a year). If you look at the differentials for ranks 30-150 in the years 2007-2017, they seem to form a box of red. The majority of squares in that box are red, and you don't see that pattern anywhere else in the table. It is a clear sign of the effect of the one-and-done rule on college basketball.
The first thing that needs to be pointed out is that the 2017 column is the only column with pre-tourney ratings. All other years are the ratings at the end of their respective season (which includes the impact of NCAA tournament performance on the efficiency ratings). Let's look at each factor.
- The CorrelT50 stat is the 1-for-1 correlation coefficient for each and every team in the KenPom Top50 for the respective year compared to its counterpart in 2017. It calculates the joint variability between the 1st ranked team in 2017 and the 1st ranked team in the respective year (and likewise for all 50 teams) and produces a rating on a scale from -1.0 to +1.0.
- For the most part, the correlation coefficient should be higher than .9000, unless you are dealing with two completely opposite-type years (say 2007 versus 2014). If the CorrelT50 value is closer to +1.0, then the QCs of the respective year and 2017 are closely correlated.
- The five years with the highest CorrelT50 values with the 2017 QC, in order highest to lowest, are 2004, 2014, 2009, 2005, and 2003.
- Oddly enough, the two years I claimed did not match 2017 (2008 and 2015) fall into the bottom half of the CorrelT50 values.
- The Correl50 stat is the correlation coefficient computed only on the intervals shown in the table from 1 to 50. It shows similar results to the full Top 50 correlation coefficient, although 2016 makes a surprisingly strong appearance in this stat.
- The Stdev stat takes the standard deviation of the 1-for-1 differentials for all Top-50 ranked KenPom teams for the respective year to 2017. It calculates the difference between the 1st ranked team in 2017 and the 1st ranked team in the respective year (and likewise for all 50 teams), then calculates the standard deviation of all 50 differentials.
- In theory, a smaller standard deviation says the majority of the differentials are close to zero and the two QCs are approximately similar to one another.
- The five years with the lowest Stdev values, in order lowest to highest, are 2004, 2005, 2014, 2009, and 2003.
- Again, 2015 grades out with the largest Stdev value compared to 2017, so it is safe to say that 2015 should not be the model used for the 2017 bracket.
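For anyone who wants to try the CorrelT50 and Stdev stats at home, here is a minimal sketch of how they could be computed. It assumes a standard Pearson correlation and a sample standard deviation (the kind a spreadsheet STDEV function returns); the original spreadsheet may differ on that choice. The names pearson and compare_curves are hypothetical, and the ratings are made-up placeholders, not KenPom data.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def compare_curves(curve_2017, curve_other):
    """Rank-for-rank comparison of two Quality Curves.

    Both lists hold ratings ordered from rank 1 downward. Returns the
    differentials (2017 minus the other year), their sample standard
    deviation, and the correlation between the two curves.
    """
    if len(curve_2017) != len(curve_other):
        raise ValueError("curves must cover the same number of ranks")
    diffs = [a - b for a, b in zip(curve_2017, curve_other)]
    return {
        "diffs": diffs,
        "stdev": stdev(diffs),  # assumption: sample, not population, stdev
        "correl": pearson(curve_2017, curve_other),
    }


# Made-up five-rank curves (placeholders, NOT real KenPom ratings).
curve_2017 = [32.0, 30.0, 28.0, 27.0, 25.0]
shifted_year = [rating - 2.0 for rating in curve_2017]  # same shape, 2 points lower

result = compare_curves(curve_2017, shifted_year)
print("differentials:", result["diffs"])
print("stdev of differentials:", result["stdev"])
print("correlation:", round(result["correl"], 4))
```

The toy example is chosen on purpose: a curve that is simply the 2017 curve shifted down by a constant has a correlation of essentially 1 and a Stdev of 0, even though every differential is 2 points. Both stats measure similarity of shape, so the size of the shift only shows up in the differentials themselves.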
Final Thoughts
What does this mean for the 2017 bracket? I won't claim anything as a certainty until I see the bracket revealed on Selection Sunday and get a good look at the Seed Curve on the following Monday. However, it does look like there will be fewer upsets than normal in the Round of 64 (definitely fewer than the 8 in 2016), but the number of upsets for the whole tournament may approach double digits as 4s, 5s, 6s, and maybe 7s topple 1s, 2s and 3s in R32 and S16. Based on the similarity of 2017 to the five tournaments listed above, it would be very shocking to see more than one 1-seed in the Final 4 (yes, I am aware that 2005 and 2009 featured two 1-seeds each). If you look at the Mad-O-Meter ratings for those 5 years, they range from 9% in 2009 to 21% in 2014, with the other three years checking in at 11%, 11% and 15%. With this sweeping range of possible Mad-O-Meter ratings, it does seem like it all comes down to how the Selection Committee ranks each and every team.
I hope this has been an informative read for you. In my honest opinion, reading the January Edition, followed by the February Edition, followed by this Edition should give anyone -- even someone who hasn't watched a game all year -- a good idea of what to expect for the 2017 tournament. As I said before, I wouldn't take any prediction as conclusive until the Final Edition of the Quality Curve Analysis comes out on the Monday following Selection Sunday. Until then, thank you for reading and I hope to see you again next Wednesday.
neat!!
Phenomenal work - thank you for doing this.
You guys are welcome, and thank you very much for reading.
Great stuff! I've been looking for a Bracket Science alternative since Peter took a break from his work and this is fantastic! I don't know if this is asking too much, but would it be possible to do the same analysis with the pre-tourney ratings each season rather than post-tourney? (I know KenPom offers Excel data sheets for both.) I feel like the way it is right now, using pre-tourney for 2017 but comparing to KP efficiency ratings post-tourney for every other year, skews the data and correlations somewhat.
It would be possible if I had the pre-tourney data. Unfortunately, KenPom changed his methodology this season. I have some of the pre-tourney data with the old methodology (Pythag win%). I do not have any of the pre-tourney data using the new methodology (Adj Eff Margin), or else I would be using it. If I could find a way to make translators, that could be a possibility. If you want to understand what changed, read this link: http://kenpom.com/blog/ratings-methodology-update/ As always, thank you for reading.
Ah okay, that totally makes sense. Didn't realize the methodology had changed this year. Thank you for the link!
Thanks much M L ... I've been dealing with numbers my whole life, and this stuff fascinates me. Keep up the good work!
Thank you very much for reading. I've always believed numbers never lie, which is why I love doing this stuff.
Just for clarification, the data that is on the KenPom site for previous years is actually post-tourney data? Crap, I have been analyzing year-to-year comparisons as if it was pre-tourney data.
You are correct. The yearly rankings on KenPom.com are post-tourney rankings. They include the results of the games played in the tournament. For some teams, the results are remarkably different (2016 NOVA started at 5th, finished at 1st, and their individual ratings for O & D also changed). For other teams, the individual O & D ratings may be the same (or close to the pre-tourney value), but the relative rankings are different depending on how all other teams moved up and down the rankings.
Any time I make year-to-year comparisons, I point out this discrepancy. If you ever see in my writings that I fail to do this, I beg you to scold me. The last thing I would ever do (or even want to do) is mislead anyone.
Alright, so I purchased the premium content from KenPom (pretty good deal, only $19.95 for an annual subscription) and they have the pre- and post-tourney data for every year. It looks like the efficiency margin is not "adjusted" in years past, though, probably due to the reasons you stated about the changes in the efficiency formula.
That is very strange that the efficiency margin isn't updated for previous years. I do not have the subscription, and I did not know that data was included. All of the data I have has been copy-pasted into a spreadsheet, and I am hoping to be able to make a translation from the old data to the new data, but I'm starting to think that is not practically possible.
The efficiency data is updated for years past (pre- and post-tourney), with the difference being that it just says "EM" as opposed to "AdjEM" for years prior to this year. Not sure if there is a difference between the two or what.
Yeah, this is my first year subscribing to the premium content at KenPom. I've barely scratched the surface in terms of all of the provided stats, but it looks like nearly everything you need for the spreadsheet you've prepared in the past can be found on that site. I highly recommend checking it out.