Mar 14, 2017

2017 Quality Curve Analysis - Final Edition

The 2017 NCAA Tournament bracket has been revealed and we know the 68 teams who will be playing. The best part of all: the narrowing of the basketball landscape to these 68 teams means we can finally take a look at the elusive Seed Curve (SC). This article is going to be pretty straightforward, so here is the run-down if you are interested:
  1. Catching up to speed from the March Edition to the Final Edition
  2. Comparison of the 2017 QC to the 2017 SC
  3. Breakdown of the 2017 SC and comparisons to previous years

CHANGES SINCE THE MARCH EDITION

Since a picture is worth a thousand words, let's start with one of those.

For a brief summary, we see:
  • Slight improvement in the top three teams
  • Moderate degradation from the fourth to ninth teams
  • Moderate improvement from 12-15
  • Incremental degradation from 16-19
  • Significant strengthening from 20-27
  • Incremental changes from 28-50.
Given what happened over the last 12 days of college basketball, I would say the chart accurately reflects it. A lot of games featured two teams in the Top 50 of the KenPom rankings squaring off against each other, and someone has to lose those games. In the six power conference tournaments, only two top seeds won their tournament (NOVA & UK) and only one more reached the championship game (ORE). One was eliminated in its conference tournament semi-finals (keep in mind, every eventual NCAA champion since 2001 has at least reached its conference tournament semi-finals) and two were eliminated in their very first game. The regular-season places of the power conference tournament winners were 1, 1, 2, 4, 5, and 8.

The really strange feature of the Final QC is how much it looks like a roller coaster rather than a downhill. It goes vertical from 1-3, begins leveling out from 3-5, shows minimal decline from 5-24, goes vertical again from 24-28, then runs flat from 28-50. I've said from the beginning of the season that I think the quality of the teams is tiered: There is strength at the top and parity among that strength. However, I was only considering 14-20 teams at most to be in the top-tier group. From the QC, it looks like there could be as many as 24-26 teams in the top-tier group (depending on where you want to draw your cut-off line on the chart where it begins going vertical, but 28-50 is definitely an inferior tier to the top-tier group). The QC does NOT have the look of 2008 or 2015 where there was dominant strength at the very top, so I wouldn't rush to put three or four 1-seeds into the Final Four. Even though the top three teams in the KenPom rankings received 1-seeds, putting two or fewer of these teams in the Final Four is very safe this year, especially considering exactly one 1-seed has made it to the Final Four in 5 of the last 7 years.

2017 QC versus 2017 SC

The next stop on our journey is seeing the overlay of the Top 50 efficient teams compared to the Selection Committee's appraisal of these teams' quality. One quick message on methodology: every four ranks in the KenPom rankings is a seed group (1-4 is the 1-seed group, 5-8 is the 2-seed group, etc.) until you get to 41-46, where I grouped six teams as 11-seeds because the tournament bracket has six 11-seeds. Then, 47-50 are the 12-seed group. With that said, let's take a look.
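
For anyone who wants to reproduce the grouping, here is a minimal Python sketch of the rank-to-seed-group rule just described. The function name `efficiency_seed` is my own label for illustration, and the sketch only covers ranks 1-50 because that is all the overlay uses.

```python
def efficiency_seed(kenpom_rank):
    """Map a Top 50 KenPom rank to the seed group used in the QC/SC overlay."""
    if not 1 <= kenpom_rank <= 50:
        raise ValueError("the overlay only covers ranks 1-50")
    if kenpom_rank <= 40:
        # Ranks 1-4 -> 1-seed, 5-8 -> 2-seed, ..., 37-40 -> 10-seed.
        return (kenpom_rank - 1) // 4 + 1
    if kenpom_rank <= 46:
        # Six teams on the 11-line because the bracket has six 11-seeds.
        return 11
    return 12  # ranks 47-50


for rank in (4, 13, 19, 45, 48):
    print(f"rank {rank} -> {efficiency_seed(rank)}-seed")
```

The parenthetical seeds quoted throughout this section, such as "45th (11-seed)", follow this same rule.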

For starters, there are some noticeable gaps between the two lines, and this usually tells us where to look for possible upset games and potential sleeper teams. The major spots of inferiority are located at the 3-seeds and the 6-seeds, both of which sit 2 adjusted efficiency margin (AEM) points below their quality curve counterparts.
  • For the 3-seeds, they are ranked 13th, 16th, 18th, and 19th in the KenPom ratings, which suggests they are 4-5 seeds. As a group, they may be over-seeded and prone to upsets.
  • For the 6-seeds, they are ranked 11th (3-seed), 22nd (6-seed), 27th (7-seed), and 45th (11-seed), meaning the group is possibly being dragged down by an outlier.
The next two spots of weakness are located in the 2-seeds and the 12-seeds, which are 1.195 and 1.333 AEM points below the quality curve.
  • For the 2-seeds, they are ranked 4th (1-seed), 6th (2-seed), 12th (3-seed), and 20th (5-seed), which looks like another group being dragged down by an outlier.
  • For the 12-seeds, they are ranked 48th, 55th, 59th, and 60th, which suggests seeds in the range of 12-14. Though they are slightly over-seeded, this is most likely on par with typical years. These teams feature the Conference USA and Mountain West regular season and tournament winners (given the status of those conferences, they should always be on the highest line of the auto-bid seeds 12-16) and the Ivy League regular season and tournament winner (a league which has had a considerable amount of success in the NCAA tournament, just see Yale, Harvard and Cornell in the last 7 NCAA tournaments).
The final two spots of weakness are located in the 5-seeds and 9-seeds, which are 1.02 and 1.06 AEM points below the quality curve.
  • For the 5-seeds, they are ranked 7th (2-seed), 17th (5-seed), 25th (7-seed), and 33rd (9-seed). With one under-seed, one accurate seed, and two over-seeds, the bracket picks made on 5-seeds this year could make the difference in winning your bracket pool by one or two points. Even though the highest efficiency-ranked 5-seed UVA is playing against the worst efficiency-ranked 12-seed UNCW, picking the right 5-12 winners, and whether any of them advance to the next round, could change your bracket points total by 3-9 points (assuming 1 point for R64 and 2 points for R32).
  • For the 9-seeds, they are ranked 34th (9-seed), 43rd (11-seed), 44th (11-seed), and 53rd (13-seed) in the efficiency rankings. As a whole, this group looks over-seeded, and when a group is over-seeded, their counterparts (assuming they are accurately or even under-seeded) usually win the majority of those match-ups.
Now let's take a look at the pockets of strength in the bracket. We are looking for anywhere on the QC vs SC chart where the SC stands head and shoulders above the QC at a particular seed group. Do you see any that fit this description? I'm not so sure that I see anything remotely resembling that description. JUST KIDDING. Holy shit, look at the 10-seeds. Let's start there.
  • Just for fun, if you add the gaps between the QC and the SC at the 2-, 12-, 5- and 9-seeds (listed above, where they are rounded), you get a difference of -4.615 AEM points, meaning together those seeds are 4.615 points below the QC. The 10-seeds, on the other hand, are +4.585 AEM points above their QC counterparts. In one group, the absolute value of the difference is approximately the same as four of the weaker groups combined. I wonder if some of the teams in the 10-seed group should have been in the 2-, 5- or 9-seed groups. Let's see.
  • For the 10-seeds, they are ranked 8th (2-seed), 24th (6-seed), 28th (7-seed), and 52nd (13-seed). As a whole, they are drastically under-seeded, and the gap would have been even greater if the group were not dragged down by a 13-seed disguised as a 10-seed. Imagine being a 7-seed and having to play against a 2-seed, 6-seed or 7-seed in your first round even though that opponent has a 10-seed beside its name. I guess the only thing that could make me feel better about being a 7-seed paired against a better team disguised as a 10-seed is if the game was closer to home.
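
As a quick check on that "just for fun" arithmetic, here is a small Python sketch that adds up the four gaps as quoted in the text. The variable names are mine, and because the quoted gaps are rounded, the sum lands at -4.608 rather than the -4.615 that comes from the unrounded values.

```python
# SC minus QC gaps in AEM points, as quoted (rounded) in the text above.
weak_gaps = {"2-seeds": -1.195, "12-seeds": -1.333, "5-seeds": -1.02, "9-seeds": -1.06}
ten_seed_gap = 4.585

combined_weak = sum(weak_gaps.values())
net = combined_weak + ten_seed_gap

print(f"four weaker groups combined: {combined_weak:+.3f}")
print(f"10-seeds alone:              {ten_seed_gap:+.3f}")
print(f"net of the two:              {net:+.3f}")
```

Either way you round it, the single 10-seed group offsets the four weaker groups to within a few hundredths of an AEM point.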
Let's look at some other teams that are more efficient than their seed would suggest, namely 4-seeds, 7-seeds, and 8-seeds.
  • Since I just did their counterparts the 10-seeds, let's start with the 7-seeds, who were 0.200 AEM points above the QC. The 7-seeds have efficiency rankings of 14th (4-seed), 21st (6-seed), 31st (8-seed) and 36th (9-seed). It looks as if we have a spread-out mixture. We have a drastic under-seed, a slight under-seed, a slight over-seed, and a moderate over-seed. In other words, the SC may not be reliable when telling us the true quality of the 7-seeds. Looking at the individual 7-10 match-ups according to over-seeding and under-seeding, they go as follows (rank among 7s vs rank among 10s): 1v4, 2v2, 3v3, 4v1. If you wanted to look at this a different way (efficiency-seed of 7s vs efficiency-seed of 10s): 4vs13 (14th vs 52nd), 6vs6 (21st vs 24th), 8vs7 (31st vs 28th), and 9vs2 (36th vs 8th). By these comparisons, two games look like coin flips and two games look like chalk picks.
  • Moving on to the 4-seeds (who were 0.155 AEM points above their QC), these four teams are ranked 5th (2-seed), 9th (3-seed), 15th (4-seed), and 26th (7-seed). Without the one outlier dragging down the averages, this would be a much stronger group. It is also a possible explanation as to why the 2- and 3-seed SC groups were below their QC groups: a team that belongs in each of those groups by efficiency was seeded down to the 4-line. Considering the strength of this group, I would not be surprised if a 1-seed went down in the S16 considering how much parity is in the top-tier group of teams (and if none go down this round, maybe the match-up will take a toll on the 1-seed in the E8 game).
  • Finally, let's investigate the 8-seeds, who happen to be 0.385 AEM points above their QC. This seed group consists of the 23rd (6-seed), 32nd (8-seed), 38th (10-seed), and 39th (10-seed) efficiency-ranked teams. This is kind of surprising. The 8-seeds might actually be an over-seeded group being pulled up by one outlier (the opposite of some of our seed groups who were dragged down by an outlier). In our QC analysis, we discovered that the top-tier group might extend to the 24-26th ranked teams, and the 23rd ranked team in the 8-seed group would fall into this tier. After the drop-off around ranks 24-28, the QC flat-lines all the way to 50. It very well could be the case that the 8-seed group is being pulled up by one strong AEM value and the rest are average to below-average.
  • Since I am a nice guy, I will look at the 11-seeds as an added bonus. This is rather difficult because this seed group has 6 teams instead of 4, and this does mess with the averages. The 11-seeds have the following efficiency rankings: 29th (8-seed), 30th (8-seed), 37th (10-seed), 40th (10-seed), 56th (14-seed), and 61st (15-seed). For starters, we know where all of our actual 10-seeds are located: two are posing as 8-seeds and two are posing as 11-seeds. Second of all, the 29th and 30th teams play against each other in one play-in game and the 56th and 61st play each other in the other play-in game. Even if I were to look at the SC Thursday morning with two of those removed, I doubt it would affect the seed group's SC/QC differential too much. Third of all, since we know the 6-seeds are a decent group being dragged down by one outlier, I wouldn't want to be the outlier playing against any of the under-seeded 11-seeds.
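
For the 7-10 pairings discussed in the list above, here is a rough Python sketch of how the "rank among 7s vs rank among 10s" labels fall out of the KenPom ranks. The pairings are the ones given in the text, while the helper name `group_rank` and the rank-gap column are my own additions for illustration.

```python
# (KenPom rank of the 7-seed, KenPom rank of the 10-seed it faces), as paired in the text.
matchups = [(14, 52), (21, 24), (31, 28), (36, 8)]

sevens = sorted(seven for seven, _ in matchups)
tens = sorted(ten for _, ten in matchups)


def group_rank(rank, group):
    """1-based position of a team within its own seed group (1 = most efficient)."""
    return group.index(rank) + 1


labels = [f"{group_rank(s, sevens)}v{group_rank(t, tens)}" for s, t in matchups]

for (seven, ten), label in zip(matchups, labels):
    # A positive gap means the 7-seed is the more efficient team.
    print(f"{label}: KenPom {seven} vs KenPom {ten}, rank gap {ten - seven:+d}")
```

The two games with single-digit rank gaps are the coin flips, and the two games with large gaps point in opposite directions.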

2017 SC Analysis and Comparisons

The chart above shows the 2017 SC with a linear-regression line of best fit superimposed. The steepness of the regression line is important because it tells the overall quality (or lack thereof) in the tournament. It appears to be calling for a pretty average year, so what is an average year? This depends on how you want to frame it. I look at four eras of college basketball because these are the four eras that have either impacted the game itself or impacted the perception of the game. These four eras are described below with a table of their results (the average number of upsets and the standard deviation from the average during those time frames).
  • The KenPom era (light green): When Ken Pomeroy began analysis of the tournament using possession-based efficiency ratings (2002-present).
  • The OAD era (light purple): 2007-present when the one-and-done (OAD) rule applied.
  • The 20'9 era (sand): The era (2009-present) when the 3-point arc was 20'9" instead of 19'9".
  • The FOM era (gray or cancer-colored, your choice): The era (2014-present) in which the NCAA thought it would be cool to use the rulebook to artificially inflate scoring in the game.
First of all, do not forget that a game is only counted as an upset if the lower seed wins and the seed differential between the teams is 4 or more. So, if a 5 beats a 1, it is an upset, but a 4 beating a 1 is not. Secondly, the overlaps can tell you what each era brought to the quality of the game. I have documented a lot of these changes in previous articles, but you can probably see the effect with your own eyes and mathematical ability. Returning to our question, average means about 9-11 upsets, with 5-6 coming in the R64, 2-3 in the R32, and one more coming in either the S16 or E8. Remember, average is being based on the regression line, not the SC itself. Before we move on to year-to-year comparisons, I want to scale the previous chart (from 16 seeds to 12 seeds like the QC/SC analysis had) to give a better look at why I am thinking average year.
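
The upset rule stated above is easy to pin down in a few lines of Python. This is only a sketch of the definition as I have worded it here; the function name `is_upset` and the `min_gap` parameter are labels of my own choosing.

```python
def is_upset(winner_seed, loser_seed, min_gap=4):
    """True when the winner's seed number is at least min_gap higher than the loser's."""
    return winner_seed - loser_seed >= min_gap


print(is_upset(5, 1))   # a 5 beating a 1: differential of 4, counts
print(is_upset(4, 1))   # a 4 beating a 1: differential of 3, does not count
print(is_upset(1, 5))   # the favorite winning is never an upset
```

Counting upsets for a round is then just a matter of summing `is_upset` over that round's results.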

So the big question: what years look like 2017? I've already ruled out 2008 and 2015, when strength at the very top was present and unquestionable. This year's very top is not that strong. If I had to pick three and make a composite, here are the three that I would choose: 2005, 2006, and 2014 (not exactly the three you would expect to be grouped together). First, let's see what they look like.



The reason I picked all three of these years is the bowing action at the front of the curve. All three have what looks to be two upward bows at the front half of the curves:
  • 2005 bows from 1-5 then 5-10.
  • 2006 bows from 1-4 then 4-8.
  • 2014 bows from 1-4 then 4-7.
  • 2017 bows from 1-4 then 4-7.
In terms of parallels, 2017 follows 2014's bowing action. The bowing action should give you a good idea of what to expect in tournament performance from the seeds in these ranges. However, the major difference between 2014 and 2017 is the action of their curves after the 7-range. In 2014, the curve tends to linger in the higher levels from 7-11, whereas 2017 drops steadily throughout except for a giant spike at the 10-line. If you visualize the 2017 SC without the spike at the 10-seed, it's nothing but a steadily falling line. They do have one thing in common: Both years featured a drastically under-seeded team. In 2014, UK was an 8-seed even though efficiency rankings suggested they should be a 5-seed. After the tourney was over, UK had improved to the level of a 4-seed efficiency ranking. 2017 has 10-seed WICH, which has an efficiency ranking suggesting it should be a 2-seed.

The second closest parallel may be 2006, which has a slightly wider bow out to the 4- and the 8-seed, respectively. For the most part, 2006 did not come unglued until the E8, when all three of the remaining 1-seeds lost the game that would have sent them to the F4. In the four E8 match-ups of 1v2, 1v3, 1v11, and 2v4, the lower seed won every game. The issue I have with 2006 is the same issue I have with 2014: the right half of the curve lingers in the higher levels whereas 2017 does not have this feature (again, except for the spike at the 10-seed).

2005 has the widest of bowing actions, extending out to the 5- and the 10-seed for each bow, respectively. It also has a spike following the 2nd bow at the 12-seed, though not as pronounced as the 10-seed spike in 2017. It is worth noting that 2005 did see a 12-seed in the Sweet 16, so maybe the same will hold for 2017's 10-seed.

Let's look at the AMs (R32 - S16 - E8 - F4 - R2 - NC) of each of these years:
  • 2005: 182 - 72 - 29 - 11 - 2 - 1
  • 2006: 195 - 71 - 25 - 20 - 5 - 3
  • 2014: 190 - 79 - 36 - 18 - 15 - 7
  • AVG: 189 - 74 - 30 - 16.3 - 7.3 - 3.7
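
The AVG line is just the column-wise mean of the three years listed above. Here is a minimal Python sketch of that calculation, using the AM values exactly as listed; the variable names are mine.

```python
rounds = ["R32", "S16", "E8", "F4", "R2", "NC"]

# AM values by year, copied from the list above.
ams = {
    2005: [182, 72, 29, 11, 2, 1],
    2006: [195, 71, 25, 20, 5, 3],
    2014: [190, 79, 36, 18, 15, 7],
}

# zip(*...) regroups the yearly rows into one column per round.
averages = [sum(column) / len(column) for column in zip(*ams.values())]

for name, average in zip(rounds, averages):
    print(f"{name}: {average:.1f}")
```

Values are shown rounded to one decimal place.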

MY THOUGHTS

While I wouldn't build my bracket on the AVG AM for the F4, R2, and NC rounds, I do love the AVG AM for the R32, S16, and E8. 189 - 74 - 30 look like healthy targets for those three rounds. Another thing all three of these years have in common: All three dropped a 1-seed and two 2-seeds before the S16 (Editor's Note: This statement is the result of a typo. It should be "before the E8" not "before the S16"). As for my upset totals, I think 4-3-2-0-0-0 looks pretty reasonable. If you are ballsy, add either another upset in the R32 or one in the E8, but I probably would not do the latter unless a 100% certain opportunity presented itself. This concludes the Final Edition of the 2017 Quality Curve analysis. I hope you enjoyed reading it, and I hope my predictions are spot on (because if they aren't, I'll probably turn off the comments, lol/jk).

9 comments:

  1. Looks great!

    I'm currently in the process of creating a spreadsheet, similar to the past. My end goal is not only to make it compatible for this year's tournament, but also all future tournaments (assuming data from external Excel sources do not change) as well.

    Any idea of which site may contain either a bracket in table form or just a table of seed, team, and region? I essentially only need that info and then figure out the VBA coding for retrieving the coaching data and I'm fairly confident that in the future all I will have to do is open the Excel file, click go, and all of the data will appear in a matter of seconds.

    1. I've always used the team's official athletic site for the data and manually keyed it in. I haven't found a site that has a team's & opponent's data all in one location, much less the coaching data as well.

      I linked a Stat Sheet template in the Resources tab above. I don't know if that's what you are looking for in terms of seed, team and region layout. I also created a bracket layout in a spreadsheet (used it for the images in the Bracket Modeling Tab). I could possibly template that (after Thurs though) if it's what you are looking for.

    2. Yeah, I'm trying to get beyond entering any data in at all manually and I feel like I'm pretty close.

      As far as the statistics, College Basketball Reference has almost every stat I need in table form. When starting my macro in Excel, my spreadsheet goes to College Basketball Reference and pulls all of the actual statistics (wins, losses, pts for, pts against, etc.) and then brings them back to my spreadsheet in a table. I then use a VLookup function to move all of the data I'm interested in into a tab (similar to the view Pete had in his).

      As long as College Basketball Reference does not change the layout of their site, that portion of my spreadsheet will be able to update in future years automatically.

      If I can just find a site that will have a consistent layout of all of the teams in the bracket, their region, and their seed, I'm confident I can manipulate it, so that the spreadsheet updates 100% automatically in future years after starting the macro.

    3. The other thing you will have to look out for is things like name changes and new Div I programs. Since CBR typically lists their data in alphabetical order, calling row 141 in 2017 will get you Kansas, but in 2018, if they change the name of (or sorting of) Kansas St, you could retrieve their data thinking it was Kansas. Same with a new Div I program with a name A-J, which would bump Kansas down a row. Stuff like that is a pain.

    4. Wouldn't the VLookup resolve the issue of the team data being in a different row than my original file?

      So how I currently have it, I name the entire sheet of data that I pull from CBB reference as "Team_Basic". By naming the entire sheet as a set of data, that will still capture any data that is added to the next row below my last row this year.

      My Vlookup function is set as =Vlookup([Cell],Team_Basic,[column of data I'm looking at], False).

      The order of the data pulled from CBB reference should not matter since I'm using the VLookup to find the team name anywhere in the first column of data that I pull from CBB reference.
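
To make the row-order point concrete, here is a rough Python analogue of an exact-match VLookup compared with a fixed-row lookup. It is only a sketch: the `vlookup` helper is a stand-in I wrote for illustration, not Excel's implementation, and the table contents (including the numbers and the new program's name) are made-up placeholders rather than real CBB Reference data.

```python
def vlookup(key, table, col_index):
    """Exact-match search of the first column, like VLOOKUP(key, table, col_index, FALSE)."""
    for row in table:
        if row[0] == key:
            return row[col_index - 1]  # VLOOKUP column numbers are 1-based
    return None  # Excel would show #N/A here


# Placeholder rows standing in for the range named "Team_Basic".
team_basic = [
    ["Kansas", 10],
    ["Kansas St", 20],
]

by_key_before = vlookup("Kansas", team_basic, 2)
by_row_before = team_basic[0][1]  # "whatever is in the first row"

# A new program sorts in ahead of Kansas and shifts every row down by one.
team_basic.insert(0, ["Hypothetical New Program", 30])

by_key_after = vlookup("Kansas", team_basic, 2)
by_row_after = team_basic[0][1]

print("keyed lookup:", by_key_before, "->", by_key_after)
print("fixed row:   ", by_row_before, "->", by_row_after)
```

The keyed lookup survives the shifted rows, but it would still miss if the site changed the spelling of the team name itself.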

    5. Sorry, I misread your previous comment. Where you said VLookup, in my mind, I read VBasic (Visual Basic) because you had mentioned it in a previous post. As for VLookup, I have no clue. I have never used that function. I've never used macros or anything too technical/programmatic. The only programming experience I have is Java, and I'm not too proud of that, lol.

    6. No worries, haha. I've been picking up VB/Macro stuff here and there over the past few months ever since I transitioned into a data analyst position at my work.

      Vlookup might be the greatest function in Excel. I use it every day now for my job and it has helped me tremendously with fantasy sports stuff as well.
