Feb 15, 2017

The New Standard - Changes to the Selection Process

It's good to be back here on another Wednesday, writing another article and preparing to pick a perfect bracket this year. If you have followed recent announcements in the college basketball landscape, you will know that the NCAA held a meeting on Jan 20 concerning the use of advanced metrics in the Selection Process. Though the NCAA specifically stated that the meeting was exploratory and that no changes would be implemented for the 2017 tournament, the shift is definitely worth examining. For this article, I will record the known details of the meeting, examine the old rating system (the RPI rankings), and investigate the new meta of bracket prediction (when the selection committee knows/values what we have known/valued for the last decade).

"Mr. Secretary, I propose an official reading of the Minutes of the last meeting"

If you've ever been to a business meeting, my advice is not to attend one sleepy. In my opinion, the reading of the minutes of the previous session is the absolute worst part. You are essentially going over the same material you covered in full detail in the last meeting, but this time, the purpose is purely record-keeping. Here are the minutes of this meeting, but I'll try to present them in a way that won't put my readers to sleep.

Who attended the meeting?
  • Dan Gavitt - NCAA Senior Vice President for Basketball: He ran the meeting
  • David Worlock - NCAA Director of Media Coordination and Statistics: He was the statistics expert from the NCAA's side of the table.
  • Jim Schaus - Ohio University Athletic Director: Member of the 2016-17 NCAA Selection Committee - He represented the Selection Committee.
  • Ken Pomeroy - Advanced Metrics Statistician for College Basketball (Link): KenPom Ratings uses a predictive approach.
  • Jeff Sagarin - Advanced Metrics Statistician for College Basketball (Link): Sagarin Ratings uses a predictive approach.
  • Ben Alamar - Advanced Metrics Statistician for College Basketball (Link): BPI Ratings uses a predictive approach.
  • Kevin Pauga - Advanced Metrics Statistician for College Basketball (Link): KPI Ratings uses a results-based approach.
  • Others attended, but I believe these are the most relevant.

Why have the meeting?
  • The primary reason for the meeting was a direct request by the National Association of Basketball Coaches to incorporate more advanced metrics into the selection and seeding process.*
  • There are plenty of secondary reasons (all of which were factors in the primary reason).
    • A replacement/alternative for the RPI, which many in the sport consider outdated.
    • A crutch for the Selection Committee - something to which they can point in order to justify their selection or seeding decisions (something other than the phrase "We looked at the whole body of work and thought they deserved it").
    • Modernizing college basketball - for better or for worse

What was discussed?
  • How to produce a metric/tool that would allow for the comparison of teams when selecting and/or seeding the tournament.
  • How to make sense of the advanced metrics currently in existence.
"Are these hieroglyphics on this stone tablet? No sir, those are called RPI rankings."

Insults and jokes aside, the RPI has been a staple of the selection committee for over three decades. It has been used as the default resume-evaluation tool when selecting and seeding teams for the NCAA Tournament. The only place you will find any reference to the RPI on my blog is the PPB Watchlist, which serves as a guide to readers about which teams are likely to make the NCAA tournament. I guess if the NCAA is pushing the RPI to the side, I will have to do the same. More importantly, you will not find anything related to the RPI on my blog when it comes to bracket prediction (and for good reason). For those who may not know, let's start with how it is calculated.

RPI = (0.25 * Win%) + (0.5 * Opp Win%) + (0.25 * Opp's Opp Win%)

Let me explain the variables. Win% is the percentage of the games you played that you won. Opp Win% is the average of the win percentages of your opponents. Opp's Opp Win% is the average of the win percentages of your opponents' opponents. These same variables are also used to produce the Strength of Schedule (SOS) component of the index.

SOS = (0.667 * Opp Win%) + (0.333 * Opp's Opp Win%)

From its creation in 1981 until the 2004-2005 season, the RPI was this vanilla. That year, the NCAA unanimously approved a change to differentiate road games from home games: road wins now count as 1.4 wins, home wins count as 0.6 wins, road losses count as 0.6 losses, and home losses count as 1.4 losses. Neutral-site wins and losses still count as 1.**
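
To make the arithmetic concrete, here is a minimal Python sketch of the location-adjusted RPI and SOS, assuming a simple list of game results. The team names and results are invented for illustration; the weightings are the ones given above, and the location adjustment applies only to a team's own Win%, with opponents' records left unweighted.

# A minimal sketch of the location-adjusted RPI and SOS described above.
# Team names and results are hypothetical.
GAMES = [
    # (home_team, away_team, home_won)
    ("TeamA", "TeamB", True),
    ("TeamB", "TeamC", False),
    ("TeamC", "TeamA", True),
    ("TeamA", "TeamC", True),
]

def weighted_win_pct(team):
    """Win% with road wins/home losses worth 1.4 and home wins/road losses worth 0.6."""
    wins = losses = 0.0
    for home, away, home_won in GAMES:
        if team == home:
            wins, losses = (wins + 0.6, losses) if home_won else (wins, losses + 1.4)
        elif team == away:
            wins, losses = (wins, losses + 0.6) if home_won else (wins + 1.4, losses)
    return wins / (wins + losses)

def win_pct(team):
    """Plain Win% -- opponents' records are left unweighted."""
    results = [(team == home) == home_won
               for home, away, home_won in GAMES if team in (home, away)]
    return sum(results) / len(results)

def opponents(team):
    return [home if away == team else away
            for home, away, _ in GAMES if team in (home, away)]

def rpi_and_sos(team):
    opps = opponents(team)
    opp_wp = sum(win_pct(o) for o in opps) / len(opps)                  # Opp Win%
    opps_opps = [oo for o in opps for oo in opponents(o)]
    opp_opp_wp = sum(win_pct(oo) for oo in opps_opps) / len(opps_opps)  # Opp's Opp Win%
    rpi = 0.25 * weighted_win_pct(team) + 0.50 * opp_wp + 0.25 * opp_opp_wp
    sos = (2 / 3) * opp_wp + (1 / 3) * opp_opp_wp
    return rpi, sos

print(rpi_and_sos("TeamA"))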

Yet, even after these well-intentioned changes, the RPI still has its flaws.
  • The most glaring weakness is the following built-in assumption: a win is a win. The argument against this line of thought is that some wins should count for more than others.
    • As it is calculated, a road win in a smaller conference is weighted the same as a road win in a power conference. For example, if IND wins a road game at INFW, the win itself counts just the same in the RPI (1.4) as a road win at MIST.
    • Likewise, there is a case to be made about wins in Feb and Mar versus wins in Nov and Dec. Year after year, it seems as if there are two power-conference teams that follow this pattern: one goes 12-0 in the non-conference and then limps to a 9-9 record in conference, while another goes 8-4 in the non-conference and then surprises with a 12-5 record in conference. I personally would give more credit to the team winning later rather than earlier, but the RPI doesn't always work this way since a win is equal to a win. If you want a clear example of how this gets missed by the RPI, see MIST and MINN in this year's RPI (and keep in mind that MIST has the H-A sweep against MINN).
  • Another issue with the RPI is its use of arbitrary weightings. How did the NCAA come up with weightings of 0.25 for Win%, 0.50 for Opp Win%, and 0.25 for Opp's Opp Win% in the RPI formula? For that matter, how did they arrive at 1.4 and 0.6 for the weightings of road and home game outcomes? When I see clean, round numbers as weightings, the data scientist in me says these values were not back-tested. For example, in the year-end Stat Sheet, the value for Possessions is calculated using a weighting of 0.475 (rounded, of course) on FTAs. Ken Pomeroy uses a similar value because he has back-tested how often a free-throw attempt (whether two shots or a one-and-one) ends a possession. If the weighting were 0.5, I'd be very skeptical of the resulting calculation (especially since this would add anywhere from 8-14 extra possessions to the total and distort efficiency numbers in the process). If you are curious how the RPI ratings can be distorted by arbitrary weightings, here is a really good article discussing it. A quick sketch of the possessions formula follows this list.
  • The final issue with the RPI calculation, and there's a reason I saved it for last, is the issue of results-based methodologies versus predictive methodologies. The RPI is a results-based method, meaning it only looks at wins and losses. As the calculation shows, it is rather binary in nature: if you win, you get a 1, and if you lose, you get a 0 (or 1.4/0.6 depending on location). It doesn't take scoring margin into account. Why does this matter? Scoring margin is a really good indicator of team quality (which is what I emphasize a ton on my blog). For example, PITT hosted LOU on Jan 24 and VT on Feb 14. PITT lost both of those games, but it lost to LOU by 55 and to VT by 3. I would be very hard-pressed to make a case that those two outcomes should be equally valued at 1.4 for a road win, yet the RPI does exactly that, essentially valuing LOU equal to VT, whereas predictive ratings do not. Ironically, VT goes to LOU on Feb 18, so if the RPI were used to set a Vegas betting spread, it would probably produce a PICK (a toss-up) or LOU -1 (LOU wins by 1), but a predictive methodology would point to something beyond LOU -10 (LOU wins by 10 or more).
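
To make the contrast explicit, here is a toy Python comparison of how the two approaches would score those two road wins at PITT. The results-based value uses the RPI's location weightings from above; the margin-aware value is just an illustrative capped margin, not the formula of any particular published predictive system.

# Toy illustration: results-based vs. margin-aware scoring of a single game.
# The capped-margin value is illustrative only, not any published rating's formula.
def rpi_game_value(won, location):
    """Results-based: a win is a win, weighted only by location."""
    win_weight = {"home": 0.6, "away": 1.4, "neutral": 1.0}
    loss_weight = {"home": 1.4, "away": 0.6, "neutral": 1.0}
    return win_weight[location] if won else -loss_weight[location]

def margin_game_value(margin, cap=25):
    """Margin-aware: credit scales with scoring margin, capped to limit blowout inflation."""
    return max(-cap, min(cap, margin))

# LOU won at PITT by 55, VT won at PITT by 3 -- identical in the RPI's eyes.
for road_winner, margin in [("LOU", 55), ("VT", 3)]:
    print(road_winner, rpi_game_value(True, "away"), margin_game_value(margin))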
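
Circling back to the possessions weighting from the arbitrary-weightings bullet, here is the common tempo estimate written out as a small function. The post only cites the 0.475 coefficient on FTAs; the other terms (FGA, offensive rebounds, turnovers) are the ones commonly used in tempo-based analysis, and the box-score numbers below are made up.

# A common possessions estimate. The post cites only the 0.475 FTA weighting;
# the other terms are the standard tempo-estimate terms.
def possessions(fga, oreb, tov, fta, fta_weight=0.475):
    """Estimate a team's possessions from box-score totals."""
    return fga - oreb + tov + fta_weight * fta

# Hypothetical box score. The per-game difference between 0.475 and 0.5 is small,
# but it compounds across a season's worth of games and distorts efficiency numbers.
print(possessions(58, 11, 13, 22))                   # with the back-tested 0.475
print(possessions(58, 11, 13, 22, fta_weight=0.5))   # with a naive 0.5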

What does the RPI get right?
  • In statistics, there is a quality of measurement known as statistical reliability. It refers to the consistency of a measure: similar results are produced under similar circumstances. Consistency is a very good thing when it comes to data analysis. If you compare different years of the RPI to one another, the results are very comparable. Teams with more wins against better-RPI teams usually rank toward the top of the list, whereas teams with more losses against worse-RPI teams usually rank toward the bottom. For example, a team that is 2-1 vs the RPI Top 25 and 10-2 vs the RPI Top 50 will usually be ranked toward the top of the rankings, and this pattern is very evident in the RPI rankings across the many years it has been employed.
  • In statistics, there is also a quality of measurement known as statistical validity. It refers to the accuracy of a measure: the measure actually quantifies what it is supposed to quantify. Unfortunately, the RPI does not have statistical validity because it is being used to measure team quality, and from the flaws described above, we can see that it does not "validly" measure team quality. However, it is measuring something, and it is measuring it consistently.
    • If I had to make an educated guess, I would say the RPI is measuring schedule quality rather than team quality. I've always noticed that a team's RPI rank is (more often than not) close to its SOS rank. In the RPI rankings, I've seen teams whose RPI rank equals their SOS rank, teams with the two separated by almost 150 spots, and teams with the relationship inverted, where the SOS rank is much better than the RPI rank. Each of these tells me something, which leads me to believe that the RPI is a good measure of schedule quality (a short sketch of one way to test this follows this list):
      • When the two are similar, it tells me this may be the result of two similar formulas using arbitrary weightings, but it's still close to the correct schedule quality.
      • When the two are far apart, it tells me this team's played a very weak schedule, and the true quality of their schedule may be somewhere in between.
      • When the two are inverted, it tells me this team played a tough schedule, but the team didn't do enough with the schedule to warrant tournament consideration.
    • If I had to make another educated guess, I would say that Kevin Pauga also saw the "reliable but not valid" characteristic of the RPI. Of the four advanced-metrics statisticians invited to the meeting, his ranking system is the only one that is results-based like the RPI. On his site, Kevin states that he has corrected the flaws of existing results-based rankings to make his a better measure of team quality (rather than of schedule quality, which is what the RPI seems to be measuring).
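
As promised above, here is a quick sketch of one way to test the "RPI tracks schedule quality" hunch: compute the rank correlation between teams' RPI ranks and SOS ranks. The ranks below are hypothetical; with real data you would feed in the full field of Division I teams.

# Sketch: how closely does RPI rank track SOS rank? The ranks below are
# hypothetical; with real data you would use every Division I team.
from scipy.stats import spearmanr

rpi_rank = [1, 4, 7, 12, 30, 45, 60, 88, 120, 200]
sos_rank = [3, 2, 10, 15, 41, 38, 75, 80, 140, 190]

rho, p_value = spearmanr(rpi_rank, sos_rank)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
# A correlation near 1 would support the idea that the RPI is largely
# re-ranking schedules rather than measuring team quality.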
"To infinity, and Beyond!!!"***

When considering the impact of the Selection Committee using advanced metrics in the selection and seeding process, the two major questions that must be answered are these:
  1. What do we do when the selection committee knows what we have known for over a decade?
  2. Will it impact the 2017 tournament, and if so, how?
It is fair to say that some bracket prediction involves exploiting the knowledge gap between results-based rankings and predictive rankings. The seed curve is a prime example of this exploitation: all tournament teams' predictive rankings are charted according to their seeds, and when we see upward peaks in the curve at particular seed numbers, we know some really good teams did not receive seeds reflective of their true quality. As a result, we tend to expect those really good teams to pull off a few upsets. Another strategy that would be hindered by the use of predictive rankings is the over-seed/under-seed (OS/US) strategy. In first-round match-ups involving an over-seeded team versus an under-seeded team, the over-seeded team would be predicted to lose and the under-seeded team predicted to advance. Typically, a tournament will have about three OS/US match-ups, so that's three games that would become a little bit tougher to pick. In the simplest terms, our ability to find easy bracket picks by exploiting the knowledge gap would be curbed because the seeds would reflect more knowledge (or at least more accurate knowledge) of a team's true quality than its wins and losses typically describe.
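
For readers who want to build their own, here is a bare-bones sketch of the seed curve idea: average the predictive ratings on each seed line and flag any line whose average beats the line seeded above it. The seeds and ratings below are invented for illustration, and the peak detection is deliberately crude.

# Bare-bones seed curve: average predictive rating per seed line, then flag any
# "upward peak" where a seed line outrates the line seeded above it.
# Seeds and ratings are invented; higher rating = better team.
from collections import defaultdict

field = [(1, 30.1), (1, 28.4), (2, 25.0), (2, 24.1), (3, 22.5), (3, 21.0),
         (4, 19.2), (5, 17.0), (6, 14.5), (7, 20.3), (7, 12.0), (8, 10.8),
         (9, 10.1), (10, 16.9), (10, 8.5), (11, 7.2), (12, 6.0)]

by_seed = defaultdict(list)
for seed, rating in field:
    by_seed[seed].append(rating)

curve = {seed: sum(ratings) / len(ratings) for seed, ratings in sorted(by_seed.items())}

for seed in sorted(curve):
    if seed - 1 in curve and curve[seed] > curve[seed - 1]:
        print(f"Seed {seed} looks stronger than seed {seed - 1}: "
              f"{curve[seed]:.1f} vs {curve[seed - 1]:.1f} -- upset watch")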

The second question is probably even harder to answer because of a different knowledge gap. While we have and use knowledge that more accurately assesses a team's quality, we do not know exactly how the Selection Committee is assessing team quality. Are they still using RPI-laced Nitty Gritty team sheets? Will Jim Schaus (the Selection Committee member who attended the meeting) incorporate advanced metrics into his individual assessment of the teams? The whole focus of the meeting was to better select and seed the tournament, so even if Jim Schaus doesn't factor in the advanced metrics, I am certain he came out of the meeting better equipped to select and seed teams. Here's my final educated guess of the article, and it may be the most consequential: if advanced metrics are going to have an effect on the 2017 NCAA Tournament, you will see it in these three areas:
  1. The selection of the play-in game teams. Since the play-in teams represent the last teams in the tournament with at-large bids, if the differentiating factor between these four teams and the First Four Out teams is better predictive ratings rather than better RPI ratings, then it is a sign that advanced metrics may have influenced 2017.
  2. The 2016 Template. In this article, under the section "The Mind of the Committee matters," I detailed my interpretations of how the 2016 bracket was put together. I pointed out that a factor like conference affiliation may have received undue importance when seeding teams on the 5 through 12 lines and, hence, may have contributed to the 8 first-round upsets we saw in 2016. If we see the lower teams from the ACC and the B12 being drastically over-seeded by conference affiliation in 2017, the way the P12 and the B12 were over-seeded in 2016, then we know the Committee is possibly using the 2016 template to select and seed teams and is less likely to be using advanced metrics.
  3. The OS/US gap. In the past three tournaments (2014-2016), there have been several instances of OS/US match-ups, where one team is over-seeded and the other team is under-seeded. In these situations, the gaps between actual and suggested seeds have been significantly wide. For example, in 2014, #8 UK played #9 KSU. The OS/US model suggested UK was actually closer to a 5-seed (a gap of 3 seeds) and KSU was closer to an 11-seed (a gap of 2 seeds). The OS/US strategy would pick UK in what should have been a toss-up game under normal 8-vs-9 circumstances. In that same year, #6 MASS played #11 TENN (following TENN's win in the play-in game). The OS/US model suggested MASS was closer to a 13-seed (a gap of 7 seeds) and TENN was closer to a 4-seed (a gap of 7 seeds). The OS/US strategy would pick (probably even sound the alarms for) TENN. If 2017 is influenced by the use of advanced metrics in the selection and seeding processes, we will see far fewer OS/US match-ups, and the ones we do see will most likely involve teams within one seed line of their suggested seeds. One word of caution: I listed this one last because I feel it would be the least reliable indicator of the three (in other words, the first two would better confirm the use of advanced metrics than this one). A rough sketch of the gap check follows this list.
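
Here is the rough sketch of the OS/US gap check mentioned in the last item. The "suggested" seeds would come from whatever predictive rating you trust; the numbers below are simply the 2014 examples from above.

# Rough sketch of the OS/US gap check: compare each team's actual seed with the
# seed its predictive rating suggests, then flag match-ups pairing an over-seeded
# team with an under-seeded one. Suggested seeds here are the 2014 examples above.
matchups = [
    # (team, actual_seed, suggested_seed, opponent, opp_actual, opp_suggested)
    ("UK", 8, 5, "KSU", 9, 11),
    ("MASS", 6, 13, "TENN", 11, 4),
]

def seed_gap(actual, suggested):
    """Positive = under-seeded (better than its seed); negative = over-seeded."""
    return actual - suggested

for team, actual, suggested, opp, opp_actual, opp_suggested in matchups:
    gap, opp_gap = seed_gap(actual, suggested), seed_gap(opp_actual, opp_suggested)
    if gap > 0 > opp_gap or opp_gap > 0 > gap:
        pick = team if gap > opp_gap else opp
        print(f"#{actual} {team} (plays like a {suggested}-seed) vs "
              f"#{opp_actual} {opp} (plays like a {opp_suggested}-seed) -> OS/US pick: {pick}")
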
As always, thank you for reading, and I plan on having another article ready for next Wed Feb 22.

Sources:
*http://www.ncaa.com/news/basketball-men/article/2017-01-12/college-basketball-ncaa-tournament-selection-process-involves
**http://www.collegerpi.com/rpifaq.html
***Quote by Buzz Lightyear in the movie Toy Story.
