Mar 16, 2017

Return and Improve Model

Too Long/Didn't Read: The introduction is a very long story about how I came up with the idea of Returning Players and Improving Tournament Performance, so it may waste your time! This will also be a rather quick article, because I'm going to do significant back-testing on this model (and many others) over the summer and present the results in the 2017-18 season.



INTRODUCTION TO RETURN AND IMPROVE

This is probably the largest project (in terms of data content and duration) that I have ever taken on when it comes to basketball statistics. It's been about three years in the making: collecting data, verifying data, correctly copying/pasting data, and analyzing (or otherwise trying to make sense of) the data (though this is still an ongoing process). When studying NCAA tournament history, I noticed a pattern between teams that return players from a previous season and their performance in the tournament.

The first glimpse came with the 2006-2009 North Carolina Tar Heels. For those not familiar with UNC at that time, the 2005-06 team featured the winningest class in UNC history with the likes of freshmen Tyler Hansbrough, Danny Green, Bobby Frasor, and Marcus Ginyard. That team won one game in the 2006 tourney, but more importantly, it returned every one of those freshmen while losing only two seniors (David Noel and Byron Sanders). With the addition of another incredible freshman class (Ty Lawson, Wayne Ellington, Brandan Wright, Deon Thompson, and Alex Stepheson) to all of the returning players, the 2006-07 team won three games (Elite 8) in the NCAA tournament, improving its win total by two. (Let's not forget that, at the same time, FLA returned everybody from its 2006 National Championship team and won it all again in 2007.) The 2006-07 team lost only senior Reyshawn Terry and freshman Brandan Wright, returning all other members, and with all of those returning players the 2007-08 team won four games (Final 4) in the NCAA tournament. The only loss from that Final Four team was back-up center Alex Stepheson. All other players returned, along with the addition of talented freshmen Tyler Zeller, Ed Davis, and Larry Drew II, and UNC won six games to become the 2009 National Champion.
With each team having a significant portion of players "Return" from the previous season, it was able to "Improve" upon its NCAA tournament performance from the previous season. Once I saw this pattern over those four years, I started looking for other teams that returned and improved.
  • Kentucky went to the Final 4 in 2011 only to return key players and improve to National Champions in 2012.
  • Louisville went to the Final 4 in 2012 only to return key players and improve to National Champions in 2013.
  • Florida went to the Elite 8 in 2013 only to return key players and improve to a Final 4 in 2014.
This list could go on and on, but it was around this time that I decided to build a database to see if I could determine which stats were important in achieving this Return and Improve phenomenon. If I could predict a Final 4 run based on a team going to the Elite 8 the previous year and returning a bunch of key players from that team, it seemed like a no-brainer to investigate these patterns for an easy four picks (and 15 points) in bracket challenges. If you are still reading this narrative on my ten-year thought process, this is where I begin presenting the data.

METHODOLOGY

As always, I must present a quick notation of my methodology in case you want to replicate (or even correct) my work.
  • RETURNING: A player is considered a returning player if he compiled stats in the first season and then played (or, in the case of reserves, was available) for the first game of the NCAA tournament in the second season.
    • Allonzo Trier from ARI: He qualifies as a returning player from the 2015-16 ARI team because he is playing in the first game of the NCAA tournament.
    • Maurice Watson Jr from CREI: Although he played in 2015-16, he is lost for the remainder of the 2016-17 season, so he is not counted as a returning player.
    • In some cases, a player returns from injury later in the tournament (thanks to a deep run in the current year); this player is not counted because his status was "out" at Bracket Crunch Time. (A player from WVU named Joe Mazzulla fits this scenario and was not counted as a returning player.)
    • In some cases, a key player is injured during the NCAA tournament, which affects his team's chances of "improving"; however, he is still counted as a returning player because injuries in the NCAA tournament cannot be predicted during Bracket Crunch Week. (Example: 2011-12 UNC's Kendall Marshall)
  • IMPROVING: The test used for improving is wins in a given tournament. The only thing that matters is winning at least one more tournament game than you did the previous year. In this model, going from 0 to 1 is the exact same thing as going from 0 to 4, and the exact same thing as going from 3 to 6.
    • -1 = Play-in Game Loss
    • 0 = R64 Loss (also includes play-in game winners that lose their following game)
    • 1 = R64 Win / R32 Appearance
    • 2 = R32 Win / S16 Appearance
    • 3 = S16 Win / E8 Appearance
    • 4 = E8 Win / F4 Appearance
    • 5 = F4 Win / NC Championship Game Loss
    • 6 = National Champion
    • In the case of repeat champions like 2006-07 FLA, you cannot improve on 6 wins, so matching 6 wins in the second year qualifies as improvement.
  • OTHER PARAMETERS: The teams under study include only power conference teams (ACC, B10, B12, P12, SEC, and former members of the now-defunct BEC regardless of their new conference affiliation). Teams like GONZ, WICH, and BUT, though they were perennial participants in the NCAA tournament, are excluded for the time being.
  • Data primarily goes back to the 2001-2002 season, for teams participating in the 2002 Tournament and returning to the 2003 Tournament. A few teams (no more than 5) had data for earlier years, and these may be included. In next year's report, the data will be limited to the 2001-02 season onward for consistency.
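The win scale and improvement rule above can be sketched in code. This is a minimal Python illustration of my own (not the actual database), just to make the rules concrete:

```python
# Win totals follow the scale above: -1 (play-in loss) through 6 (champion).
def improved(prev_wins: int, curr_wins: int) -> bool:
    """A team "improves" by winning at least one more tournament game
    than the previous year. A repeat champion (6 -> 6) also counts,
    since 6 wins cannot be exceeded."""
    if prev_wins == 6 and curr_wins == 6:
        return True
    return curr_wins > prev_wins

# Going from 0 to 1 counts exactly the same as 0 to 4 or 3 to 6:
assert improved(0, 1) and improved(0, 4) and improved(3, 6)
# A play-in loser (-1) improves simply by reaching the round of 64:
assert improved(-1, 0)
```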
FINDINGS

Finally, I get to the meat of the study. The observations described in the introduction are not random events. There is a certain probability of improving your tournament performance from the previous year depending on the percentage of certain stats that return, and it is summarized in the table below.


Let's quickly run through the notations in this table.
  • 80R# = Number of teams that returned more than 80% in the particular stat column. For example, 42 under GS means "42 teams returned more than 80% of their game starts."
  • 80R&I# = Number of teams that returned more than 80% in the particular stat column AND improved their tournament performance the following year. For example, 28 under GS means "28 teams returned more than 80% of their starts and improved their win total from the previous year's tournament."
  • 80R&I% = 80R&I# divided by 80R#. For example, 66.67% of teams that returned more than 80% of their Games Started (GS) saw an improvement in their tournament performance the following year.
  • 70 means greater than 70% but less than or equal to 80%.
  • 60 means greater than 60% but less than or equal to 70%.
  • 50 means greater than 50% but less than or equal to 60%.
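The table's columns can be reproduced mechanically from the raw rows. Here is a sketch in Python with made-up sample data (the real dataset is not reproduced here):

```python
def bucket(pct: float) -> str:
    """Map a return percentage to the table's bucket labels,
    e.g. "70" means greater than 70% but at most 80%."""
    if pct > 0.80:
        return "80"
    if pct > 0.70:
        return "70"
    if pct > 0.60:
        return "60"
    if pct > 0.50:
        return "50"
    return "<=50"

def rni_summary(teams):
    """teams: list of (returned_pct, improved_flag) pairs for one stat
    column. Returns {bucket: (R#, R&I#, R&I%)}."""
    counts = {}
    for pct, did_improve in teams:
        b = bucket(pct)
        r, ri = counts.get(b, (0, 0))
        counts[b] = (r + 1, ri + int(did_improve))
    return {b: (r, ri, round(100 * ri / r, 2)) for b, (r, ri) in counts.items()}
```

For example, `rni_summary([(0.85, True), (0.85, False), (0.72, True)])` yields `{"80": (2, 1, 50.0), "70": (1, 1, 100.0)}`.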

Let's run through the stats.
  • The obvious trend: The less you return of a particular stat, the lower your chances of improving your tournament performance.
    • Logically speaking, we are talking about continuity, chemistry, and experience when measuring the return of these statistics. Not only that, but the vast majority of college players tend to improve their individual stats year-over-year. So a team may be returning 70% of its stats on paper, but in reality it is returning more than that when you factor in another year of growth and development for all those returning players.
    • Intuitively, this chart also explains why basketball 'programs' like PITT under Jamie Dixon returned to the tournament every year but failed to improve on the previous year's performance. When you lose two or three key seniors every year, you are constantly losing 40-60% of the previous team's continuity, chemistry, and experience.
  • The not-so-obvious trend: Some stats matter more than others (and some I have no clue what they mean, but I felt they needed to be included in the data set anyway).
    • To begin with, I have no clue whether returning Fouls or returning TOs even matters. I would assume those percentages go down as players get better the next year. For the most part, I believe we can ignore them.
    • Second, returning rebounds (and returning offensive rebounds) is distorted by a stat known as "Team Rebounds." This stat usually adds around 100 rebounds to a team's total, yet no individual gets credit for them. In effect, it inflates the team total (the denominator in the calculation), making the returning players' percentage lower than it should be. That is why very few teams qualify for returning 80% of rebounds: the number of qualifying teams is in the teens, whereas for all other stats it is in the 30s and 40s.
    • Blocks are a similar case to rebounds because they are usually concentrated in one or two players. If you return your shot-blocking specialist(s), you return 40% (sometimes as high as 70%) of the team's blocks simply by returning that one (or two) player(s).
    • For this article, I will focus on minutes returned and points returned. Simply put, minutes returned suggests experience coming back, and points returned suggests probable wins coming back (you need points to win). How those points are achieved (FTM, 3PM, etc.) seems to be more erratic and less correlated with returning and improving.
  • One final note on the percentages: They do not factor in critical match-ups. A critical match-up is where two teams cannot both improve because of bracket positioning. For example, suppose two tournament teams each won one game in 2016. Then in 2017, they each return 70% of all their stat categories, yet they get paired in the same pod (one as a 4-seed and one as a 5-seed). It is practically impossible for both to improve their tournament performance, as one team is guaranteed to lose the head-to-head match-up. Critical match-ups automatically reduce the measured success percentage, so the percentages above are all technically higher than listed. I would estimate that critical match-ups happen once every ten games, so for every ten teams that qualify for a specific R#, you can add one win to the corresponding R&I#, which would slightly boost the R&I% of that percentage group. When I back-test the data set this summer, I will account for critical match-ups (as well as a few other anomalies).
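That back-of-envelope adjustment (credit one extra improvement per ten qualifying teams) looks like this in code; the one-in-ten rate is my rough estimate, not a measured figure:

```python
def adjusted_rni_pct(r_count: int, ri_count: int,
                     critical_rate: float = 0.10) -> float:
    """Estimate the 'true' R&I% after crediting back improvements
    lost to critical match-ups (roughly one per ten qualifying teams)."""
    credited = ri_count + r_count * critical_rate
    return round(100 * credited / r_count, 2)

# Example: 42 qualifiers with 28 improvers is 66.67% raw;
# crediting ~4.2 critical-match-up losses lifts the estimate to 76.67%.
```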
APPLYING THE RETURN & IMPROVE MODEL TO 2017


Before we jump into the return numbers for 2017 teams, I added (directly above) one more return-percentile group, and when you see this year's crop, you'll understand why. It is the number of teams that returned 90% or more of a particular stat (all the technicalities listed for the 50-80 percentiles still apply). The 80-percentile group in the other table also includes these teams, so if you want the 80-90 percentile group, just subtract the 90-percentile group from the 80-percentile group. Now let's look at the 2017 teams and what they have returned in each stat column.


While you are more than welcome to study this table in the remaining hours of Bracket Crunch time, I thought I'd highlight the gist of this.
  • First, there is a brand new column labelled W, which stands for 2016 Tournament Wins. If these teams are to improve, they have to get at least one more win than what is currently listed in the W column. VAN does not have to do a thing: last year, they lost in the play-in game, which counts as -1 wins, so by simply making the tournament field (without a play-in game), they have already improved on last year's tournament performance. (This may be something I have to account for when I back-test over the summer.)
  • Second, there are a crap-ton (very statistical terminology) of teams in the 50- and 60-percentiles. Unfortunately for these teams (and depending on the stat), only a small percentage of them will improve. By the chart, using either points or minutes, somewhere between 29% and 42% will improve on their tournament performance from the previous year. With 15 teams returning minutes in the 50- or 60-percentiles, roughly 4.35 to 6.3 teams will improve, which is why I said in the 1-seed tournament resume that it will be extremely difficult for NOVA to repeat. We already know one team in the 50-percentile in minutes that will not improve: PROV, who lost in the play-in game to USC (although that was a critical match-up, as described above).
  • There are a few teams in the 70-percentiles in a few stats, some of which are in 3PM and 3PA. According to the table of averages, even returning 70% of these statistics only warrants a 36-40% chance of win improvement.
  • The big one (and the one everyone probably wants all the details on) is WISC. Across the stat board, they are returning percentages in the high-90s (this is probably true for rebounding too, if not for the influence of team rebounds described above). The percentages are definitely higher than the other groups (and don't forget, the 80+ percentile includes the 90-percentiles, so the 80-90 range is most likely lower than what is shown for the 80+ percentile in the chart).
    • I looked back at the four teams that returned 90%+ of their minutes but failed to improve their tournament performance, and they are 2001-02 OKST, 2002-03 PITT, 2004-05 WAKE, and 2014-15 TEX. 
      • 2001-02 OKST lost a "critical game" to Kent St (which would qualify as an exclusion if mid-majors were included).
      • 2002-03 PITT lost to Dwayne Wade's MARQ after two wins when they recorded two wins in the previous year. (No qualifying exclusions found.)
      • 2004-05 WAKE lost to Kevin Pittsnogle's (and John Beilein's) WVU after one win when they recorded two wins in the previous year. (No qualifying exclusions found.)
      • 2014-15 TEX lost to BUT after 0 wins when they recorded one win in the previous year. One qualifying exclusion that I will be investigating over the summer is seed differential: 2014-15 TEX, although returning 90% in most categories, received an 11-seed that year, when they were a 7-seed the previous year. It is hard to return and improve your tournament performance if you don't actually improve as a team the next season (receiving a lower seed than the year before). It's just a theory I want to test, and ironically, it's a condition that affects 2016-17 WISC, who have an 8-seed this year compared to a 7-seed last year.
      • As for the ten teams that succeeded in returning and improving, here are the splits:
        • Five needed to improve from 0 to 1: Three got the one win (07-08 ARK, 07-08 MARQ & 11-12 VAN), one got two wins (07-08 STAN), and one got four wins (04-05 MIST).
        • One needed to improve from 1 to 2: They got six wins (04-05 UNC)
        • Three needed to improve from 2 to 3: Two got four wins (02-03 TEX and 08-09 NOVA) and one got five wins (04-05 ILL). This is the same situation for WISC: they need at least three wins to improve upon last year's two.
        • One needed to improve from 6 to 6: They got six wins (06-07 FLA)
        • Only 11-12 VAN didn't improve its seed from the previous year (5 to 5), but they still managed to improve their wins (detailed above).
    • The four failed teams (detailed above) plus two more (2007-08 TENN and 2009-2010 KU) qualified for returning at least 90% in points but failed to improve their tournament performance.
      • Both received higher seeds than the previous year (TENN from 5 to 2 and KU from 3 to 1), so that exclusion does not apply.
      • Both did not lose in "critical games", so this exclusion does not apply.
        • TENN faced LOU, but LOU had already achieved improvement status with two wins compared to the previous year's one.
        • KU faced UNI, but UNI had already achieved improvement status with one win compared to the previous year's zero.
      • As for the ten teams that succeeded in returning and improving, here are the splits.
        • Six needed to improve from 0 to 1: Three got the one win (07-08 ARK, 07-08 MARQ & 11-12 VAN), two got two wins (04-05 WASH and 07-08 STAN), and one got four wins (04-05 MIST).
        • One needed to improve from 1 to 2: They got six wins (04-05 UNC)
        • Two needed to improve from 2 to 3: One got four wins (02-03 TEX) and one got five wins (04-05 ILL). This is the same situation for WISC: they need three wins to improve upon last year's two.
        • One needed to improve from 6 to 6: They got six wins (06-07 FLA)
        • Only 11-12 VAN didn't improve its seed from the previous year (5 to 5).
    • The final fact that I will leave you with for 2016-17 WISC: To get the three wins, they must play a "critical game" against NOVA, who is trying to improve upon their six wins from 2016 while returning 61.3% of their minutes and 62.8% of their points.
As you can see, this model is going to be a lot of fun once I do some more work over the summer on it. Since the goal of bracket picking is to return and improve each year, it's only fitting that a study like this exists. I hope you enjoyed reading, and I hope everything turns out for your bracket in 2017.
