Dec 1, 2019

New Metric: Free Throw Advantage

Over the summer, I spent a lot of time reflecting on the Four Factors model of college basketball. If you have ever read the book Basketball on Paper by Dean Oliver, then you know what I'm talking about, and it probably explains why you are reading this blog. One of the big concerns I have always had with the Four Factors model is the Free Throw Rate (FTR). In fact, most advanced-metrics analysts, including Dean Oliver himself, have conceded that FTR is the least significant of the Four Factors. Since I always want my understanding of the game and the numbers to be at the highest level, I sought a different solution to the Free Throw Rate component, which brings us to this article. I'll start with a crash course on advanced metrics, then elaborate on the details of FTR, and finally introduce my new concept of Free Throw Advantage.
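For readers who want the formulas in front of them, here is a minimal sketch of the Four Factors as they are commonly cited from Basketball on Paper. The function and parameter names are my own, and note that FTR itself is defined differently in different sources (FTA/FGA versus FTM/FGA), which is part of the problem this article tackles.

```python
# A minimal sketch of the commonly cited Four Factors formulas.
# Function and parameter names are hypothetical, not from the article.

def four_factors(fgm, fga, fg3m, tov, fta, orb, opp_drb):
    efg = (fgm + 0.5 * fg3m) / fga            # effective field-goal %
    tov_pct = tov / (fga + 0.44 * fta + tov)  # turnover rate
    orb_pct = orb / (orb + opp_drb)           # offensive rebounding %
    ftr = fta / fga                           # free throw rate (one common form)
    return {"eFG%": efg, "TOV%": tov_pct, "ORB%": orb_pct, "FTR": ftr}
```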

Nov 16, 2019

Warming Up the Crystal Ball: 2019-2020 Edition

In the inaugural article, I promised an article that I had already finished because I did a lot of the work for it over the summer. If you know me by now, you know this is not that promised article. I do intend to auto-publish that one at the beginning of December, which actually aligns better with my work schedule since I won't have a lot of time during those two weeks to put anything together. Instead, you get this article, which has become something of a standard for PPB, and it does contain a lot of valuable information for March. Putting my preseason biases into the written record lets you, the reader, see my point of view from the start and watch how the data confirms or rejects it. I would also like to publish my preseason thoughts before any other headline-grabbing upsets happen. With that said, here's my outlook for the 2019-2020 season.

Nov 2, 2019

The 2019-2020 College Basketball Season

Welcome to the 2019-2020 College Basketball Season on Project: Perfect Bracket. There's a lot to discuss going forward, so this introduction will be brief in order to get to the more important stuff. First, I will post a copy of the final 2019 predictions that I wrote in the "To my readers" section (due to a lack of time to write a third full article during Bracket Crunch Week), then I will grade those 2019 predictions, and I will finish with a section on the unpredictable nature of PPB for this season.



2019 Final Predictions - Copy and Paste

Final Predictions:
Summary: Chalky in terms of 2011-2018 Definition of Chalky
Best Tournament Models: 2015 and 2003
2015 AM: 179 - 70 - 21 - 10 - 2 - 1
2003 AM: 174 - 67 - 19 - 9 - 5 - 3
These are actually good targets. Both SCs are similar, although the 2015 QC matches 2019 far more closely than the 2003 QC does. 2019 is stronger in the middle and back than 2015. The 2019 SC is tight with the 2019 QC, which I theorized last year to minimize R64 upsets. I'm going to follow this again.

Here's where $#!T will hit the fan.
2015 UBR Model: 4-3-1-0-0-0
2003 UBR Model: 3-3-0-0-0-0
2019 PPB Projection: 4-3-1-0-0-0
I've tried every way possible to rule out confirmation bias and failed. I have no choice but to like these numbers! I honestly feel like R64 will have fewer than 4 upsets, so don't go too crazy in R64.

Other Predictions:
-- I like SYR to win at least 1 game and TENN to win at least 2 games via R&I Model. I think those are safe picks.
-- I like all 1-seeds and 2-seeds to defeat their R64 counterparts via SC. No surprises here! The predicted four upsets must come from 3- thru 6-seeds (although I wouldn't blame you if you went less than four).
-- No more than two 1- or 2-seeds lose in R32 via SC/QC. To get 3 upsets in R32, one of the 11- thru 14-seeds must pull double duty.
-- Do not move NOVA past S16 via Tourney Profiles. Only one reigning NC has won a game after the S16 round (2007 FLA, enough said).
-- Final Prediction: No Play-in Winner advances to R32 for first time ever.

Grading the Predictor

The obvious place to start is the section above, just so you, the reader, don't have to constantly scroll up and down to match each grade to its prediction.
  • AM Targets: The actual 2019 AM was 192-49-18-11-4-1. The E8, F4, NR and NC rounds were exactly where your bracket needed to be. If you are going to win a bracket contest, those are the more valuable rounds, so the AM predictions would have raised your odds tremendously. However, the R32 and S16 rounds were miles away, which could have resulted in picking an unfavorable upset. As a result, I grade this prediction around A-/B+.
  • UBR Model: The actual 2019 UBR count was 5-0-1-0-0-0 for six total upsets. This matched the final upset total of 2003, which was also six, but 2019 reached that total in a different fashion. To my credit, I mentioned that the 2019 SC matched the 2015 SC but that 2019 was stronger in the middle and back. However, I failed to understand the impact that would have on the UBR Model (and the AM Targets, for that matter). That probably explains why the 2019 UBR Model bent away from 4-3-1-0-0-0 and towards 5-0-1-0-0-0. All in all, I would probably give this a grade of B- or even C+ because it was off. I'm not sure how much a lack of sample size or confirmation bias played a role, but I had a feeling it would go off the rails on the UBR Model when I said "$#!T would hit the fan."
  • Other Predictions: I loved the strength of the 2019 1-seeds and 2-seeds (and I did state this earlier in the year and often throughout). I was stupid for even suggesting that a 7- thru 10-seed could upset any of them in R32 (although a really strong 7-seed WOF came close to upsetting a 2-seeded UK playing with a significant injury). Nonetheless, this prediction was an F, plain and simple. "NOVA not advancing past S16" was a perfect guideline, and considering the only reigning NC to achieve this feat was the 2007 FLA team (returning 97% of its 2006 NC team), it is probably a guideline I will use for a long time to come. I give the guideline an A+. "No Play-in Winner advances to R32 for first time ever." GOLDEN!!! I didn't like their 6-seed match-ups (MARY and BUFF) in R64, I didn't like the two teams (AZST and BELM) that won the play-in games, and I thought both teams won their play-in games too easily against two teams (JOHN and TEM) that I wouldn't have chosen for the play-in game. The gambler in me said fade them both, and that is what happened, but to BELM's credit, they did make it close against MARY.
I skipped over one of the "other predictions" so that I could go into more detail on the model involved -- the Return and Improve Model. The TENN prediction to win at least two games worked perfectly. The SYR pick was a different story. If you read the article (Link) on the 2019 R&I Model, it showed SYR returning the highest percentages of any team eligible for R&I consideration. It also went into far more detail about SYR than my final predictions did. The issue with the SYR prediction was a lack of information. I honestly did not know about the suspension of starting PG Frank Howard (which happened on Wednesday) until after the games had already started on Thursday, and by then, it was way too late to do anything about it. When you take Frank Howard out of SYR's return percentages, they drop from 90.72% to 71.8% for MINS and from 93.71% to 72.1% for PTS. According to the adjusted Howard-less numbers, SYR goes from a 66% chance to return and improve (win at least three games) to slightly less than a 50% chance. I also didn't like SYR's path to achieve R&I, so I called an audible to just one win. Knowledge of Howard's suspension would have changed my math completely. However, I still have to give myself an F on that prediction, with the caveat that it was due to incomplete information.
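For anyone who wants to replicate that adjustment, here is a hedged sketch of the calculation with made-up roster numbers (the real SYR minute totals are not reproduced here). The return percentage is simply the share of last season's total that belongs to players still on the roster:

```python
# Hypothetical sketch of the return-percentage adjustment described above.
# The minute totals below are invented for illustration only.

def return_pct(last_season, returners):
    """Share of last season's production owned by returning players."""
    total = sum(last_season.values())
    kept = sum(v for player, v in last_season.items() if player in returners)
    return 100.0 * kept / total

last_season_mins = {"Howard": 1250, "PlayerB": 1200, "PlayerC": 1150,
                    "PlayerD": 1100, "PlayerE": 900, "Departed": 1000}
returners = {"Howard", "PlayerB", "PlayerC", "PlayerD", "PlayerE"}

with_howard = return_pct(last_season_mins, returners)                   # ~84.8%
without_howard = return_pct(last_season_mins, returners - {"Howard"})   # ~65.9%
```

Removing one heavy-minutes player knocks roughly his share of last year's total off the return percentage, which is exactly the 90.72% to 71.8% move quoted above.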

As for the R&I Model itself, I took a different approach to implementation in 2019 -- an almost Bayesian, probabilistic approach. I actually liked this approach. It identified a lot of freebies (UVA and VT for at least 1 win, UNC and TENN for 2 wins, and FLST, KNST, KU, and NOVA as highly likely fails), it presented a lot of opportunistic probabilities (AUB and PUR), it identified a lot of high-probability traps (HALL, FLA, CIN, OHST, and GONZ), and it only missed outright on the TXTC run to the title game. Disregarding the incomplete information involving SYR, the R&I Model predictions could be given somewhere around a B+/B grade.

As for the Final QC Analysis article, this one is a real head-scratcher. It identified the 6v11 match-ups as pretty safe for the 6-seeds, and three of the four won their match-up (ironically, the hottest 6-seed coming into the tournament was the one that lost to a power-conference 11-seed that hadn't beaten a tournament team since November). It also identified 2-seeds, 5-seeds, and 7-seeds as areas of strength. The 2-seeds combined for eleven wins, one win shy of expectations based on top-seed advancement (four E8 appearances equals twelve total wins) and two wins better than 2018's far weaker crop of 2-seeds. The 5-seeds collected four total wins, which matches expectations based on top-seed advancement (four R32 appearances equals four wins), but all four wins were collected by one under-seeded 5-seed advancing to the F4. The "spike at the end of the SC (the 12-seed group)" should have been an indication of potential 5-seed victims. The 7-seeds collectively won one game, and this prediction probably shouldn't have been made considering that the 10-seed group was in line on the QC-SC overlay. This was a clear oversight by yours truly. All in all, there were some hidden gems in this analysis and one clear oversight. As a result, I give myself a B+ on these predictions.

Finally, there was one prediction that never made the blog that I wanted to discuss. Last year, I introduced a new tool called the Seed-Group Loss Table (Part 1 and Part 2). In Part 2, I created a linear regression model for each of the top four seed-groups to predict their F4 and E8 potential. I posted the formulas in case you wanted to try them for your 2019 bracket, but I did not cover them as official models for the 2019 predictions. I WISH I HAD! Using the linear regression formulas and the L% and N/L% for each seed-group, the SGLT predicted:
  • 1-seeds: 0.227 for the F4 (either zero or one) and 2.433 for the E8 (either two or three).
  • 2-seeds: 0.864 for the F4 (either zero or one) and 1.786 for the E8 (either one or two).
  • 3-seeds: 0.9305 for the F4 (either zero or one) and 1.619 for the E8 (either one or two).
  • 4-seeds: 0.3252 for the F4 (either zero or one) and 0.573 for the E8 (either zero or one).
Each of the estimations was correct. The F4 contained one 1-seed, one 2-seed, one 3-seed, and one 5-seed. The E8 contained three 1-seeds, two 2-seeds, two 3-seeds, and zero 4-seeds. I really wish I had extended the SGLT beyond the top four seed-groups to see how it would have fared in predicting 2019's F4 appearance by a 5-seed. Unfortunately, the SGLT tends to lose accuracy for deeper runs when extended to lower seeds. As the SGLT article itself suggested, it is probably best for predicting how a seed-group performs against its seed-group's expectations (F4 appearances for 1-seeds, E8 appearances for 2-seeds, S16 appearances for 3- and 4-seeds, R32 appearances for 5- thru 8-seeds). It worked well enough that I will probably include it in 2020's predictions.
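As a sketch of how those formulas get applied (the actual coefficients live in the SGLT Part 2 article and are not reproduced here, so the ones below are placeholders), each prediction is just a linear function of the seed-group's L% and N/L%, and the fractional output brackets the integer count:

```python
import math

# Placeholder sketch of applying an SGLT regression formula. The
# coefficients are invented; the real ones are in the Part 2 article.

def predict_appearances(intercept, b_l, b_nl, l_pct, nl_pct):
    """Expected F4 (or E8) appearances for a seed-group."""
    return intercept + b_l * l_pct + b_nl * nl_pct

# A fractional prediction like 0.227 is read as "either zero or one":
pred = 0.227
low, high = math.floor(pred), math.ceil(pred)
print(f"predicted between {low} and {high} F4 appearances")  # between 0 and 1
```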

PPB for the 2019-2020 Season

If I had to make a November prediction about this blog for B.C.W., it looks highly likely that I will produce one total article and hope I can be both precise and concise with all of the details. As for the yearly schedule, it is up in the air. For the last two years, I published articles on Monday morning right at midnight. It looks like Saturday or Sunday may be the best bet now, and it may be monthly articles instead of bi-weekly. In past season-opening articles, I would lay out a schedule with likely targets for publish dates. I don't think I'm going that route this year because it may result in over-promising and under-delivering on my part. The only guidance I can give at the moment is that I will prioritize QC Analysis articles above all else, followed by articles on new bracket-picking models or improvements to existing ones, and lastly any opinion/feedback/criticism and/or history-driven articles. Believe me, I would love to give my two cents' worth on the stupidity of the expansion of power-conference schedules, and if you don't know what I am talking about, just look at how many ACC teams are playing conference games in the first week of November. What a joke!!! To finalize the issue of article scheduling, I will more than likely improvise the schedule this season and have a better understanding of my schedule for the following season (2020-21). I do know the contents of the next article because I wrote it before I wrote this one; I just don't know what date I'm going to set for the auto-publish. Anyways, I'm looking forward to another year of trying to accomplish a lifelong dream, and I hope you will join me for the ride.

Mar 20, 2019

Return and Improve Model - 2019 Edition

One of my pet projects for numerous years is back for another go-around. 2018 wasn't kind to the R&I Model, and if I remember correctly, 2017 fudged a lot of the probabilities (I honestly can't remember, so here are the links if you are interested: 2017 and 2018). If you are unfamiliar with the tenets of the R&I Model, the 2017 link will explain them in great detail. For the sake of time, I will be as brief as possible. The R&I Model looks at the percentage of multiple statistical categories that a team returns from the previous year's team. It then forecasts the probability of the current year's team improving on its tournament performance compared to the previous year's team, based on the return percentage. Since the model seems to have less and less applicability in this current era of one-and-done college basketball, this model and its probabilities have not been "qualified" or "scaled" based on any extenuating factors, such as critical match-ups, seed differentials, or era. Let's see what 2019 holds.
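The article doesn't spell out the forecasting math, but one plausible reading (my assumption, not the blog's published method) is an empirical lookup: bucket past teams by return percentage and use each bucket's observed improve rate as the forecast probability.

```python
from collections import defaultdict

# A sketch, under assumed data shapes, of one way to turn return
# percentages into improve probabilities: empirical rates per bucket.

def improve_probability(history, return_pct, width=10):
    """history: iterable of (return_pct, improved_bool) for past teams."""
    buckets = defaultdict(list)
    for pct, improved in history:
        buckets[int(pct // width)].append(improved)
    bucket = buckets.get(int(return_pct // width), [])
    return sum(bucket) / len(bucket) if bucket else None
```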

Mar 19, 2019

2019 Quality Curve Analysis - Final Edition

Well, the bracket has been released, the match-ups are set, and now it is time to see how the next three weeks will likely play out. If you have read the previous three editions (and if you haven't, here are the links to three very good reads: Jan, Feb and Mar), you will recognize the chart below. It is the Final 2018 Quality Curve.


From the looks of it, there's a lot to talk about, so let's see what we can learn.

Mar 11, 2019

Bracket Profiles

After finishing the two articles on the Seed-Group Loss Table (links to both: Part 1 and Part 2), I wondered if there were other data points that could be used to build a tournament profile. The SGLT presented a loss-share percentage (the L% stat) for seed-groups 1 through 12 and a seed-group loss-percentage against non-tournament teams (the N/L% stat). For this article, I want to present two additional data sets for returning teams. You can think of these two data sets as a return-and-improve model without the focus on the "-and-improve" stipulation. So if you like data-driven articles, you will love this one. If you're only interested in bracket-picking models and pattern analysis, you may want to skip this one. With that warning out of the way, let's dive right into it.



Returners by Seed-Group

As the heading would suggest, this data set looks at the tournament field by seed-group for a given year and tallies the teams in each seed-group that went to the previous year's tournament. For reference, this count only looks at the Field of 64 (teams that fail to advance out of the First Four games are not counted as tournament returners).
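Mechanically, the tally is straightforward; here is a minimal sketch with assumed data shapes (team names and seeds would come from your own bracket data):

```python
from collections import Counter

# Sketch of the returners-by-seed-group tally. Data shapes are assumptions:
# current_field maps team -> seed; last_field is last year's field of 64.

def returners_by_seed(current_field, last_field):
    counts = Counter()
    for team, seed in current_field.items():
        if team in last_field:
            counts[seed] += 1
    return counts

# Example: returners_by_seed({"DUKE": 1, "UVA": 1, "NEWCOMER": 12},
#                            {"DUKE", "UVA"})  ->  Counter({1: 2})
```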


Since I've only been looking at this data for about two weeks, I haven't had the time to foolproof my Excel formulas to see precisely how significant returners are to seed-group performance. Instead, I'll focus only on the areas I think bracket-pickers would want to know: the seed-expectations for 1- and 2-seeds, and the R64 upset-potential of 5- and 6-seeds.
  1. Since the data doesn't count First Four losers (the First Four began with the 2011 tournament), let's start there. I find it very ironic that the addition of four teams to the field hasn't improved the totals of returning teams. Only two years from 2011 to 2018 have produced more than 33 returning teams, while six of the nine years from 2002-2010 did so. It could be a contributing factor to the relative sanity of those nine years when compared to the eight years of the First Four's existence.
  2. The pattern that immediately catches my eye is the 1-seed returners. There are only three years (2014, 2010, and 2006) in which one of the 1-seeds didn't go to the previous year's tournament, and those three years produced some of the wildest tournaments we have seen. When taken in the context of the groupings in Point #1, 2006 and 2010 produced the highest M-o-M ratings in the 2002-2010 era, and 2014 produced the highest M-o-M rating of all tournaments in this chart. 2006 failed to produce a 1-seed in the F4 (the other two years only produced one each). Think logically about this: if a team can fail to qualify for the tournament in one year and then achieve a 1-seed in the following year, that team probably did so against a weak field in the latter year (which is indicative of a crazy tournament). All three of those 1-seeds that missed the previous year's tournament (UVA, UK, and MEM) bowed out before reaching the F4 (the seed expectation for a 1-seed), and two of them had the same head coach.
  3. As for 2-seeds, there seem to be only two outliers: 2002 (one returner) and 2006 (two returners). Ironically, both years produced a 2-seed in the F4, and in both cases it was a returner. Both years also saw a 2-seed fail to reach the S16, and in both cases it was a non-returner. In an even stranger twist, 2002, with its sole returner, produced three 2-seeds in the E8 (which matches seed expectations for 2-seeds). Looking at their first-round counterparts, the only three years (2012, 2013, and 2016) in which a 15-seed knocked out a 2-seed had zero returners in the 15-seed group. This makes 15-seeds 3-for-6 in years when none of them went to the previous dance, with all other years producing at least one 15-seed returner.
  4. Tournaments typically produce anywhere from two to four returners in the 5-seed group. Only three times (2015, 2009, and 2007) has a tournament had fewer than two returners, and contrary to logical assumptions, two of those three years (2007 and 2015) saw all four 5-seeds advance past their historical nemesis (the 12-seeds).
  5. 6-seeds typically have two or three returners in their group. In each of the four years (2015, 2013, 2007, and 2004) when the 6-seeds produced only one returner, they won at least two of their match-ups against their 11-seed counterparts. When the tournament reached historical levels of sanity in 2007, the only two R64 upsets that year happened to 6-seeds.

Returners by Tournament Performance

The data set below presents a tournament's profile by counting the teams from each round of the previous tournament that qualified for the current tournament. For example, the 2018 tournament featured the 2017 National Champion (NC), the 2017 National Runner-up (NR), only two 2017 F4 teams (which happened to be the NC and NR), six 2017 E8 teams (again, the only two missing are the two F4 teams that failed to return), and eleven of the 2017 S16 teams (the five missing being the two non-returning F4 teams plus three teams that made the S16 but lost their next game).
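A quick sketch of that counting method, again with assumed data shapes, looks like this:

```python
# Sketch of the returners-by-round profile. last_rounds maps a round name
# to the set of teams that reached it last year; current_field is this
# year's field of 64. Data shapes are my assumptions.

def returners_by_round(last_rounds, current_field):
    return {rnd: len(teams & current_field)
            for rnd, teams in last_rounds.items()}

# Example: returners_by_round({"F4": {"NOVA", "MICH", "KU", "LOY"}},
#                             {"NOVA", "MICH", "KU"})  ->  {"F4": 3}
```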


While I haven't had time to dig deeper into the patterns of this table, I would definitely claim this data set to be the odder of the two. For example, National Champions have returned to the tournament in the following year every time except on four occasions. In two of those four "non-returning NC" years, at least three 1-seeds made the F4. And the one year in which neither the previous NC nor the previous NR returned was the same year in which all four 1-seeds made the F4. I'm certain it is 100% coincidence. While I doubt this data set will turn into any useful bracket-picking model, I think it may be useful in formulating strategies to win your office pool, as opposed to the "How to get every pick right" goal this blog aspires to.

Post-Selection Show Addendum

Before you look at the chart below, one detail must be noted. Even though AZST and NCCU have play-in games in 2019, neither team won its play-in game in 2018. Since this method only looks at the field of 64 for year-to-year consistency, it does not matter what they do in their 2019 play-in games, as neither was in the field of 64 in 2018.


I will state again for the record that I have no idea what this means or how to use this information, as I have yet to fully back-test this method. For comparison purposes, the 2019 tournament profile looks a lot like 2004's, except with two fewer returning teams and one extra returning E8 and S16 team.

I hope you enjoyed this venture into tournament profiling. Even though we find many similarities between tournaments and the participating teams, these two profile sets seem to confirm that no two tournaments are the same, since no two profiles are the same. As always, thanks for reading my work. Next Sunday starts Bracket Crunch Week, and I just hope I'm ready and prepared for 2019!!!

Mar 4, 2019

2019 Quality Curve Analysis - March Edition

Well, the calendar has rolled over to another month, and at PPB, that means it is time for another installment of the Quality Curve Analysis, so let's see what's changed in the last month.

Feb 18, 2019

Seed-Group Loss Table, Part 2

In Part 1 of this article, I explored the Seed-Group Loss Table as a predictive tool. The point of the exercise was to find patterns between seed-group losses and tournament performance. Since I have added more information to that article, I would highly recommend reading it (or re-reading it if you read it the first time) because I discuss a few more concepts about the data. In this article, we're going to be looking at the same data points, but the focus will be on methods instead of data. Let's explore!

Feb 4, 2019

2019 Quality Curve Analysis - February Edition

Yes, it is the start of a new month, and at PPB, that usually means a new update to the Quality Curve Analysis. If you want to read the January Edition, here is the link. Before I jump into the article, a few quick thoughts are in order. First, the QC made a huge shift, one I didn't see coming and one for which I most likely don't have a full explanation. Second, I said in the previous edition that for a high-magnitude shift to occur, shooting would have to drastically improve. Well, shooting did slightly improve, but not enough to explain the magnitude of the shift, so other unexplained factors exist. Third, the shift did not affect every team across the board, and for those that experienced it, it doesn't appear to have come at the expense of others in the QC. Fourth and finally, I should point out that the advanced-metrics data being used includes all games played on January 31 and before. With that said, let's dive right into the analysis.

Jan 21, 2019

Seed-Group Loss Table, Part 1

If the title of this article seems familiar, then you remember the details of my blog way better than I do. I first introduced the idea of a Seed-Group Loss Table (SGLT) in the article on Unorthodox Bracket-Picking Methods. (NOTE: Re-reading that article is not required to understand the concepts in this article. However, the instructions on how to construct an SGLT can be found in Step 1 of the Loss-Mapping Technique, and examples of what an SGLT looks like can be seen in the bracket images.) In the Loss-Mapping Technique, the SGLT did not serve any critical function other than providing a way to double-check my work (making sure the mapping adds up to the correct totals). Also in the LMT, the SGLT was presented in a tournament-dependent construct in which all of the information was relevant to the specific tournament year. In this article, I want to re-arrange the SGLT into a seed-dependent construct in which all of the information is relevant to the seed. The goal of this rearrangement is to find seed-specific patterns in losses (quantity of losses, quality of losses, etc.). Thus, this article's sole purpose is to explore the SGLT as a predictive tool.
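To make the two constructs concrete, here is a minimal sketch of the rearrangement (the data shapes are my assumptions, not the article's spreadsheets): the tournament-dependent table keys by year, and the seed-dependent version simply pivots it so each seed-group's losses can be scanned across years.

```python
from collections import defaultdict

# Sketch of pivoting a tournament-dependent SGLT into a seed-dependent one.
# tournaments: {year: {seed: loss_count}} -- shapes are assumptions.

def seed_dependent_sglt(tournaments):
    by_seed = defaultdict(dict)
    for year, seeds in tournaments.items():
        for seed, losses in seeds.items():
            by_seed[seed][year] = losses
    return dict(by_seed)

# Example: seed_dependent_sglt({2018: {1: 3, 2: 2}, 2019: {1: 3, 2: 3}})
#          -> {1: {2018: 3, 2019: 3}, 2: {2018: 2, 2019: 3}}
```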

Jan 5, 2019

2019 Quality Curve Analysis - January Edition

First things first, Project Perfect Bracket would like you to join in celebrating its third birthday. Now that our birthday party is over (if only the articles were as long as the parties), we can move on to the real issue: the first look at the 2019 Quality Curve. I warn you before proceeding, it is not pretty!