Over the summer, I spent a lot of time reflecting on the Four Factors model of college basketball. If you have ever read the book Basketball on Paper by Dean Oliver, then you know what I'm talking about, and it is probably a good explanation as to why you are reading this blog. One of the big concerns I have always had with the Four Factors model is the Free Throw Rate (FTR). In fact, most advanced metrics analysts, including Dean Oliver, have conceded that FTR is the least significant of the Four Factors. Since I always want my understanding of the game and the numbers to be at the highest level, I sought a different solution to the Free Throw Rate component, which brings us to this article. I'll start with a crash course on advanced metrics, then elaborate on the details of the FTR, then introduce my new concept of Free Throw Advantage.
A blog dedicated to predicting a perfect NCAA Bracket using systems of analysis.
Dec 1, 2019
Nov 16, 2019
Warming Up the Crystal Ball: 2019-2020 Edition
In the inaugural article, I promised an article that I had finished before it because I did a lot of the work for it over the summer. If you know me by now, you know that this article is not the promised article. I do intend to auto-publish that one at the beginning of December, which actually aligns better with my work schedule since I won't have a lot of time during those two weeks to put anything together. Instead, you get this article, which has become sort of a standard for PPB, and it does have a lot of valuable information in it for March. Putting my preseason biases into the written record lets you, the reader, see my point of view from the start and see how the data is being used to confirm or reject that point of view. Likewise, I would like to publish my preseason thoughts before any other headline-grabbing upsets happen. Nonetheless, here's my outlook for the 2019-2020 season.
Nov 2, 2019
The 2019-2020 College Basketball Season
Final Predictions:
Summary: Chalky in terms of 2011-2018 Definition of Chalky
Best Tournament Models: 2015 and 2003
2015 AM: 179 - 70 - 21 - 10 - 2 - 1
2003 AM: 174 - 67 - 19 - 9 - 5 - 3
These are actually good targets. Both SCs are similar, although the 2015 QC matches 2019 far closer than 2003 QC. 2019 is stronger in the middle and back than 2015. The 2019 SC is tight with the 2019 QC, which I theorized last year to minimize R64 upsets. I'm going to follow this again.
Here's where $#!T will hit the fan.
2015 UBR Model: 4-3-1-0-0-0
2003 UBR Model: 3-3-0-0-0-0
2019 PPB Projection: 4-3-1-0-0-0
I've tried every way possible to rule out confirmation bias and failed. I have no choice but to like these numbers! I honestly feel like R64 will have fewer than 4 upsets, so don't go too crazy in R64.
Other Predictions:
-- I like SYR to win at least 1 Game and TENN to win at least 2 games via R&I Model. I think those are safe picks.
-- I like all 1-seeds and 2-seeds to defeat their R64 counterparts via SC. No surprises here! The predicted four upsets must come from 3- thru 6-seeds (although I wouldn't blame you if you went less than four).
-- No more than two 1- or 2-seeds lose in R32 via SC/QC. To get 3 upsets in R32, one of the 11- thru 14-seeds must pull double duty.
-- Do not move NOVA past S16 via Tourney Profiles. Only one reigning NC has won a game after S16 round (2007 FLA, enough said).
-- Final Prediction: No Play-in Winner advances to R32 for first time ever.
The obvious place to start is the section above, just so you, the reader, don't have to constantly scroll up and down to match each grade to its prediction.
- AM Targets: The actual 2019 AM was 192-49-18-11-4-1. The E8, F4, NR and NC rounds were exactly where your bracket needed to be. If you are going to win a bracket contest, those are the more valuable rounds, so the AM predictions would have raised your odds tremendously. However, the R32 and S16 rounds were miles away. This could have resulted in picking an unfavorable upset. As a result, I grade this prediction around A-/B+.
- UBR Model: The actual 2019 UBR count was 5-0-1-0-0-0 for six total upsets. This matched the final upset total of 2003, which was also six upsets, but 2019 achieved those six upsets in a different fashion than 2003. To my credit, I mentioned the "2019 SC matched the 2015 SC but 2019 was stronger in the middle and back". However, I failed to understand the impact it would have on the UBR Model (and the AM Targets for that matter). It is probably a good explanation as to why the 2019 UBR Model bent away from 4-3-1-0-0-0 and towards 5-0-1-0-0-0. All in all, I would probably give a grade of B- or even C+ because it was off. I'm not sure how much a lack of sample size or confirmation bias played a role, but I had a feeling it would go off the rails on the UBR model when I said "$#!T would hit the fan."
- Other Predictions: I loved the strength of the 2019 1-seeds and 2-seeds (and I did state this earlier in the year and often throughout). I was stupid for even suggesting that a 7-10 seed could upset any of them in R32 (although a really strong 7-seed WOF came close to upsetting a 2-seeded UK with a significant injury). Nonetheless, this prediction was an F, plain and simple. "NOVA not advancing past S16" was a perfect guideline, and considering the only reigning NC to achieve this feat was the 2007 FLA team (returning 97% of its 2006 NC team), it is probably a guideline I will use for a long time to come. I give the guideline an A+. "No Play-in Winner advances to R32 for first time ever." GOLDEN!!! I didn't like their 6-seed match-ups (MARY and BUFF) in the R64, I didn't like the two teams (AZST and BELM) that won the play-in game, and I thought both teams won their play-in games too easily against two teams (JOHN and TEM) that I wouldn't have chosen for the play-in game. The gambler in me said fade them both, and that is what happened, but to BELM's credit, they did make it close against MARY.
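The AM and UBR grades above boil down to position-by-position arithmetic. Here is a minimal Python sketch of that comparison, using only the figures quoted in this article in the order they were printed. The function names are my own illustrative choices and are not part of any PPB model.

```python
# Figures quoted in this article, kept in the order they were printed.
AM_2015 = [179, 70, 21, 10, 2, 1]
AM_2003 = [174, 67, 19, 9, 5, 3]
AM_2019_ACTUAL = [192, 49, 18, 11, 4, 1]

UBR_2019_PROJECTED = [4, 3, 1, 0, 0, 0]
UBR_2003 = [3, 3, 0, 0, 0, 0]
UBR_2019_ACTUAL = [5, 0, 1, 0, 0, 0]


def position_gaps(target, actual):
    """Return actual minus target for each position in the two lists."""
    return [a - t for t, a in zip(target, actual)]


def total_upsets(ubr):
    """Return the total number of upsets across all positions of a UBR count."""
    return sum(ubr)


if __name__ == "__main__":
    print("2019 vs 2015 AM:", position_gaps(AM_2015, AM_2019_ACTUAL))
    print("2019 vs 2003 AM:", position_gaps(AM_2003, AM_2019_ACTUAL))
    print("2019 UBR total:", total_upsets(UBR_2019_ACTUAL))
    print("2003 UBR total:", total_upsets(UBR_2003))
    print("Projected UBR total:", total_upsets(UBR_2019_PROJECTED))
```

The first two positions show the largest gaps against both target years, while the remaining positions stay within a few points, and the 2019 and 2003 UBR counts both total six.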
As for the R&I Model itself, I took a different approach to implementation in 2019 -- an almost Bayesian-Probabilistic Approach. I actually liked this approach. It identified a lot of freebies (UVA and VT for at least 1 win, UNC and TENN for 2 wins, and FLST, KNST, KU, and NOVA as highly likely fails), it presented a lot of opportunistic probabilities (AUB and PUR), it identified a lot of high-probability traps (HALL, FLA, CIN, OHST, and GONZ), and it only missed outright the TXTC run to the title game. Disregarding the incomplete information involving SYR, the R&I model predictions could be given somewhere around a B+/B grade.
As for the Final QC Analysis article, this one is a real head-scratcher. It identified the 6v11 match-ups to be pretty safe for the 6-seeds, and three of the four won their match-up (ironically, the hottest 6-seed coming into the tournament was the one that lost to a power-conference 11-seed who hadn't beaten a tournament team since November). It also identified 2-seeds, 5-seeds and 7-seeds as areas of strength. The 2-seeds collected eleven combined wins, which is one win shy of expectations based on top-seed advancement (four E8 appearances equals twelve total wins). It was two wins better than 2018's crop of 2-seeds, which was far weaker than 2019's crop. 5-seeds collected four total wins, which matches expectations based on top-seed advancement (four R32 appearances equals four wins), but all four wins were collected by one under-seeded 5-seed advancing to the F4. The "spike at the end of the SC (the 12-seed group)" should have been an indication of potential 5-seed victims. 7-seeds collectively won one game, and this prediction probably shouldn't have been made considering that the 10-seed group was in line on the QC-SC overlay. This was a clear oversight by yours truly. All in all, there were some hidden gems in this analysis and one clear oversight. As a result, I give myself a B+ on these predictions.
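The "expectations based on top-seed advancement" arithmetic can be sketched in a few lines of Python. The 2-seed expectation (E8, three wins per team) and the 5-seed expectation (R32, one win per team) come straight from the paragraph above. The values for the other seeds are an assumption that extends the same chalk logic, and the function names are purely illustrative.

```python
def expected_wins_per_team(seed):
    """Wins for one team if every higher seed advances (chalk).

    Only the 2-seed and 5-seed values are stated in the article;
    the rest are assumed by extending the same logic.
    """
    if seed == 1:
        return 4  # assumed: reaches the F4
    if seed == 2:
        return 3  # stated: reaches the E8
    if seed <= 4:
        return 2  # assumed: reaches the S16
    if seed <= 8:
        return 1  # stated for 5-seeds: reaches the R32
    return 0      # assumed: loses in the R64


def seed_group_expectation(seed, teams=4):
    """Total expected wins for all teams in one seed-group."""
    return teams * expected_wins_per_team(seed)


# Actual 2019 seed-group win totals quoted in the paragraph above.
ACTUAL_2019_WINS = {2: 11, 5: 4, 7: 1}

if __name__ == "__main__":
    for seed, actual in ACTUAL_2019_WINS.items():
        expected = seed_group_expectation(seed)
        print(f"{seed}-seeds: expected {expected}, actual {actual}, "
              f"difference {actual - expected}")
```

Under these assumptions, the 2-seeds finish one win short, the 5-seeds land on expectation, and the 7-seeds fall three wins short.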
Finally, there was one prediction that never made the blog that I wanted to discuss. Last year, I introduced a new tool called the Seed-Group Loss Table (Part 1 and Part 2). In Part 2, I created a linear regression model for each of the top four seed-groups to predict their F4 and E8 potential. I posted the formulas if you wanted to try them for your 2019 bracket, but I did not cover them as official models for the 2019 predictions. I WISH I HAD! Using the linear regression formulas and the L% and N/L% for each seed-group, the SGLT predicted the following:
- 1-seeds: 0.227 for the F4 (either zero or one) and 2.433 for the E8 (either two or three).
- 2-seeds: 0.864 for the F4 (either zero or one) and 1.786 for the E8 (either one or two).
- 3-seeds: 0.9305 for the F4 (either zero or one) and 1.619 for the E8 (either one or two).
- 4-seeds: 0.3252 for the F4 (either zero or one) and 0.573 for the E8 (either zero or one).
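Each "either X or Y" range in the list above is just the regression output rounded down and rounded up. Here is a minimal Python sketch of that step. It starts from the four quoted outputs and does not reproduce the regression formulas themselves, which are not given in this article. The names `SGLT_2019` and `bracket_range` are illustrative.

```python
import math

# SGLT regression outputs quoted above, keyed by seed-group.
SGLT_2019 = {
    1: {"F4": 0.227, "E8": 2.433},
    2: {"F4": 0.864, "E8": 1.786},
    3: {"F4": 0.9305, "E8": 1.619},
    4: {"F4": 0.3252, "E8": 0.573},
}


def bracket_range(prediction):
    """Turn a fractional team count into the two whole numbers around it."""
    return math.floor(prediction), math.ceil(prediction)


if __name__ == "__main__":
    for seed, rounds in SGLT_2019.items():
        for round_name, value in rounds.items():
            low, high = bracket_range(value)
            print(f"{seed}-seeds in the {round_name}: {value} -> {low} or {high}")
```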
If I had to make a prediction in November about this blog for B.C.W., it looks highly likely that I will produce one total article and hope I can be both precise and concise with all of the details. As for the yearly schedule, it is up in the air. For the last two years, I published articles on Monday morning right at midnight. It looks like Saturday or Sunday may be the best bet, and it may be monthly articles instead of bi-weekly. In past season-opening articles, I would lay out a schedule with likely targets for publish dates. I don't think I'm going that route this year because it may result in over-promising and under-delivering on my part. The only guidance that I can give at the moment is I will prioritize QC Analysis articles above all else, followed by articles on new bracket-picking models or improvements to existing bracket-picking models, and lastly any articles with opinion/feedback/criticism and/or history-driven articles. Believe me, I would love to give my two cents' worth on the stupidity of the expansion of power conference schedules, and if you don't know what I am talking about, just look at how many ACC teams are playing conference games in the first week of November. What a joke!!! To finalize the issue of article scheduling, I will more than likely improvise the schedule this season and have a better understanding of my schedule for the following season (2020-21). I do know the contents of the next article because I wrote it before I wrote this one. I just don't know what date I'm going to set for the auto-publish. Anyways, I'm looking forward to another year of trying to accomplish a lifelong dream, and I hope you will join me for the ride.
Mar 20, 2019
Return and Improve Model - 2019 Edition
Mar 19, 2019
2019 Quality Curve Analysis - Final Edition
From the looks of it, there's a lot to talk about, so let's see what we can learn.
Mar 11, 2019
Bracket Profiles
As the heading would suggest, this data set looks at the tournament field by seed-group for a given year and tallies the teams in each seed-group that went to the previous year's tournament. For reference, this count only looks at the Field of 64 (teams that fail to advance out of the First Four games are not counted as tournament returners).
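The tally described above can be sketched in Python. This is only an illustration of the counting rule, not my actual spreadsheet. The team labels are placeholders rather than real tournament data, and the function name is hypothetical. Both inputs are assumed to contain Field of 64 teams only, so First Four losers are already excluded.

```python
def returners_by_seed(current_field, previous_field):
    """Count, per seed-group, the teams that also made last year's Field of 64.

    current_field: iterable of (team, seed) pairs for this year's Field of 64.
    previous_field: set of team names from last year's Field of 64.
    """
    counts = {seed: 0 for seed in range(1, 17)}
    for team, seed in current_field:
        if team in previous_field:
            counts[seed] += 1
    return counts


# Placeholder teams only; these are not real tournament fields.
previous_field = {"TEAM_A", "TEAM_B", "TEAM_C"}
current_field = [("TEAM_A", 1), ("TEAM_D", 1), ("TEAM_B", 5), ("TEAM_E", 15)]
counts = returners_by_seed(current_field, previous_field)

if __name__ == "__main__":
    print("1-seed returners:", counts[1])
    print("5-seed returners:", counts[5])
    print("15-seed returners:", counts[15])
```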
Since I've only been looking at this data for about two weeks, I haven't had the time to fool-proof my Excel formulas to precisely see how significant returners are to seed-group performance. Instead, I'll focus only on the areas that I think bracket-pickers would want to know: the seed-expectations for 1- and 2-seeds, and the R64 upset-potential of 5- and 6-seeds.
- Since the data doesn't count First Four losers (First Four began in the 2011 tournament), let's start there. I find it very ironic that the addition of four teams into the field hasn't improved the totals of returning teams. Only two years from 2011 to 2018 have produced more than 33 returning teams. From 2002-2010, six of the nine years in that span produced more than 33 returning teams. It could be a contributing factor to the relative sanity of those nine years when compared to the eight years of the First Four's existence.
- The pattern that immediately catches my eye is the 1-seed returners. There are only three years (2014, 2010, and 2006) in which one of the 1-seeds didn't go to the previous year's tournament. Those three years produced some of the wildest tournaments we have seen. When taken into the context of the groupings in Point #1, 2006 and 2010 produced the highest M-o-M ratings in the 2002-2010 era, and 2014 produced the highest M-o-M rating of all tournaments in this chart. 2006 failed to produce a 1-seed in the F4 (the other two only produced one each). Think logically about this: If a team can fail to qualify for a tournament in one year and then achieve a 1-seed in the following year, the specific team probably did so against a weak field in the latter year (which is indicative of a crazy tournament). All three of those 1-seeds that failed to reach the tournament in the previous year (UVA, UK, and MEM) bowed out before reaching the F4 (seed expectations for a 1-seed), and two of those teams had the same head coach.
- As for 2-seeds, there seem to be only two outliers: 2002 (one returner) and 2006 (two returners). Ironically, both years produced a 2-seed in the F4, which happened to be returners. Both years also saw a 2-seed fail to reach the S16, which happened to be non-returners. In an even stranger twist, 2002 with its sole returner produced three 2-seeds in the E8 (which matches seed expectations for 2-seeds). Looking at their first-round counterparts, the only three years (2012, 2013, and 2016) in which a 15-seed knocked out a 2-seed had zero returners for the 15-seed group. This makes 15-seeds 3-for-6 when none of the teams went to the previous dance, with all other years producing at least one 15-seed returner.
- Tournaments typically produce anywhere from two to four returners in the 5-seed group. Only three times (2015, 2009, and 2007) has a tournament had fewer than two returners, and contrary to logical assumptions, two of those three years (2007 and 2015) produced four 5-seeds advancing past their historical nemesis (12-seeds).
- 6-seeds typically have two or three returners in their group. Of the four years (2015, 2013, 2007, and 2004) when 6-seeds produced one returner, they won at least two of their match-ups against their 11-seed counterparts. When the tournament reached historical levels of sanity in 2007, the only two R64 upsets in that year happened to 6-seeds.
The data set below presents a tournament's profile by counting the number of teams from a previous tournament's round if they qualified for the current tournament. For example, the 2018 tournament featured the 2017 National Champion (NC), the 2017 National Runner-up, only two F4 teams (which happened to be the NC and NR of 2017), six E8 teams from 2017 (again, the only two missing are the two F4 teams that failed to return), and eleven of the S16 from 2017 (the two failed-returners of the F4 and three teams that made the S16 but lost their next game).
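The counting rule in that example can be sketched in Python. The team labels below are placeholders, not the actual 2017 teams. They are arranged only so the output reproduces the 2018 counts quoted above (1 NC, 1 NR, 2 F4, 6 E8, 11 S16). The names `CREDIT` and `tournament_profile` are illustrative, and the sketch assumes each deeper finish also counts toward the earlier rounds, as the example implies.

```python
# Columns credited to a returning team, based on its deepest finish last year.
CREDIT = {
    "NC": ["NC", "F4", "E8", "S16"],
    "NR": ["NR", "F4", "E8", "S16"],
    "F4": ["F4", "E8", "S16"],
    "E8": ["E8", "S16"],
    "S16": ["S16"],
}


def tournament_profile(prev_finish, current_field):
    """Count last year's S16-or-better teams that returned to this year's field."""
    profile = {"NC": 0, "NR": 0, "F4": 0, "E8": 0, "S16": 0}
    for team, finish in prev_finish.items():
        if team in current_field:
            for column in CREDIT[finish]:
                profile[column] += 1
    return profile


# Placeholder labels for last year's sixteen S16 teams and their deepest round.
prev_finish = {"champ": "NC", "runner_up": "NR", "f4_a": "F4", "f4_b": "F4"}
prev_finish.update({f"e8_{i}": "E8" for i in range(4)})
prev_finish.update({f"s16_{i}": "S16" for i in range(8)})

# Non-returners in the quoted example: two F4 losers and three S16 losers.
missing = {"f4_a", "f4_b", "s16_5", "s16_6", "s16_7"}
current_field = set(prev_finish) - missing

profile = tournament_profile(prev_finish, current_field)

if __name__ == "__main__":
    print(profile)
```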
While I haven't had time to dig deeper into the patterns of this table, I would definitely claim this data set to be the odder of the two. For example, National Champions have returned to the tournament in the following year on all but four occasions. In two of those four "non-returning NC" years, at least three 1-seeds made the F4. In the one year in which the previous NC and the previous NR did not return, it was the same year in which all four 1-seeds made the F4. I'm certain it is 100% coincidence. While I doubt this data set will turn into any useful bracket-picking model, I think it may be useful in formulating bracket-picking strategies to win your office pool instead of "How to get every pick right" like this blog aspires to do.
Before you look at the chart below, one detail must be noted. Even though AZST and NCCU have play-in games in 2019, neither team won their play-in game in 2018. Since this method only looks at the field of 64 for year-to-year consistency, it does not matter what they do in their 2019 play-in games as neither was in the field of 64 in 2018.
I will state again for the record that I have no idea what this means or how to use this information as I have yet to fully back-test this method. For comparison purposes, the 2019 tournament profile looks a lot like 2004, except it has two fewer returning teams and one extra returning E8 and S16 team.
I hope you enjoyed this venture into tournament profiling. Even though we find many similarities between tournaments and the participating teams, it does seem like no two years are the same. These two profile sets seem to confirm that no two tournaments are the same since no two profiles are the same. As always, thanks for reading my work, and next Sunday starts Bracket Crunch Week. I just hope I'm ready and prepared for 2019!!!