Nov 6, 2017

Welcome to the 2017-18 College Basketball Season

Just as the title says, welcome to a new season of college basketball. I'm excited to bring back Project: Perfect Bracket for a third season, and as always, the goal is to do what no one has done before: pick a "PERFECT" 63-game NCAA tournament bracket.

Before I grade myself on last season's performance, I want to give you a pretty firm outlook of what to expect on PPB for the whole season. In the first year (2015-16), I was so disorganized and erratic in scheduling that I literally over-worked myself to the point of not producing a bracket (other than my gut picks as soon as the bracket was revealed). I definitely didn't make this mistake in the 2016-17 season, but I did make new ones (all the more from which to learn). I doubt I have it completely figured out for 2017-18 either, but if I don't make the same planning/preparation mistakes of the last two years, then this should be my best year yet. So, what can you expect this season?


  • Precise schedule: From November to Crunch Time (which starts in mid- to late-February), I will post an article every other Monday, counting from this one. Unfortunately, the calendar year doesn't sync up with the basketball year, but an every-other-Monday schedule looks like the best way to handle it.
    • Mar 11 is Selection Sunday, so everything has to revolve around this date.
    • This means that Jan 1, Jan 29, and Feb 26 will be Quality Curve analysis articles and Mar 12 will be the Seed Curve analysis.
    • Crunch Time will again have a variable schedule, but to avoid the mistakes of last year, I will have my ideas fleshed out and finalized before February and steer clear of investing precious time and resources in last-minute ideas that never make it to the blog.
    • The blog memo "To My Readers" will be routinely updated to tease upcoming articles as well as announce any additions/changes to PPB.
  • New Content
    • New Data: With the change in KenPom's methodology at the start of the 2016-17 season, I wanted to incorporate other ratings/data sets into my analysis to see if they were sending the right signals. Unfortunately, I wasn't able to back-test any of the new data sets, nor was I able to correctly convert them from .TXT files to .XLS files, so they did not make an appearance in 2016-17. This WILL NOT HAPPEN in 2017-18! (A quick sketch of the conversion follows this list.)
    • New Methods: For the most part, PPB has taken a macro-view approach to tournament analysis, meaning the analysis has looked at the tournament as a whole rather than through the individual teams. These micro-view methods will be a new addition for 2017-18, while the successes of our macro-view methods continue. Since you are avid readers of my blog, I am sure that you have already predicted some of these micro-view tools, such as trend analysis and the return-and-improve model. While I am testing (and will be including) more micro-view models, I will not reveal them just yet because writers love teasers more than they do spoilers. I will confirm some things:
      • I (very likely) will not be doing team write-ups like last year's "Different Kind of Tournament Resume." That was an attempt to summarize lots of quantitative information in a qualitative format. A few individuals (more likely fans of their team than readers of my blog), in public comments and in private emails, were miffed that I had anything bad to say about their team. I was equally critical of all teams, including my own, and unlike these fans, I had facts and stats to back up my criticisms. Let me be clear: I am here to pick a perfect bracket, not to pick on your team or console your feelings. If you want a second opinion on your bracket picks, I am here to provide the most objective one I can, but if you want to argue about why I didn't pick your favorite team, then don't waste your time or mine!
      • I (100%) will not be doing the seed-by-seed upset or Final Four Contender/Pretender models. One, I have not kept track of these methods since Bracket Science ended. Two, I consider them Peter Tiernan's proprietary work, and I am not comfortable taking credit for someone else's work.
      • I plan on (50-50) doing the 2018 Stat Sheet myself this year. Last year, I crossed it off my to-do list for a variety of good reasons, but since I want to include micro-view analysis this year, the 2018 Stat Sheet seems like a very good starting point. I really want to do a "How-to-use-this-tool" article for the Stat Sheet, similar to the articles I have done for the Aggregation Model and the Quality Curve Model, but only time will tell.
    • New People: This is the one that I cannot control, but I do want it to happen. I've never really wanted PPB to be a blog where I make predictions and my readers fill out their brackets according to my analysis. When Bracket Science ended, it was essentially Pete's work on display, with a lot of us bracket-nerds bouncing ideas off of him in the comments section. I truly enjoyed that feature, and it is what I want to continue. If you have an idea, a stats/data set, a bracket-picking analysis/method, or anything you want to investigate or make into a project with the purpose of bracket-picking, feel free to use my site to find others to help with your project. If it is well-researched and well-written, I'll even let you post your findings in a PPB article. I do not make a single penny from this site, and as I said before, I don't feel comfortable taking credit for the work of others, so I will make sure you get the credit. I do this for the love of the game and for achieving the pinnacle. If you feel the same way, you are more than welcome to join.
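Since I brought up the .TXT-to-.XLS conversion failure in the New Data item above, here is a minimal sketch (in Python) of how I plan to attack it this season. To be clear, the file name and column layout are my own assumptions (a whitespace-delimited ratings file with one header row), not the actual format of any particular data set:

    # Minimal sketch of the .TXT-to-.XLS conversion mentioned above.
    # Assumes a whitespace-delimited file with a single header row;
    # the file name and layout are hypothetical.
    import pandas as pd

    ratings = pd.read_csv("ratings_2018.txt", sep=r"\s+")
    ratings.to_excel("ratings_2018.xlsx", index=False)  # requires openpyxl

If the real files turn out to be fixed-width rather than delimited, pandas' read_fwf is the drop-in replacement for read_csv here.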
With the expectations for 2017-18 PPB out of the way, let's see how my 2017 predictions fared.

Grading the Predictions

Key Note: The only predictions that I am grading are those that were "continual" and "definitive."
  • CONTINUAL -- If I made a prediction in the January QC Analysis based on January data, I only grade it if it was carried through to the February, March, and Final Analyses. 
  • DEFINITIVE -- Any prediction that I made in a QC Analysis and qualified it with "we'll have to see what the seed-curve shows us" will not be subject to a grade. It literally has to be something like "..... will happen" or "I would go with .......".
  1. "There is strength at the top, and parity among that strength." If there is one thing I said all year -- continually and definitively -- it was this statement, and it couldn't have been more correct. I made multiple attempts to qualify this statement, so let's look at them as well.
    • Fewer R64 upsets, More R32 and S16 upsets (Mar QC): Nailed it! If you look at the upset table in the 2017 Final QC, it shows the 15-year average (2002-16) for each round. For those three rounds, the averages were 5.0, 2.8, and 1.0, respectively. In the 2017 tourney, the upsets-by-round count was 4 in R64, 4 in R32, and 2 in S16.
    • The quality of teams is tiered, where each tier will beat all lower-tiered teams and teams within a tier have a rock-paper-scissors relationship. (Crystal Ball and Final QC): Failed it! Basically, I tried to get cute with the "strength at the top and parity amongst" concept and ended up over-stylizing what was otherwise a simple and practical concept. Examples of where this went wrong: SCAR over DUKE, BAY, and FLA; XAV over FLST and ARI; and USC over SMU. Of all the Tier 1 vs Tier 1 match-ups, only lower-ranked/lower-seeded WISC and MICH pulled off upsets over their higher-seeded/higher-ranked opponents, NOVA and LOU. Lower-ranked/higher-seeded ARI got a win over higher-ranked/lower-seeded STMY, a game I would have predicted ARI to lose, being over-seeded according to efficiency rankings. Lesson learned: Report the data, don't stylize it!
    • The number of upsets for the whole tournament may approach double-digits as 4s, 5s, 6s, and maybe 7s topple 1s, 2s, and 3s in R32 and S16 (Mar QC): Mostly nailed it! 2017 did more than approach double digits; it hit double digits exactly with 10 total upsets, one fewer than 2016's total. On the wrong side, not a single 1-seed was toppled by a 4- or 5-seed, even though three 4-seeds had the chance (no 5s made it to the S16). A 1-seed was toppled by an 8-seed in the R32, but I only said 4s through 7s (and in my defense, this prediction was made before the actual bracket and the seed-curve were revealed). Likewise, only one 6-seed made it to the R32, and it lost to a 3-seed. On the right side, two 7-seeds toppled two 2-seeds, and one of those 7-seeds went on to topple a 3-seed in the S16 round.
    • [Unlike] 2008 and 2015, where there was dominant strength at the very top, I wouldn't rush to put three or four 1-seeds into the Final Four. Even though the top three teams in the KenPom rankings received 1-seeds, two or fewer of these teams in the Final Four is very safe this year (Final QC): Nailed it! Ok, I admit it. This was a gimme! Since there was no dominance at the top like 2008 and 2015, when four and three 1-seeds went to the F4, respectively, the only other options for the F4 were two, one, or zero 1-seeds. By collectively picking all of the other options, I was bound to be right. A ballsier prediction would have been to pick just one of those three options, say exactly two 1-seeds. All things considered, GONZ and UNC were really good in 2017, but the obstacles in both of their paths (WVU in the West and UK/UCLA in the South) made it appear very unlikely that both would get there (and yes, both of those games went down to the wire, so you can see how risky that very narrow prediction would have been).
  2. If I had to pick three [years] and make a composite, here are the three that I would choose: 2005, 2006, and 2014. (Final QC): OMG! Nailed it! I honestly know for a fact that I could not have chosen three better years to make a composite, and as I said in the Final QC article, I did it simply because all three seed curves featured the bowing action at the front of the curve. The only problem with this composite curve is that I didn't take the analysis far enough (see below).
    • From this composite, I made an average AM for the R32, S16, and E8, and I said "189 - 74 - 30 look like healthy targets for those three rounds." (Final QC): Mostly failed it! The AM for 2017 for the R32, S16, and E8 was 167, 65, and 30. While this nailed the E8 AM on the dot, the failure comes mostly from not scaling the targets to the apparent strength of the top teams. We know that 2006 and 2014 were rather weak years in terms of team quality, and the differential tables in the Mar QC article showed 2017 outpaced 2005 at all intervals (in fact, it outpaced all years across the 9-26 ranks). I should have calculated a correction factor to adjust the average AM of those three years (the 189 and the 74) downward to reflect the strength of the teams in 2017. A simple correction factor of 10% would have produced a prediction of 170-67-30, which would have been a lot closer to 167-65-30 than my actual prediction. However, I'm not exactly sure how I would have arrived at the correction factor to account for the relative strength of 2017 over those three years (the arithmetic is sketched out just after this grading list).
    • Another thing all three of these years have in common: All three dropped a 1-seed and two 2-seeds before the S16. (Final QC): Dumb luck! Yes, that was a typo in the original article (which I have left there for all to see), so I got extremely lucky on that one. While this definitely happened in 2014, it was not the case in 2005 or 2006. Fortunately, the mistake was not in my data sets, which I went back and checked; the mistake was all in my mental history. The crazy thing is that when I typed that line, I went through each year in my head and named the one 1-seed and the two 2-seeds that were upset without even looking at a bracket to confirm them. If I had said "before the E8," then I could have attributed it to my expert analytical skills, but instead, brain-farting gets the credit on this one. FOR THE OFFICIAL RECORD, the only tournament years that have lost exactly one 1-seed and two 2-seeds before the S16 are 2017, 2015, 2014, and 1986.
    • In all four [2006] E8 match-ups of 1v2, 1v3, 1v11, and 2v4, the lower seed won all four games. The issue I have with 2006 is the same issue I have with 2014: the right half of the curve lingers in the higher levels whereas 2017's does not. (Final QC): Incomplete! While this was more a statement of fact than a prediction, if I had analyzed it further, it would have been near-golden. Of the three years in the composite, 2006 was the only one for which I listed the E8 seed-pairs. 2017's E8 seed pairs were as follows: 1v2, 1v3, 1v11, and 4v7, and the lower seed won half of these (not all four, which we could attribute to the strength of 2017 over 2006). Imagine that: the only curve (2006) for which I listed the E8 seed pairs ends up matching 7 out of 8 E8 seeds with 2017. To be honest, though, 2005 and 2014 were really good choices too. 2014 had E8 seed pairs of 1v11, 4v7, 1v2, and 8v2 (three correct pairs and 6 out of 8 seeds correct), and three of the four lower seeds won (more upsets because 2014 was weaker than 2017). 2005 had 1v3, 4v7, 1v6, and 5v2 (two correct pairs and 6 out of 8 correct seeds), and only one of the four lower seeds won. If only I had dug deeper into my findings........
    • It is worth noting that 2005 did see a 12-seed in the Sweet 16, so maybe the same will hold for 2017's 10-seed. (Final QC): Misleading! Again, this was a statement of fact rather than a definitive prediction. The problem with it is that I over-emphasized the strength of the 10-seeds throughout the SC analysis, yet this is the closest I ever came to making anything resembling a prediction for the 10-seeds. In fact, if you look back at the analysis of the 7-seed group, I called the 7v10 match-ups as two coin-flips and two no-brainers. The only 10-seed to win its 7v10 match-up was one of the two no-brainer picks (WICH over DAY). If I had focused on that comparison and completely ignored the anomaly in the Seed Curve, this would have made for a solid prediction instead of misleading conjecture.
  3. As for my upset totals, I think 4-3-2-0-0-0 looks pretty reasonable. (Final QC): Mostly nailed it! This prediction is probably the one I am most proud of and the one I am most upset over. Imagine if you knew the upsets were coming and knew the tell-tale signs for picking the upset candidates and victims, so all you needed was the road-map. You come across Project Perfect Bracket, where the author says the road-map is 4-3-2-0-0-0. Jackpot! Perfect bracket, here I come! Then, after the Saturday/Sunday games, you hit the wall: 4-4-2-0-0-0 turns out to be the actual road-map. 62 games called perfectly, only to have that one additional upset in the second round shatter your perfect bracket. The dream of perfection blown to shreds because the road-map screws up one pick! Don't get me wrong: I am proud of this prediction because, with this 4-3-2-0-0-0 road-map, I was closer to predicting the tournament than anyone else. When all of the yearly averages suggest 5/6 - 3 - 1 - 0/1 - 0 - 0 and all of the televised pundits are suggesting 6+ - 3+ - 1 - 1 - 0 - 0, it felt really good to depart from them all with a 4-3-2-0-0-0 prediction, knowing that 2017's team quality suggested more normalcy than most years. However, this prediction literally cost you a pick, and not just any pick: it cost you a S16 team, which is worth 2 points in most bracket contests, and that's what upsets me (see curve-fitting below). What makes it worse is that I followed the prediction up with this: "If you are ballsy, either another upset in the R32 or one in the E8, but I probably would not do the latter unless a 100% certain opportunity presented itself." You tell yourself the potential for a 4-4-2-0-0-0 is definitely there, but you don't do it. That's not being ballsy, that's called BEING RIGHT! When you realize that you are one upset off in the second round, it hurts. Knowing everyone else was predicting the wrong road-map, I wanted to get this one exactly right, and I didn't. It pains me greatly to see "mostly nailed it" instead of "nailed it".
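Before moving on, here is the correction-factor arithmetic from the composite-curve grading above, written out as a quick sketch. The 10% factor is the illustrative number from that discussion, not a derived one; deriving it properly is still the open question:

    # Correction-factor sketch for the composite-curve AM targets above.
    # The 10% factor is illustrative, not derived.
    composite_targets = {"R32": 189, "S16": 74, "E8": 30}
    actual_2017 = {"R32": 167, "S16": 65, "E8": 30}

    correction = 0.10
    # Only the R32 and S16 targets get scaled; the E8 target already hit.
    adjusted = {
        rnd: round(am * (1 - correction)) if rnd in ("R32", "S16") else am
        for rnd, am in composite_targets.items()
    }
    print(adjusted)  # {'R32': 170, 'S16': 67, 'E8': 30} vs. actual 167-65-30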
Learning the Lessons

Typically, you learn things first, get tested on your knowledge/recollection of those things, then get graded, so maybe this section of the article is out of order since I have already done the grading. Either way, let's document what we've learned from last year.
  1. The mind of the Selection Committee still matters. I started off the "Welcome to the 2016-17 Season" article the same way, and honestly, I could have done an entire article on how bracket picking (mainly upset-picking) can come down to the decisions (right or wrong) of the Selection Committee. This more than held true for the 2017 tournament.
    • In 2016, it seemed as if the Selection Committee over-weighted conference affiliation when handing out the 5-12 seeds (for full details on how I hypothesized the selection process, check the link above). As a result, certain teams appeared to be over-seeded just by the eye test, let alone by efficiency rankings (e.g., #6 HALL, #7 ORST, and #8 TXTC, just to name a few). Even more, these seemingly over-seeded teams failed to meet seed expectations. That same pattern seems to have held in this year's selection process as well (seeds by conference, with conferences listed in order of Conference RPI):
      • ACC: 1,2,2,3,5,5,8,9,11
      • B12: 1,3,4,5,10,11
      • BEC: 1,4,6,9,10,11,11
      • B10: 4,5,6,7,8,8,9
      • SEC: 2,4,7,8,9
      • P12: 2,3,3,11
      • AAC: 6,6
      • A10: 7,10,11
      • WCC: 1,7
      • As an additional note, the MVC was the 12th-best conference according to Conference RPI rankings (behind the MWC and the CAA), and this may explain why WICH received a 10-seed when they clearly deserved a higher one, and why DAY (the A10 regular-season winner) received only a 7-seed.
    • For the 2017 tournament, there seemed to be added weight given to winning major conference tournaments. In the post-unveil interview, the Committee Chairman even hinted at the notion when describing the "scrubbing process." This added weight to conference tournament winners appears real when looking at the seeds of teams like #2 DUKE, #7 MICH, #5 IAST, and #2 ARI. While it may or may not have had any noticeable impact on the NCAA tournament performance of those teams, it definitely raised a lot of questions about seeding discrepancies.
    • The main reason for understanding the Committee's selection process is to provide us with a baseline. If we can assess team quality better than the committee, then the discrepancies between the two appraisals will point us to sleeper/upset picks. Another value in having a baseline comes from discerning when the committee deviates from this process. This may have been the holy grail for the 2017 tournament. 
      • For example, WISC seemed very under-seeded as an 8-seed (which translates to a 29-32 ranking). Simply looking at its resume, WISC finished in a two-way tie with MARY for 2nd (behind PUR) in the B10 (the 4th-strongest conference), finished runner-up to MICH in the conference tournament, had the H2H win at home vs MARY (a 5-seed), and had the regular-season home-and-away sweep of 4th-place MINN (also a 5-seed), yet somehow WISC got an 8-seed. Either those two were over-seeded (which may explain their opening-game losses) or WISC was under-seeded (which may explain their S16 appearance).
      • Another instance where the Committee appears to have deviated from the baseline involves the A10. It seems as if the committee saw the A10 auto-bid going to URI as a huge problem. URI did not have an impressive resume in a weak-bubble year and most likely would not have received an at-large berth without winning the A10 conference tournament (a pet peeve of mine for many years). The first thing they did was put URI on the same seed-line as the play-in game (and I can agree with this logic). The next thing they did was squeeze a lot of other teams in ahead of them. They needed VCU above URI because VCU had the better resume, so VCU got a 10-seed. MARQ was BEC and OKST was B12, and those conferences were better, so they had to be thrown in at the 10-line as well.
      • From this point, the real anomalies start to show themselves, such as VAND and HALL at the 9-seed. While those teams may have had better resumes than URI, neither had a better resume than, nor was a better team than, OKST. Likewise, I would be hard-pressed to say either HALL or MARQ (two BEC opponents) was a full seed line ahead of the other, so giving HALL a 9-seed and MARQ a 10-seed seemed sketchy. Furthermore, I don't think a 15-loss VAND with 3 wins over FLA deserved to be solidly in the tournament, especially not ahead of #10 WICH and #10 OKST. If the committee was going to make that exception for VAND, then WAKE should have been solidly in the tournament as well (instead of in the play-in game), and maybe GATC should have at least made the play-in game with wins over #1 UNC, #3 FLST, #5 DAME, and #10 VCU (and considering GATC made it to the NIT finals, they may have taken the NCAA snub personally). Yes, hindsight is 20/20, and I probably should do this analysis just before the tournament starts. Nevertheless, when the Committee makes deviations from and exceptions to its own process, it should be a signal to us bracket-pickers that upsets and sleepers are there.
  2. The failures of curve-fitting. The second lesson learned from the 2017 NCAA tournament is a rather difficult one for me, as it involves a process that I have been testing for the last three years. The difficulty arises from how hard it is to throw away work I have invested that much time in. Nonetheless, it seems time to put curve-fitting on the back burner (or maybe even throw it out altogether). I've never posted an article on the method (and I'm sure I've only ever mentioned it in the comments section as the force-fit model) because I never really trusted it. After a third straight year of disappointing tests, this year the most disappointing of all, I have concluded that it is not reliable enough to use. So, what is curve-fitting? Simply put, it is a bracket-picking method that "forces" a pre-determined quantity of upsets in each round and uses a variety of strategies to determine which games are "fit" for upsets (a bare-bones sketch follows the two reasons below). Here's why I don't trust it.
    • First, you have to be right.....about the number of upsets. As I stated above in my heart-breaking prediction of a 4-3-2-0-0-0 road-map, one wrong count and the dream of a perfect bracket is shattered. Not only that, one wrong count in any round can actually cost you more picks in later rounds. For example, if you followed this road-map for your 2017 picks and also applied the dumb-luck prediction of one 1-seed and two 2-seeds falling before the S16, there are your three upsets for the second round. This means you would miss the XAV over FLST upset, because there were 4 upsets in the 2nd round (R32) of the 2017 tournament, and XAV went on to pull another upset in the 3rd round (S16) over ARI. Thus, being wrong about the number of forced upsets cost me one pick in the round I was actually wrong about and another pick in the next round (hypothetically, my bracket would have had #3 FLST vs #2 ARI).
    • Second, you have to be right.....about the games. Suppose you get the actual number of upsets (the road-map) correct. That's only half the battle, because you have to get the actual upset games right too. Hypothetically, let's say the road-map is 1-0-0-0-0-0 (clearly a hypothetical). All you have to do is pick the one upset and you have a perfect bracket. Suppose you pick #11 Who Cares over #6 Who Knows, yet in actuality, the upset was #12 Whatever Team over #5 Other Team. Although you got the road-map 100% correct, by picking the wrong upset game, you end up missing two picks in your bracket: you picked #11 and they actually lost to #6, and you should have picked #12 when you actually picked #5. As you can see, the inherent frustration of curve-fitting is that you have to be 100% right about the road-map and 100% right about the upset games. Yes, if you get both correct, you will end up with a perfectly predicted bracket, but by that point, you might as well be predicting each individual game, because the method isn't doing the work for you. By starting with the road-map first and fitting your bracket to it, you are compounding errors, not reducing them, and to me, that is a counter-productive approach.
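To make the critique concrete, here is the bare-bones sketch of what I mean by curve-fitting. Everything in it is hypothetical: the rating gaps are invented, and flagging the smallest-gap games as the upsets is just one of the many "fit" strategies one could use:

    # Bare-bones sketch of the curve-fitting (force-fit) idea.
    # Rating gaps are invented; "smallest gap = upset" is one strategy of many.
    games = [  # (match-up, favorite's rating edge) for one round
        ("1v16", 25.0), ("2v15", 18.0), ("3v14", 14.0), ("4v13", 9.0),
        ("5v12", 4.0), ("6v11", 3.0), ("7v10", 1.5), ("8v9", 0.5),
    ]
    forced_upsets = 4  # the road-map count for this round (e.g., 4-3-2-0-0-0)

    # "Fit" the round: the closest games become the forced upsets.
    picks = [m for m, gap in sorted(games, key=lambda g: g[1])[:forced_upsets]]
    print("Upset picks:", picks)

Both failure modes show up right here: get forced_upsets wrong and every game past the cut-off flips, or get the count right but the ordering wrong and every missed game costs you two picks.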
  3. Advanced Metrics: Methods behind the Madness. If you have made it this far into this article, you truly are a bracket warrior, and you have my utmost appreciation. Likewise, if you have read all my articles this far, then you will remember at the beginning of last season when I talked about the change in calculation for advanced metrics, specifically the KenPom ratings. For the 2016-17 season, Ken Pomeroy changed his methodology from the multiplicative structure of the Pythagorean Win Percentage (PWP) approach to the more modern additive structure of the Adjusted Efficiency Margin (AEM) approach. I am far more familiar with the inner workings and calculations of the PWP, having read the works of Bill James and Dean Oliver, and thus more familiar with how to apply it as a bracket-picking tool. With AEM, I am far less familiar with its inner workings and calculations (nor do I expect Ken Pomeroy to reveal such information), and as a result, I am far more uncertain as to how it could or should be applied as a bracket-picking tool. For the 2016-17 season, I used the AEM rankings to create the ever-useful Quality Curve and Seed Curve. With some very grandiose assumptions, I was able to make comparisons to previous years, since Ken Pomeroy recalculated all of his rankings from 2002-2016 with his new AEM methodology. For the time being, the advanced metrics will continue to be an integral part of the QC and SC. As time passes and we move further away from PWP and into AEM, I believe the historical comparisons between the two methodologies (one multiplicative, one additive) will become less reliable. In other words, I believe the curves (QC and SC) produced by the PWP may predict something entirely different than the curves produced by the AEM, even though the two curves may look exactly the same (same peaks, same troughs, same bowing action across the same seed lines, etc.). If I knew the AEM methodology better, I would feel more comfortable applying it to new and existing bracket-picking methods. Until then, I think the safest course is to concentrate solely on the QC and SC with advanced metrics, and when the time is appropriate (most likely years from now), I will apply it to other areas. (A side-by-side sketch of the two structures follows below.)
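For anyone who wants the two structures side by side, here is a small sketch. The adjusted efficiencies are invented numbers, and the 11.5 exponent is the value commonly associated with the old Pythagorean formula, so treat this as an illustration of the math, not a reproduction of Pomeroy's actual calculations:

    # Side-by-side sketch of the multiplicative (PWP) and additive (AEM)
    # structures. Efficiencies are invented; 11.5 is the exponent commonly
    # cited for the old Pythagorean formula.
    def pwp(adj_o, adj_d, exp=11.5):
        """Multiplicative: Pythagorean expected win percentage."""
        return adj_o**exp / (adj_o**exp + adj_d**exp)

    def aem(adj_o, adj_d):
        """Additive: adjusted efficiency margin (points per 100 possessions)."""
        return adj_o - adj_d

    team_a = (111.0, 90.0)   # elite defense, modest offense
    team_b = (124.0, 102.0)  # elite offense, modest defense
    print(pwp(*team_a), aem(*team_a))  # ~0.918, 21.0
    print(pwp(*team_b), aem(*team_b))  # ~0.904, 22.0

Notice that PWP (a ratio) ranks Team A ahead while AEM (a difference) ranks Team B ahead. That disagreement is exactly why curves built on one scale may not mean the same thing as identical-looking curves built on the other.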
CONCLUSION

As always, thank you for reading this article. This Friday, Nov 10, marks the start of the college basketball season, and if you liked what you read here, be sure to mark your calendar for Nov 20, when my next article appears: Unorthodox Bracket Picking Methods.
