Feb 22, 2017

Trends: The Least-Quantified Metric

Since I left you the entire season in suspense over the "most talked about... least quantified" metric, I thought I would end the suspense right away by putting it in the title. This way, we can focus entirely on the idea instead of the mystery. If you have ever watched a basketball game (or any sporting event), or even the unveiling of the brackets, I guarantee that you have heard a specific team's current trend "qualified" in no uncertain terms: "This team is on a 6-game winning streak," "This team seems to be in a shooting slump," or "This team's average points per game is X, but over their last 4 games, it is higher." Although these statements contain numerical evidence suggestive of a trend in place for a specific team, I would say these qualitative statements offer nothing more than plain factual evidence. Yes, that team has won 6 straight games. Yes, that team is missing a lot of open shots. Yes, that team's points per game is higher now than it was 4 games ago. To see why these statements are deceptive when it comes to the concept of trends, let's jump right in.



Trending Now: Definitions

Since I have probably created much confusion with what constitutes a trend and what does not, let's start with the obvious: How to define a trend. A trend must have these two qualities:
  1. A Direction - Up, Down, or Sideways. As long as a team isn't alternating between their best game of the year and their worst game of the year, a trend should be noticeable.
  2. An Indication of the Change in Direction - A quantifiable point or a visual pattern signaling that the change in direction is about to happen.
With a solid definition for trend, let's now look at why the general qualitative statements are deceptive when it comes to the concept of trends.
"This team is on a 6-game winning streak."
  1. Direction (PASS)- I would say in the up direction (or more casually, in the right direction).
  2. Indication of Change (FAIL) - Do all winning streaks stop at 6? Do winning streaks of 6 imply additional wins? (If a 6-game winning streak guaranteed two more wins, I'd love to be able to pick a team to go to the Sweet Sixteen on that fact alone.) There is no concrete indicator of when this trend will end.
"This team seems to be in a shooting slump."
  1. Direction (PASS) - Shooting % is in a downward direction.
  2. Indication of Change (FAIL) - Again, this statement gives no clear timetable of when (or even if) they will return to their shooting averages. DAME went through a shooting slump earlier in the year, and UVA is currently in one, but stating that a team is in a slump does not indicate how long the slump will continue.
You could do the exact same with the third statement, but I didn't want to be redundant. As you can see, the second quality is what separates a true trend from a general statement suggestive of a trend.

Trending Now: Purpose

Trend analysis is simply finding patterns in past data that possibly foretell future occurrences. With a direction clearly identified and a metric that indicates the trend is likely to change, trend analysis can be a very valuable tool for identifying teams that could potentially make runs in the tournament. In the 2016 tournament, Villanova went on a historical offensive efficiency streak even though, at the onset of the tournament, they ranked 5th overall in team efficiency and 11th overall in offensive efficiency. Wouldn't it have been nice to be able to predict this offensive efficiency streak before it even happened? That is what I hope to accomplish with this (unpolished) venture into trend analysis, and in the coming years, I hope to have trend analysis down to a science. In this article, I will look at three clear examples of trends, explaining the two trend qualities for each and then using them for predictive purposes.

Trending Now: Identifying Direction

Since the first quality of trends -- identifying direction -- is discernible from either general statements or statistical patterns, this should be the starting point of our analysis. The three examples are DUKE, IND, and NOVA. If we begin our trend analysis with the rudimentary approach, we could make generalized qualitative statements about each of these three teams that would suggest the direction their seasons have gone.
  • DUKE played well at the start of the season, then they tripped up (pun intended) around the time of the ELON game, and just recently they began playing well again.
  • IND also started the season strong, but they fell apart just before the start of conference play and have been abysmal ever since.
  • NOVA has played well throughout with a few bright spots and a few slip-ups along the way.
If you have followed college basketball through the course of the season, you could have come up with the exact same statements. We want to take it a step further and see data that visually demonstrates these qualitative statements about direction.

Here is the graph on DUKE using daily KenPom ratings throughout the season.

Here is the same graph for IND.


Finally, here is the same graph for NOVA.

I kept a roughly similar range for the y-axis scale on all three charts to keep them visually comparable. I think the charts speak for themselves, and considering the purpose at hand, they clearly demonstrate the concept of direction for trend analysis.
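
For anyone who wants to build similar charts from their own daily ratings data, here is a minimal plotting sketch. It is my own, not necessarily how these charts were produced, and the function name, colors, and fixed y-axis range are illustrative assumptions:

```python
import matplotlib.pyplot as plt

def plot_team(dates, ratings, team_name, y_range=(10, 35)):
    """Plot one team's daily ratings with a fixed y-axis range so that
    charts for different teams stay visually comparable."""
    plt.figure()
    plt.plot(dates, ratings, color="blue", label="Daily KenPom rating")
    plt.ylim(*y_range)          # same scale for every team's chart
    plt.title(team_name)
    plt.xlabel("Date")
    plt.ylabel("Rating")
    plt.legend()
    plt.show()
```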

Trending Now: Indicators of Change

Looking at the charts above, each one (especially the first two) shows points in time when the trend went from one direction to another. However, looking at the chart on those specific days when the trend changed, it would be impossible for even the most astute trend analyst to claim "this is a peak and a downtrend is coming" or "this is a trough and an uptrend is coming." Since we need indicators of directional change, we are going to make them ourselves. Luckily, I've been doing stock market data analysis (as a hobby) since 1999, and if there is one industry that knows a thing or two about trend analysis, it would be financial market analysis. I'm actually going to borrow a couple of tools from financial market analysis and employ them in trend analysis for college basketball.
  • Moving averages - a tool in trend analysis that smooths out fluctuations in volatile (or continually changing) data points. Essentially, a moving average trades out the oldest data point in its window for the most recent one, and the difference in value between those two points determines which direction the average moves over that span of time.*
  • Bollinger Bands - a tool in trend analysis created by financial market analyst John Bollinger that charts standard deviation bands alongside a moving average, visually creating a trading channel for the financial asset.** (A rough code sketch of both tools follows this list.)
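To make the mechanics concrete, here is a tiny sketch of both tools in plain Python. The ratings, window length, and variable names are made up purely for illustration; they are not the values used for the charts below:

```python
from statistics import mean, stdev

# Made-up daily ratings; the window slides one day at a time by dropping
# the oldest rating and adding the newest one.
ratings = [24.1, 23.8, 24.5, 25.0, 24.7, 25.3, 25.9, 26.2, 25.6, 26.8]
window = 5  # illustrative window length

for day in range(window - 1, len(ratings)):
    recent = ratings[day - window + 1 : day + 1]   # the newest `window` ratings
    avg = mean(recent)                             # the moving average for this day
    band = 2 * stdev(recent)                       # two standard deviations
    print(f"day {day}: avg={avg:.2f}, upper={avg + band:.2f}, lower={avg - band:.2f}")
```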
With these two tools, let's create a few charts and see how they work. On these charts, blue will be our original KenPom data, magenta will be our moving average, and the two yellow bands will be our upper and lower bounds for standard deviation. I'll start again with the chart for DUKE.

Now, let's look at the chart for IND.

Finally, we have the chart for NOVA.


Before I start analyzing the three charts, I want to go over two details on methodology.
  • In each of these three charts, I used a 21-day moving average. I chose 21 days because the data being used to predict the bracket will come out approximately 21 days before the national championship is played. I assumed that viewing the data in 21-day intervals would be the best approximation for finding patterns that could predict what happens over the next 21 days. I do not know definitively that 21 days is the right value; it just seemed logical.
  • In calculating the upper and lower (Bollinger) bands, I also used 21 days' worth of values to compute the standard deviation (for consistency with the moving average). The upper and lower bands are exactly +2 and -2 standard deviations from the 21-day moving average.
So what are the charts telling us? Although the timing is not precise, when the original data approaches one of the standard deviation bands, the direction of the original data usually changes shortly thereafter, typically within one to seven days of contacting or breaching the band. If it contacts the upper band, then a downward trend is about to begin, and if it contacts the lower band, then an upward trend is about to begin.
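
Here is a minimal sketch of that methodology, assuming the daily ratings sit in a pandas Series. The variable names and placeholder values are my own; the actual daily data set isn't published in this post:

```python
import pandas as pd

# Placeholder daily ratings (made up); in practice this would be one
# KenPom rating per day for a single team.
ratings = pd.Series([24.1, 23.8, 24.5, 25.0, 24.7, 25.3, 25.9, 26.2] * 6)

window = 21
rolling = ratings.rolling(window)
moving_avg = rolling.mean()                # 21-day moving average (magenta line)
std_dev = rolling.std()                    # 21-day standard deviation
upper_band = moving_avg + 2 * std_dev      # +2 standard deviations (upper yellow band)
lower_band = moving_avg - 2 * std_dev      # -2 standard deviations (lower yellow band)

# Flag the days where the raw rating touches or breaches a band -- the
# "indicator of change" described above.
peak_warning = ratings >= upper_band       # possible peak: a downtrend may follow
trough_warning = ratings <= lower_band     # possible trough: an uptrend may follow
print("Upper-band contacts on days:", list(ratings[peak_warning].index))
print("Lower-band contacts on days:", list(ratings[trough_warning].index))
```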

Trending Now: Practical Application

1. The most important advice I can give about using trend analysis: use good data. For this exercise, I used daily KenPom ratings. As I have said all year, I am unfamiliar with his new methodology for calculating his ratings, but I know he still uses the same efficiency concepts pioneered by Dean Oliver. However, I do know that his ratings early in the season used some kind of estimator until he could get enough regular season data to normalize his ratings. I'm not sure if he made estimates based on the previous year's final statistics or if he used data from pre-season games and dropped this data out of the ratings system once enough regular season games had been played. This could explain why the DUKE trendline started the season in the 30s, normalized in the low 20s, and then began a new uptrend, which, at its current pace, would not get DUKE back into the 30s. It could also explain why a few of the charts show bottle-necking patterns, where the two yellow lines accelerate towards one another, creating a narrow cap on a wide body (like a Coke bottle).

2. A change (or soon-to-happen change) in trend does not necessarily translate to a loss on the basketball court. Theoretically, a team can perform below their current trend's level and still win the game because they are statistically superior to their opponent. If an indicator suggests that the trend is about to change, it simply means their rating is going to be lower (due to bad play) than before. It does not imply that they will lose their next game.

3. While I wouldn't use trend analysis to make every single pick in my bracket, it should have some uses in picking certain games. If I see an 8-9 or 7-10 matchup where one team is definitively trending up with no signs of a trend change and the other team is definitively trending down with no signs of a trend change, I would pick the up-trending team without hesitation. I know, for now, this strategy is of no use to most of my readers because I doubt most of them have been collecting daily ratings data like I have. I hope I can post a page on the blog with a trend chart for each tournament team, but time constraints may or may not allow for it. I will let you know via the "To My Readers" section if it will happen.

4. In most years, there seems to be a surprise team that makes a deep run that very few see coming. In 2016, it was SYR. In 2015, it was MIST. In 2014, it was CONN and UK. We also can't forget 2013 WICH, 2011 VCU, and 2010 and 2011 BUT and the surprise deep runs they made. Hopefully trend analysis will see it coming. If I were to speculate what the trend chart of a surprise team would look like, I would assume something like this (a rough detection sketch follows the list):
  • Moving-average (magenta) and standard deviation (yellow) curves changing from flat to upward-sloping.
  • A ratings line (dark blue) making consecutive higher highs and higher lows (each subsequent high point is higher than the previous high, and each subsequent low point is higher than the previous low).
  • A ratings line (dark blue) mostly between the moving average (magenta) and the lower-bound standard deviation (yellow) curve.
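Purely as a thought experiment, here is a rough way the "higher highs and higher lows" part of that pattern could be checked automatically. The peak/trough detection is deliberately naive and entirely my own assumption, not something taken from the charts above:

```python
def higher_highs_and_lows(values):
    """Return True if the series makes consecutively higher peaks and
    consecutively higher troughs (a naive local-extrema check)."""
    peaks = [values[i] for i in range(1, len(values) - 1)
             if values[i] > values[i - 1] and values[i] > values[i + 1]]
    troughs = [values[i] for i in range(1, len(values) - 1)
               if values[i] < values[i - 1] and values[i] < values[i + 1]]
    rising_peaks = all(later > earlier for earlier, later in zip(peaks, peaks[1:]))
    rising_troughs = all(later > earlier for earlier, later in zip(troughs, troughs[1:]))
    return rising_peaks and rising_troughs

# Made-up ratings that do form higher highs and higher lows
print(higher_highs_and_lows([20.0, 22.0, 21.0, 23.5, 22.5, 25.0, 24.0]))  # True
```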
Though I feel like the pictures are far more exciting than the text I used to describe them, I hope this article on trend analysis was an eye-opening adventure into a new way of looking at team and match-up analysis for the NCAA tournament. As always, thanks for reading, and Wed Mar 1 will be the March Edition of the Quality Curve Analysis.

10 comments:

  1. Yes, this is interesting. I wonder if it would explain the Syracuse run last year.

    It could make sense just looking at the schedule.

    They lost their last 3 games of the regular season, including a loss to PITT in the 1st ACC tournament game.

    So at that point maybe SYR is at their low point below the moving average.

    Then they play DAY who is also at their low point. I believe that DAY had injuries to their best player.

    That would explain that win. Next was MID TEN ST, so SYR could beat them without their best game. Now they are trending above their moving average for GONZ. GONZ maybe is trending down at that point, so SYR sneaks by (with help from an awful call if I remember correctly).

    So now SYR at their peak plays VIR who is possibly at their low point.

    After that, I don't care where SYR is, UNC crushes them.

    I don't know if the data backs up any of that, or if that works on any of the other crazy runs.

    But this is definitely interesting, and I appreciate all of the work that you are putting out there.

  2. There's nothing like a loss to refocus a team and there's nothing like a long winning streak to breed complacency. The peak-and-trough logic that you described is what I hope trend analysis will uncover (or even better, prove). Unfortunately, I didn't start collecting data for trend analysis until this season, so I don't have any way to substantiate the logic. Since trend analysis is in its infancy, I probably won't put a lot of emphasis on it for the 2017 tournament until I've had time to really explore and tinker with it.

    1. How much attention do you pay to the individual match-ups?

      I have been trying to see if anything can be gathered when looking at certain stats between the teams.

      What I have been comparing is as follows:

      For example, last year: Purdue vs. AR Little Rock.

      O - KenPom offensive efficiency
      D - KenPom defensive efficiency
      OREB - offensive rebounding percentage
      DREB - defensive rebounding percentage
      OTO - offensive turnovers per possession
      DTO - defensive turnovers forced per possession

      I have been trying for the last few seasons to compare these numbers to see if there is anything that might help predict the games. I am also wondering if there are any other items I should be looking at.

      I also try to look at injuries and their impact on the teams.

      Maybe tournament experience would help also?

      I was curious as to what you or any other people who read this blog thought about this.

      AR Little Rock won the game**

      Purdue          AR Little Rock
      O - 19     vs.  D - 34
      D - 11     vs.  O - 97
      OREB - 41  vs.  DREB - 101
      DREB - 2   vs.  OREB - 241
      OTO - 124  vs.  DTO - 28
      DTO - 349  vs.  OTO - 27

      What I have noticed, in looking at not only this game but all of the games like it, is that the turnovers should be considered the most important of these types of stats. To me it makes sense, because if a team turns the ball over, they are unable to even attempt an offensive rebound or execute their offense. So it does not matter if they rebound better or are better per possession if they waste it due to a turnover. And from what I remember, Purdue was awful at turning the ball over at the end of that game. That was a huge reason why they lost the game.

      Anyway, I am curious what people think of this, and hopefully this can help us get closer to that perfect bracket.

      Again I appreciate what you are doing.

    2. When it comes to efficiency metrics, the only thing that matters is points per possession. 99% of the time, a basketball game will end one of two ways: both teams have an equal number of possessions or one team (depending on who wins the jumpball and if any tie-ups occur) has one more possession than the other. Note: In rare circumstances, a team can end up with two more possessions in a game than their opponent.

      If the game has an equal number of possessions, then the team with the highest points per possession will win the game (pace will determine the margin of victory). If the game has an unequal number of possessions, then you have to do some funny math to figure out who won, but more often than not, it will also be the team with the better points per possession.
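
      To put toy numbers on that (completely made up, just to illustrate the arithmetic):

      ```python
      # Two hypothetical teams with an equal number of possessions.
      team_a_points, team_a_possessions = 77, 70
      team_b_points, team_b_possessions = 73, 70

      ppp_a = team_a_points / team_a_possessions   # 1.10 points per possession
      ppp_b = team_b_points / team_b_possessions   # ~1.04 points per possession

      # With equal possessions, the higher points-per-possession team
      # necessarily has the higher score.
      print("Team A wins" if ppp_a > ppp_b else "Team B wins")
      ```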

      As for the metrics you cited, keep in mind that all those values are relative (they do not have any meaning on their own). For example, Purdue's O was 19 last year. On its own, that does not mean anything. Purdue could have a week off from play while teams 13-18 have 1 (or maybe 2) bad shooting nights. Those 6 teams will see their O rankings fall, while PUR rises to 13th in O rankings without playing a single game. Those values simply state, "At this very point in time, PUR has the 19th most efficient O, the 11th most efficient D, etc."

      Another thing to keep in mind, PUR plays half their season against the B10 while AR-LR played half their season against the Sun Belt. I would be hard-pressed to claim that a possession against the average B10 team is equal to a possession against the average SBC team. That's part of the reason why I'm keeping track of the daily movements of efficiency rankings, so I can see how much of a boost smaller conference teams are getting from playing possessions against inferior teams.

      With all that said, I will try to address your original question about the individual match-ups. One of the easiest things to do when it comes to individual matchups is look at how a team did when they won and how a team did when they lost. If PUR turns the ball over a lot in their losses compared to their wins, then yes, that would be something to look for. However, it is much more difficult than just looking for a team that has a high DTO ranking. You need to know how/why PUR was turning it over. Was it full-court pressure, was it zone defense, was it simply due to the game being on the road, or was it PUR's 2nd game in 3 days and the TO's were due to fatigue? There is a lot to look at when it comes to figuring out an individual match-up, and most of all, there are two teams in a match-up. You have to do the same analysis for AR-LR that you are doing for PUR. You can't say, "Well, AR-LR seems to have the recipe for beating PUR, I'll just stop there and pencil in AR-LR." You have to look at AR-LR the same way. After all, the game did go to overtime (though it shouldn't have), so PUR has something that countered AR-LR.

      I know that's probably a lot to digest and I probably still didn't answer your question fully, but as you can see, there's a lot going on in individual matchups. I've said on my blog on a few occasions, for 8-9 and 7-10 matchups, look to see who Vegas has as the favorite. That will tell you a lot about who should win that game. Vegas is often wrong on point spreads, which is why many casinos only lose money on their sports betting operations, but they have a much higher accuracy when it comes to favorite/underdog.

    3. How do you decide who to pick in an individual matchup? Is it based only on Vegas?

      What did your bracket look like from last year?

      I understand what you are saying about looking into why a team would turn the ball over. I was hoping that would provide some way to determine possible upsets. AR-LR forces turnovers at a high clip and Purdue turns the ball over at a high clip. Although Purdue was much more efficient on the offensive end.

    4. If I think an 8/9 game is a complete toss-up statistically, then I just pick the team that Vegas has as the favorite. A few times they were straight picks on Vegas, which stunk because I already thought they were statistical ties and my go-to tie-breaker also thought the same thing. Believe it or not, I tend to avoid looking at Vegas until the last possible minute because seeing a point-spread favorite is the same thing as hearing a sports media analyst claim who they think the winner is. Both introduce bias into my thought process and I try to avoid them (yes, I watch the Selection Show on mute).

      The only bracket I made last year was "From The Gut". As soon as the pairings are released, I pick all 67 games (including play-in games) from the gut. I spent so much time preparing work for the blog and collaborating with others that I never finished my force-fit model (which I would have used for bracket contests, ergo I didn't enter any). Off the top of my head, my gut-bracket had UK, OKLA, MIST, & KU in the Final Four. I picked UK because UNC, IND & UK were all in that one region (pick one and advance it all the way; everyone else was terrible). I liked the 2-seeds last year, they were the strongest seed group, and I got 1 out of 2 right in the Final Four. I think MIST was my Gut Champion. My surprise team was Wisconsin; I had them falling to UK (another 4-7 Elite 8, how many of those have we had recently?). As far as my overall view of the tournament, I was pretty accurate (you can reference my old articles), but as for individual picks from the gut, I know I missed a ton (6 with MIST alone).

  3. What is the force fit model?

    1. It is a private model that I use to fill out my final bracket, which uses a variety of bracket-picking strategies. It was a term coined by Pete (that I simply borrowed) and an idea that he wanted to try, but never had the time to fully test (I now know why!!!). It's not that it doesn't work, it just takes so much darn effort to get it to work. And if you are wondering, it has never picked a perfect bracket, nor won a "major" bracket contest. It's just something that I have toyed around with since Pete brought the idea up.

  4. Are those stats I mentioned even worthwhile to look at? Or are they not in depth enough to provide an accurate picture?

    When does tournament experience begin to play a part? When at least 70% of the points scored by the team from the previous season return?

    Are there other stats I should examine such as injuries? Or do those just need to be examined more in depth?

    Also is there a tournament champion checklist that still works?

    1. Pete used the rankings as a cut-off line. For example, one of his rules for the National Champ Checklist said "A champ never possessed a KenPom Defensive ranking below X." I don't remember what the exact value was, but he did use the relative rankings as a cut-off measure, a standard that all qualifying teams had to meet. That's probably a more valid use of relative rankings than team-vs-team comparisons.

      As for tournament experience, I've been working on a data set for three years, and I hope to put it out in one final article before Thursday's games.

      Injuries/suspensions have more to do with timing than with anything else. How much time has a team had to adjust to them? KU had little to no time to adjust to Jackson's suspension. CREI has had plenty of time to adjust to losing Maurice Watson Jr. Same for XAV and Edmond Sumner.

      I've been doing this since the 1998 tournament; I'm no expert, but I am miles ahead of where I was in '98. I wouldn't limit myself to a certain group of statistics and ignore the rest. It's all information, it's all valuable, it's all part of a team's profile. The hard part, and it comes with years of experience, is figuring out how to use it properly and when a particular stat does and doesn't matter. I do believe things like the one-and-done rule and the freedom of movement rules have had such a revolutionary impact on who wins and who loses that I would claim pre-2007 and pre-2014 patterns/rules of bracket-picking may be invalid. Only time (and more data points) will prove me right or wrong.

      As for champ checklist, other followers of Bracket Science have tracked this better than I have, and I know they have updated the "what works, what doesn't work" factors. Unfortunately, I don't have much of the details, and since I consider that Pete's proprietary work, if I did have it, I still wouldn't publish it.
