Feb 22, 2017

Trends: The Least-Quantified Metric

Since I left you the entire season in suspense over the "most talked about... least quantified" metric, I thought I would end the suspense right away by putting it in the title. This way, we can focus entirely on the idea instead of the mystery. If you have ever watched a basketball game (or any sporting event), or even the unveiling of the brackets, I guarantee that you have heard a specific team's current trend "qualified" in no uncertain terms: "This team is on a 6-game winning streak," "This team seems to be in a shooting slump," or "This team's average points per game is X, but over their last 4 games, it is higher." Although these statements contain numerical evidence suggestive of a trend in place for a specific team, I would say these qualitative statements offer nothing more than plain factual evidence. Yes, that team has won 6 straight games. Yes, that team is missing a lot of open shots. Yes, that team's points per game is higher now than it was 4 games ago. To see why these statements are deceptive when it comes to the concept of trends, let's jump right in.



Trending Now: Definitions

Since I have probably created much confusion with what constitutes a trend and what does not, let's start with the obvious: How to define a trend. A trend must have these two qualities:
  1. A Direction - Up, Down, or Sideways. As long as a team isn't alternating between their best game of the year and their worst game of the year, a trend should be noticeable.
  2. An Indication of the Change in Direction - A quantifiable point or a visual pattern signaling that the change in direction is about to happen.
With a solid definition for trend, let's now look at why the general qualitative statements are deceptive when it comes to the concept of trends.
"This team is on a 6-game winning streak."
  1. Direction (PASS)- I would say in the up direction (or more casually, in the right direction).
  2. Indication of Change (FAIL) - Do all winning streaks stop at 6? Do winning streaks of 6 imply additional wins? (If a 6-game winning streak guaranteed two more wins, I'd love to be able to pick a team to go to the Sweet Sixteen on that fact alone.) There is no concrete indicator of when this trend will end.
"This team seems to be in a shooting slump."
  1. Direction (PASS) - Shooting % is in a downward direction.
  2. Indication of Change (FAIL) - Again, this statement gives no clear timetable of when (or even if) they will return to their shooting averages. DAME went through a shooting slump earlier in the year, and UVA is currently in one, but stating that a team is in a slump does not indicate how long the slump will continue.
You could do the exact same with the third statement, but I didn't want to be redundant. As you can see, the second quality is what separates a true trend from a general statement suggestive of a trend.

Trending Now: Purpose

Trend analysis is simply finding patterns in past data that possibly foretell future occurrences. With a direction clearly identified and a metric that indicates the trend is likely to change, trend analysis can be a very valuable tool for identifying teams that could potentially make runs in the tournament. In the 2016 tournament, Villanova went on a historical offensive efficiency streak even though, at the onset of the tournament, they ranked 5th overall in team efficiency and 11th overall in offensive efficiency. Wouldn't it have been nice to be able to predict this offensive efficiency streak before it even happened? That is what I hope to accomplish with this (unpolished) venture into trend analysis, and in the coming years, I hope to have trend analysis down to a science. In this article, I will look at three clear examples of trends, explaining the two trend qualities for each and then using them for predictive purposes.

Trending Now: Identifying Direction

Since the first quality of trends -- identifying direction -- is discernible from either general statements or statistical patterns, this should be the starting point of our analysis. The three examples are DUKE, IND, and NOVA. If we begin our trend analysis with the rudimentary approach, we could make generalized qualitative statements about each of these three teams that would suggest the direction their seasons have gone.
  • DUKE played well at the start of the season, then they tripped up (pun intended) around the time of the ELON game, and just recently they began playing well again.
  • IND also started the season strong, but they fell apart just before the start of conference play and have been abysmal ever since.
  • NOVA has played well throughout with a few bright spots and a few slip-ups along the way.
If you have followed college basketball through the course of the season, you could have come up with the exact same statements. We want to take it a step further and see data that visually demonstrates these qualitative statements about direction.

Here is the graph on DUKE using daily KenPom ratings throughout the season.

Here is the same graph for IND.


Finally, here is the same graph for NOVA.

I kept a roughly similar range for the y-axis scale on all three charts to keep them visually comparable. I think the charts speak for themselves, and considering the purpose at hand, they clearly demonstrate the concept of direction for trend analysis.
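
For anyone who wants to build similar charts from their own daily ratings data, here is a minimal plotting sketch. It is my own, not necessarily how these charts were produced, and the function name, colors, and fixed y-axis range are illustrative assumptions:

```python
import matplotlib.pyplot as plt

def plot_team(dates, ratings, team_name, y_range=(10, 35)):
    """Plot one team's daily ratings with a fixed y-axis range so that
    charts for different teams stay visually comparable."""
    plt.figure()
    plt.plot(dates, ratings, color="blue", label="Daily KenPom rating")
    plt.ylim(*y_range)          # same scale for every team's chart
    plt.title(team_name)
    plt.xlabel("Date")
    plt.ylabel("Rating")
    plt.legend()
    plt.show()
```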

Trending Now: Indicators of Change

Looking at the charts above, each one (especially the first two) shows points in time when the trend went from one direction to another. However, looking at the chart on those specific days when the trend changed, it would be impossible for even the most astute trend analyst to claim "this is a peak and a downtrend is coming" or "this is a trough and an uptrend is coming." Since we need indicators of directional change, we are going to make them ourselves. Luckily, I've been doing stock market data analysis (as a hobby) since 1999, and if there is one industry that knows a thing or two about trend analysis, it would be financial market analysis. I'm actually going to borrow a couple of tools from financial market analysis and employ them in trend analysis for college basketball.
  • Moving averages - a tool in trend analysis that smooths out fluctuations in volatile (or continually changing) data points. Essentially, a moving average trades out the oldest data point in its window for the most recent one, and the difference in value between those two points determines which direction the average moves over that span of time.*
  • Bollinger Bands - a tool in trend analysis created by financial market analyst John Bollinger that charts standard deviation bands alongside a moving average, visually creating a trading channel for the financial asset.** (A rough code sketch of both tools follows this list.)
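To make the mechanics concrete, here is a tiny sketch of both tools in plain Python. The ratings, window length, and variable names are made up purely for illustration; they are not the values used for the charts below:

```python
from statistics import mean, stdev

# Made-up daily ratings; the window slides one day at a time by dropping
# the oldest rating and adding the newest one.
ratings = [24.1, 23.8, 24.5, 25.0, 24.7, 25.3, 25.9, 26.2, 25.6, 26.8]
window = 5  # illustrative window length

for day in range(window - 1, len(ratings)):
    recent = ratings[day - window + 1 : day + 1]   # the newest `window` ratings
    avg = mean(recent)                             # the moving average for this day
    band = 2 * stdev(recent)                       # two standard deviations
    print(f"day {day}: avg={avg:.2f}, upper={avg + band:.2f}, lower={avg - band:.2f}")
```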
With these two tools, let's create a few charts and see how they work. On these charts, blue will be our original KenPom data, magenta will be our moving average, and the two yellow bands will be our upper and lower bounds for standard deviation. I'll start again with the chart for DUKE.

Now, let's look at the chart for IND.

Finally, we have the chart for NOVA.


Before I start analyzing the three charts, I want to go over two details on methodology.
  • In each of these three charts, I used a 21-day moving average. I chose 21 days because the data being used to predict the bracket will come out approximately 21 days before the national championship is played. I assumed that viewing the data in 21-day intervals would be the best approximation for finding patterns that could predict what happens over the next 21 days. I do not know definitively that 21 days is the right value; it just seemed logical.
  • In calculating the upper and lower (Bollinger) bands, I also used 21 days' worth of values to compute the standard deviation (for consistency with the moving average). The upper and lower bands are exactly +2 and -2 standard deviations from the 21-day moving average.
So what are the charts telling us? Although the timing is not precise, when the original data approaches one of the standard deviation bands, the direction of the original data usually changes shortly thereafter, typically within one to seven days of contacting or breaching the band. If it contacts the upper band, then a downward trend is about to begin, and if it contacts the lower band, then an upward trend is about to begin.
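
Here is a minimal sketch of that methodology, assuming the daily ratings sit in a pandas Series. The variable names and placeholder values are my own; the actual daily data set isn't published in this post:

```python
import pandas as pd

# Placeholder daily ratings (made up); in practice this would be one
# KenPom rating per day for a single team.
ratings = pd.Series([24.1, 23.8, 24.5, 25.0, 24.7, 25.3, 25.9, 26.2] * 6)

window = 21
rolling = ratings.rolling(window)
moving_avg = rolling.mean()                # 21-day moving average (magenta line)
std_dev = rolling.std()                    # 21-day standard deviation
upper_band = moving_avg + 2 * std_dev      # +2 standard deviations (upper yellow band)
lower_band = moving_avg - 2 * std_dev      # -2 standard deviations (lower yellow band)

# Flag the days where the raw rating touches or breaches a band -- the
# "indicator of change" described above.
peak_warning = ratings >= upper_band       # possible peak: a downtrend may follow
trough_warning = ratings <= lower_band     # possible trough: an uptrend may follow
print("Upper-band contacts on days:", list(ratings[peak_warning].index))
print("Lower-band contacts on days:", list(ratings[trough_warning].index))
```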

Trending Now: Practical Application

1. The most important advice I can give about using trend analysis: use good data. For this exercise, I used daily KenPom ratings. As I have said all year, I am unfamiliar with his new methodology for calculating his ratings, but I know he still uses the same efficiency concepts pioneered by Dean Oliver. However, I do know that his ratings early in the season used some kind of estimator until he could get enough regular season data to normalize his ratings. I'm not sure if he made estimates based on the previous year's final statistics or if he used data from pre-season games and dropped this data out of the ratings system once enough regular season games had been played. This could explain why the DUKE trendline started the season in the 30s, normalized in the low 20s, and then began a new uptrend, which, at its current pace, would not get DUKE back into the 30s. It could also explain why a few of the charts show bottle-necking patterns, where the two yellow lines accelerate towards one another, creating a narrow cap on a wide body (like a Coke bottle).

2. A change (or soon-to-happen change) in trend does not necessarily translate to a loss on the basketball court. Theoretically, a team can perform below their current trend's level and still win the game because they are statistically superior to their opponent. If an indicator suggests that the trend is about to change, it simply means their rating is going to be lower (due to bad play) than before. It does not imply that they will lose their next game.

3. While I wouldn't use trend analysis to make every single pick in my bracket, it should have some uses in picking certain games. If I see an 8-9 or 7-10 matchup where one team is definitively trending up with no signs of a trend change and the other team is definitively trending down with no signs of a trend change, I would pick the up-trending team without hesitation. I know, for now, this strategy is of no use to most of my readers because I doubt most of them have been collecting daily ratings data like I have. I hope I can post a page on the blog with a trend chart for each tournament team, but time constraints may or may not allow for it. I will let you know via the "To My Readers" section if it will happen.

4. In most years, there seems to be a surprise team that makes a deep run that very few see coming. In 2016, it was SYR. In 2015, it was MIST. In 2014, it was CONN and UK. We also can't forget 2013 WICH, 2011 VCU, and 2010 and 2011 BUT and the surprise deep runs they made. Hopefully trend analysis will see it coming. If I were to speculate what the trend chart of a surprise team would look like, I would assume something like this (a rough detection sketch follows the list):
  • Moving-average (magenta) and standard deviation (yellow) curves changing from flat to upward-sloping.
  • A ratings line (dark blue) making consecutive higher highs and higher lows (each subsequent high point is higher than the previous high, and each subsequent low point is higher than the previous low).
  • A ratings line (dark blue) mostly between the moving average (magenta) and the lower-bound standard deviation (yellow) curve.
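Purely as a thought experiment, here is a rough way the "higher highs and higher lows" part of that pattern could be checked automatically. The peak/trough detection is deliberately naive and entirely my own assumption, not something taken from the charts above:

```python
def higher_highs_and_lows(values):
    """Return True if the series makes consecutively higher peaks and
    consecutively higher troughs (a naive local-extrema check)."""
    peaks = [values[i] for i in range(1, len(values) - 1)
             if values[i] > values[i - 1] and values[i] > values[i + 1]]
    troughs = [values[i] for i in range(1, len(values) - 1)
               if values[i] < values[i - 1] and values[i] < values[i + 1]]
    rising_peaks = all(later > earlier for earlier, later in zip(peaks, peaks[1:]))
    rising_troughs = all(later > earlier for earlier, later in zip(troughs, troughs[1:]))
    return rising_peaks and rising_troughs

# Made-up ratings that do form higher highs and higher lows
print(higher_highs_and_lows([20.0, 22.0, 21.0, 23.5, 22.5, 25.0, 24.0]))  # True
```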
Though I feel like the pictures are far more exciting than the text I used to describe them, I hope this article on trend analysis was an eye-opening adventure into a new way of looking at team and match-up analysis for the NCAA tournament. As always, thanks for reading, and Wed Mar 1 will be the March Edition of the Quality Curve Analysis.

10 comments:

  1. Yes, this is interesting. I wonder if it would explain the Syracuse run last year.

    It could make sense just looking at the schedule.

    They lost their last 3 games of the regular season, including a loss to PITT in the 1st ACC tournament game.

    So at that point maybe SYR is at their low point below the moving average.

    Then they play DAY who is also at their low point. I believe that DAY had injuries to their best player.

    That would explain that win. Next was MID TEN ST, so SYR could beat them without their best game. Now they are trending above their moving average for GONZ. GONZ maybe is trending down at that point, so SYR sneaks by (with help from an awful call if I remember correctly).

    So now SYR at their peak plays VIR who is possibly at their low point.

    After that, I don't care where SYR is, UNC crushes them.

    I don't know if the data backs up any of that, or if that works on any of the other crazy runs.

    But this is definitely interesting, and I appreciate all of the work that you are putting out there.

  2. There's nothing like a loss to refocus a team and there's nothing like a long winning streak to breed complacency. The peak-and-trough logic that you described is what I hope trend analysis will uncover (or even better, prove). Unfortunately, I didn't start collecting data for trend analysis until this season, so I don't have any way to substantiate the logic. Since trend analysis is in its infancy, I probably won't put a lot of emphasis on it for the 2017 tournament until I've had time to really explore and tinker with it.

    1. How much attention do you pay to the individual match-ups?

      I have been trying to see if anything can be gathered when looking at certain stats between the teams.

      What I have been comparing is as follows:

      For example, last year: Purdue vs. AR Little Rock.

      O - KenPom offensive efficiency
      D - KenPom defensive efficiency
      OREB - offensive rebounding percentage
      DREB - defensive rebounding percentage
      OTO - offensive turnovers per possession
      DTO - defensive turnovers forced per possession

      I have been trying for the last few seasons to compare these numbers to see if there is anything that might help predict the games. I am also wondering if there are any other items I should be looking at.

      I also try to look at injuries and their impact on the teams.

      Maybe tournament experience would help also?

      I was curious as to what you or any other people who read this blog thought about this.

      AR Little Rock won the game**

      Purdue          AR Little Rock
      O - 19     vs.  D - 34
      D - 11     vs.  O - 97
      OREB - 41  vs.  DREB - 101
      DREB - 2   vs.  OREB - 241
      OTO - 124  vs.  DTO - 28
      DTO - 349  vs.  OTO - 27

      What I have noticed, in looking at not only this game but all of the games like it, is that the turnovers should be considered the most important of these types of stats. To me it makes sense, because if a team turns the ball over, they are unable to even attempt an offensive rebound or execute their offense. So it does not matter if they rebound better or are better per possession if they waste it due to a turnover. And from what I remember, Purdue was awful at turning the ball over at the end of that game. That was a huge reason why they lost the game.

      Anyway, I am curious what people think of this, and hopefully this can help us get closer to that perfect bracket.

      Again I appreciate what you are doing.

    2. When it comes to efficiency metrics, the only thing that matters is points per possession. 99% of the time, a basketball game will end one of two ways: both teams have an equal number of possessions or one team (depending on who wins the jumpball and if any tie-ups occur) has one more possession than the other. Note: In rare circumstances, a team can end up with two more possessions in a game than their opponent.

      If the game has an equal number of possessions, then the team with the highest points per possession will win the game (pace will determine the margin of victory). If the game has an unequal number of possessions, then you have to do some funny math to figure out who won, but more often than not, it will also be the team with the better points per possession.
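
      To put toy numbers on that (completely made up, just to illustrate the arithmetic):

      ```python
      # Two hypothetical teams with an equal number of possessions.
      team_a_points, team_a_possessions = 77, 70
      team_b_points, team_b_possessions = 73, 70

      ppp_a = team_a_points / team_a_possessions   # 1.10 points per possession
      ppp_b = team_b_points / team_b_possessions   # ~1.04 points per possession

      # With equal possessions, the higher points-per-possession team
      # necessarily has the higher score.
      print("Team A wins" if ppp_a > ppp_b else "Team B wins")
      ```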

      As for the metrics you cited, keep in mind that all those values are relative (they do not have any meaning on their own). For example, Purdue's O was 19 last year. On its own, that does not mean anything. Purdue could have a week off from play while teams 13-18 have 1 (or maybe 2) bad shooting nights. Those 6 teams will see their O rankings fall, while PUR rises to 13th in O rankings without playing a single game. Those values simply state, "At this very point in time, PUR has the 19th most efficient O, the 11th most efficient D, etc."

      Another thing to keep in mind, PUR plays half their season against the B10 while AR-LR played half their season against the Sun Belt. I would be hard-pressed to claim that a possession against the average B10 team is equal to a possession against the average SBC team. That's part of the reason why I'm keeping track of the daily movements of efficiency rankings, so I can see how much of a boost smaller conference teams are getting from playing possessions against inferior teams.

      With all that said, I will try to address your original question about the individual match-ups. One of the easiest things to do when it comes to individual matchups is look at how a team did when they won and how a team did when they lost. If PUR turns the ball over a lot in their losses compared to their wins, then yes, that would be something to look for. However, it is much more difficult than just looking for a team that has a high DTO ranking. You need to know how/why PUR was turning it over. Was it full-court pressure, was it zone defense, was it simply due to the game being on the road, or was it PUR's 2nd game in 3 days and the TO's were due to fatigue? There is a lot to look at when it comes to figuring out an individual match-up, and most of all, there are two teams in a match-up. You have to do the same analysis for AR-LR that you are doing for PUR. You can't say, "Well, AR-LR seems to have the recipe for beating PUR, I'll just stop there and pencil in AR-LR." You have to look at AR-LR the same way. After all, the game did go to overtime (though it shouldn't have), so PUR has something that countered AR-LR.

      I know that's probably a lot to digest and I probably still didn't answer your question fully, but as you can see, there's a lot going on in individual matchups. I've said on my blog on a few occasions, for 8-9 and 7-10 matchups, look to see who Vegas has as the favorite. That will tell you a lot about who should win that game. Vegas is often wrong on point spreads, which is why many casinos only lose money on their sports betting operations, but they have a much higher accuracy when it comes to favorite/underdog.

    3. How do you decide who to pick in an individual matchup? Is it based only on Vegas?

      What did your bracket look like from last year?

      I understand what you are saying about looking into why a team would turn the ball over. I was hoping that would provide some way to determine possible upsets. AR-LR forces turnovers at a high clip and Purdue turns the ball over at a high clip. Although Purdue was much more efficient on the offensive end.

    4. If I think an 8/9 game is a complete toss-up statistically, then I just pick the team that Vegas has as the favorite. A few times they were straight picks on Vegas, which stunk because I already thought they were statistical ties and my go-to tie-breaker also thought the same thing. Believe it or not, I tend to avoid looking at Vegas until the last possible minute because seeing a point-spread favorite is the same thing as hearing a sports media analyst claim who they think the winner is. Both introduce bias into my thought process and I try to avoid them (yes, I watch the Selection Show on mute).

      The only bracket I made last year was "From The Gut". As soon as the pairings are released, I pick all 67 games (including play-in games) from the gut. I spent so much time preparing work for the blog and collaborating with others that I never finished my force-fit model (which I would have used for bracket contests, ergo I didn't enter any). Off the top of my head, my gut-bracket had UK, OKLA, MIST, & KU in the Final Four. I picked UK because UNC, IND & UK were all in that one region (pick one and advance it all the way; everyone else was terrible). I liked the 2-seeds last year, they were the strongest seed group, and I got 1 out of 2 right in the Final Four. I think MIST was my Gut Champion. My surprise team was Wisconsin; I had them falling to UK (another 4-7 Elite 8, how many of those have we had recently?). As far as my overall view of the tournament, I was pretty accurate (you can reference my old articles), but as for individual picks from the gut, I know I missed a ton (6 with MIST alone).

  3. What is the force fit model?

    1. It is a private model that I use to fill out my final bracket, which uses a variety of bracket-picking strategies. It was a term coined by Pete (that I simply borrowed) and an idea that he wanted to try, but never had the time to fully test (I now know why!!!). It's not that it doesn't work, it just takes so much darn effort to get it to work. And if you are wondering, it has never picked a perfect bracket, nor won a "major" bracket contest. It's just something that I have toyed around with since Pete brought the idea up.

  4. Are those stats I mentioned even worthwhile to look at? Or are they not in depth enough to provide an accurate picture?

    When does tournament experience begin to play a part? When at least 70% of the points scored by the team from the previous season return?

    Are there other stats I should examine such as injuries? Or do those just need to be examined more in depth?

    Also is there a tournament champion checklist that still works?

    1. Pete used the rankings as a cut-off line. For example, one of his rules for the National Champ Checklist said "A champ never possessed a KenPom Defensive ranking below X." I don't remember what the exact value was, but he did use the relative rankings as a cut-off measure, a standard that all qualifying teams had to meet. That's probably a more valid use of relative rankings than team-vs-team comparisons.

      As for tournament experience, I've been working on a data set for three years, and I hope to put it out in one final article before Thursday's games.

      Injuries/suspensions have more to do with timing than with anything else. How much time has a team had to adjust to them? KU had little to no time to adjust to Jackson's suspension. CREI has had plenty of time to adjust to losing Maurice Watson Jr. Same for XAV and Edmond Sumner.

      I've been doing this since the 1998 tournament; I'm no expert, but I am miles ahead of where I was in '98. I wouldn't limit myself to a certain group of statistics and ignore the rest. It's all information, it's all valuable, it's all part of a team's profile. The hard part, and it comes with years of experience, is figuring out how to use it properly and when a particular stat does and doesn't matter. I do believe things like the one-and-done rule and the freedom of movement rules have had such a revolutionary impact on who wins and who loses that I would claim pre-2007 and pre-2014 patterns/rules of bracket-picking may be invalid. Only time (and more data points) will prove me right or wrong.

      As for champ checklist, other followers of Bracket Science have tracked this better than I have, and I know they have updated the "what works, what doesn't work" factors. Unfortunately, I don't have much of the details, and since I consider that Pete's proprietary work, if I did have it, I still wouldn't publish it.
