The Statisticator: June 2015

Jeff Ely is an Economics Professor at Northwestern, and in 2009, he and one of his PhD students, Toomas Hinnosaar, wrote a blog post entitled "The Overtime Spike in NBA Basketball".

(Incidentally, it was reading this post shortly after it was published that made me realize very granular basketball data was publicly available, which led me to write so many basketball-related articles on this very blog.)

As indicated by the title, Jeff and Toomas noticed that many more NBA basketball games go to overtime than one would expect from treating both teams' final scores as independent random variables. This assumption does seem flawed from the start, as each team adapts to the other team's playing style and to the general pace of the game. Except for blowout games (consider the recent 120-66 destruction of the Milwaukee Bucks by the Chicago Bulls in a Playoff game), there is a rather strong correlation between the points scored by each team:

But Jeff and Toomas went further than just highlighting the discrepancy between the expected share of games with overtime (~2%) and the actual share (~6%). They uncovered a surprising spike in the distribution of the score difference, at zero, which emerges only seconds before the end of regulation.

I recently thought back about this analysis and wanted to revisit it, looking at the following questions:

Do we still observe the same phenomenon nowadays?
Do we observe the same effect towards the end of overtimes? One could argue that overtimes are quite likely to lead to more overtimes, given that whatever behavior emerged at the end of regulation will probably appear again at the end of overtime, and also that in only five minutes versus 48 minutes, scores have much less time to diverge.
Do we observe the same effects during the Playoffs?

Jeff and Toomas' analysis used data from all games between 1997 and 2009. I pulled all the following years, from 2009 to 2015, separating regular season and playoff games (it is not entirely clear whether the original analysis combined both types of games or focused on the regular season only). As in the original analysis, I defined score difference as the home team's score minus the road team's score, so a positive value could be interpreted as home-court advantage.
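For readers who want to reproduce this kind of curve, here is a minimal sketch of how the score difference could be sampled at regular intervals and then summarized across games. The play-by-play format (a list of `(seconds_elapsed, home_score, road_score)` tuples) and the function names are assumptions made for illustration, not the actual data layout or code behind this post.

```python
from bisect import bisect_right
from statistics import mean, pstdev

REGULATION_SECONDS = 48 * 60  # four 12-minute quarters


def score_diff_on_grid(events, step=30, length=REGULATION_SECONDS):
    """Sample the home-minus-road score difference every `step` seconds.

    `events` is a hypothetical play-by-play format: a list of
    (seconds_elapsed, home_score, road_score) tuples sorted by time,
    one tuple per scoring play.
    """
    times = [t for t, _, _ in events]
    diffs = []
    for t in range(0, length + 1, step):
        i = bisect_right(times, t)  # number of scoring plays at or before t
        if i == 0:
            diffs.append(0)  # nobody has scored yet
        else:
            _, home, road = events[i - 1]
            diffs.append(home - road)
    return diffs


def mean_and_sd(games, step=30, length=REGULATION_SECONDS):
    """Mean and standard deviation of the difference at each grid point."""
    per_game = [score_diff_on_grid(g, step, length) for g in games]
    columns = list(zip(*per_game))  # one column per grid point
    return [mean(c) for c in columns], [pstdev(c) for c in columns]
```

Calling `mean_and_sd(games)` on a list of such games would return two lists, one value per 30-second grid point, which is the kind of series one would plot to get mean and standard deviation curves.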

First off, here is the evolution of the mean and the standard deviation of the score differential throughout regulation for regular season games, followed by playoff games:

The curves are extremely similar, with the home team advantage gradually increasing throughout the game, especially in the second half of playoff games. But the standard deviations are very large compared to the point differential. It is interesting to see the standard deviations increase at a decreasing rate and even decrease in the final minutes. This probably happens in games whose outcome is already certain, where starters are pulled out and the losing team is able to somewhat reduce the point differential. Given the standard deviation of the point differential, it now makes sense that overtimes are theoretically quite unlikely.
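To make that last point concrete, here is a rough sketch of how the chance of a tie depends on the spread of the final score difference, assuming the difference is approximately normal. The mean of 3 points and the standard deviations tried below are illustrative assumptions, not values read off the curves above.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Normal(mu, sigma)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def tie_probability(mu, sigma):
    """P(-0.5 < D < 0.5): the normal difference D rounds to exactly 0."""
    return normal_cdf(0.5, mu, sigma) - normal_cdf(-0.5, mu, sigma)


# Assumed values: a 3-point average home edge and a range of spreads.
for sigma in (10, 13, 16, 20):
    print(sigma, round(tie_probability(3.0, sigma), 3))
```

Under this model the tie probability is roughly proportional to 1 / sigma, so a wide spread leaves only a few percent of games tied at the end of regulation.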

Do we still observe the same phenomenon nowadays?

Let us look at an animation of the score difference as the game progresses (regular season games only):

I generated a similar video taking a closer look at the last quarter at a finer level of granularity (6s increments instead of 30s).

Everything behaves as expected for the first 47 minutes of the 48-minute game. On a slightly more technical note, if we were to assume that the scores for each team at any given point in time are approximately normal and independent, then the difference between the two would also be normal. This assumption does not seem to be violated for most of the game, except when it matters most, right at the end of regulation:

While the final graph is somewhat surprising at first glance, it makes a lot of sense for those who have seen a few close games on TV. In the middle of the game, a team losing by a handful of points is not going to freak out and start radically changing its strategy. Points come and go quickly in basketball, and losing by two points heading into the third quarter or even the fourth is clearly not synonymous with defeat. However, losing by two points with 10 seconds left is a whole different story. Defeat is in plain view. If you have possession of the ball, you need to score quickly and close the gap. If the other team has possession, things look gloomier. You can't let them run out the clock and need to get possession back. Teams do so by intentionally fouling, hoping the other team won't make all its free throws and that they will get the ball back. If the game is tied with only a few seconds left, teams won't panic and intentionally foul; one team might go for a buzzer-beater, but without taking unnecessary risks. So in other words, the closing seconds of a game have the particularity that:

wide score differences are a stable equilibrium: the losing team has essentially thrown in the towel
small score differences are highly unstable: the losing team is going to seek to reach a score difference of 0 (see next case) or gain the lead, in which case we remain in an unstable state with roles reversed
a score difference of 0 is a stable equilibrium stuck between two highly unstable states

With this perspective, the distribution graph makes complete sense!
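The stable/unstable picture can be illustrated with a toy simulation. This is only a sketch: every probability, the 6-point "out of reach" threshold, and the Normal(3, 13) margin entering the final stretch are made-up assumptions, not quantities estimated from the data. It compares a neutral rule, where scoring ignores the margin, with a strategic rule that follows the three cases above.

```python
import random


def neutral_step(diff, rng):
    """Baseline: either team may score 2, whatever the current margin."""
    roll = rng.random()
    if roll < 0.3:
        return diff + 2  # home team scores
    if roll < 0.6:
        return diff - 2  # road team scores
    return diff


def strategic_step(diff, rng):
    """Toy end-of-game rules; every probability here is made up."""
    if diff == 0:
        # Tied: both teams avoid risks, so the score rarely moves.
        if rng.random() < 0.15:
            return rng.choice((-2, 2))
        return 0
    if abs(diff) > 6:
        # Out of reach: the trailing team has thrown in the towel.
        return diff
    # Close game: the trailing team presses and fouls.
    toward_zero = -1 if diff > 0 else 1
    roll = rng.random()
    if roll < 0.45:
        return diff + toward_zero * rng.choice((2, 3))  # trailing team scores
    if roll < 0.75:
        return diff - toward_zero * rng.choice((1, 2))  # leader hits free throws
    return diff


def tie_rate(step, n_games=20000, n_steps=4, seed=0):
    """Share of simulated games tied after `n_steps` closing possessions."""
    rng = random.Random(seed)
    ties = 0
    for _ in range(n_games):
        diff = round(rng.gauss(3, 13))  # assumed margin entering the final stretch
        for _ in range(n_steps):
            diff = step(diff, rng)
        ties += diff == 0
    return ties / n_games


neutral = tie_rate(neutral_step)
strategic = tie_rate(strategic_step)
print(f"neutral: {neutral:.1%}  strategic: {strategic:.1%}")
```

With these assumed rules, the strategic version piles up probability mass at a difference of 0 while the neutral version does not, which is qualitatively the spike seen in the distribution graph. The exact numbers mean nothing beyond that.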

Do we observe the same effect towards the end of overtimes?

It wouldn't be too far-fetched to consider an overtime as a 5-minute version of a full game, given that both teams start off tied. Here's the animation of the score difference over all overtimes (combining first, second, third... overtimes) in the regular season from 2009 to 2015:

So in a nutshell, we do indeed observe the same phenomenon, which makes perfect sense given that not only do we find ourselves in the same state of stable/unstable equilibrium in the last possessions of the game, but scores have also had less time (5 vs 48 minutes) to diverge.
But as divergence is less likely, is a second overtime more likely than a first overtime? What about a third overtime? Will scores diverge even less as players get tired, players foul out, and the stakes are raised, leading to even more conservative game play?

Here are the numbers of interest:
For the 5876 regular season games considered, 373 went to overtime (6.3%).
Out of the 373 games that went to a first overtime, 62 went to a second overtime (16.6%).
Out of the 62 games that went to a second overtime, 15 went to a third overtime (24.2%).
Only one game of those 15 (6.7%) eventually ended in quadruple overtime, with the Hawks outlasting the Jazz 139-133.
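The percentages above are conditional rates: each one divides the games that reached a stage by the games that reached the previous stage. The short sketch below recomputes them from the quoted counts and adds a back-of-the-envelope baseline of my own, not part of the original analysis: if the score difference behaved like a random walk, its standard deviation would grow like the square root of elapsed time and the tie probability would shrink like 1 / SD, so the ~2% naive figure for 48 minutes would scale to about sqrt(48 / 5) times that for a 5-minute overtime.

```python
from math import sqrt

# Counts quoted above: games reaching each stage, regular seasons 2009-2015.
reached = {
    "regulation": 5876,
    "1st overtime": 373,
    "2nd overtime": 62,
    "3rd overtime": 15,
    "4th overtime": 1,
}

stages = list(reached)
rates = {}
for prev, nxt in zip(stages, stages[1:]):
    # Conditional rate: share of games at `prev` that went on to `nxt`.
    rates[nxt] = reached[nxt] / reached[prev]
    print(f"{nxt}: {reached[nxt]}/{reached[prev]} = {rates[nxt]:.1%}")

# Assumption: random-walk scaling of the naive ~2% regulation figure.
naive_regulation = 0.02
naive_overtime = naive_regulation * sqrt(48 / 5)
print(f"naive 5-minute tie rate: {naive_overtime:.1%}")
```

Under that assumption the naive overtime baseline lands at roughly 6%, well below the 16.6% and 24.2% observed for second and third overtimes, so the shorter clock alone would not seem to account for the repeat overtimes.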

Do we observe the same effects during the Playoffs?

Players, coaches and fans always state that the Playoffs, with their increased pressure and more physical play, are an entirely different animal compared to the regular season. But what about the end-of-game behavior just observed? Losing a game can end a season, so one would expect score differences of a few points to be extremely unstable.

The following animation suggests that the behavior is actually very similar to what we saw earlier for regular season games:

(link to animation focusing on fourth quarter)

What about the occurrence of overtimes? Again, Playoff numbers are in line with the regular season, with 28 of 308 games (8.3%) going to overtime. Sample sizes then get quite small, but it's a fun fact that we've had more playoff games end in triple overtime (2) than in double overtime (1).

So to summarize, not only do natural game dynamics make overtimes more likely to occur than one would naively expect, but overtimes are also quite likely to lead to subsequent overtimes. This is great news for the fans, and for the NBA's TV deals. Perhaps less so for teams that have another game the following day...

Saturday, June 6, 2015

The "Overtime effect": Why things go crazy in the final seconds of regulation