Friday, May 15, 2015

Consequence of Morey's Law: Lucky vs Unlucky teams

In a previous post, I looked at a 1994 paper by Daryl Morey (current Houston Rockets GM), who investigated how a team's winning percentage relates to the number of points it scores and allows, deriving the "modified Pythagorean theorem":

expected win percentage =
  pts_scored ^ 13.91 / (pts_scored ^ 13.91 + pts_allowed ^ 13.91)

At the end of his paper, Daryl explores the teams with the biggest deltas between their actual and predicted wins. In 1993-1994, the Chicago Bulls and Houston Rockets topped the list, and Daryl refers to them as lucky teams. But why is luck involved?


The rationale is that if two teams A and B have almost identical points scored and points allowed, we would expect them to have very similar win percentages. The only way to create a discrepancy (without changing points scored and points allowed too much) is to change the outcome of the very close games. Take all the games team A won by a point and flip the scores so that they lose by one, and conversely for team B, which now wins all the games it previously lost by one. With this hypothetical construction, the two teams still have very similar points scored and allowed, but potentially quite different records. Common sense suggests that in very close games each team's probability of winning is around 50%, so winning or losing comes down to "luck": whether a desperation buzzer-beater drops or bounces off the back of the rim. It would therefore make sense that teams with large discrepancies between actual and predicted wins were either much better or much worse than 50% in close games. Let's confirm.

Here's the table of teams with a discrepancy of 6 or more games (in either direction) between their actual and projected records, ranked by year:

Team Year Scored (ppg) Allowed (ppg) Wins (proj) Wins (actual) Win %
NJN 2000 98.0 99.0 38 31 37.8
DEN 2001 96.6 99.0 34 40 48.8
NJN 2003 95.4 90.1 56 49 59.8
CHA 2005 94.3 100.2 24 18 22.0
NJN 2005 91.4 92.9 36 42 51.2
IND 2006 93.9 92.0 47 41 50.0
TOR 2006 101.1 104.0 33 27 32.9
UTA 2006 92.4 95.0 33 41 50.0
BOS 2007 95.8 99.2 31 24 29.3
CHI 2007 98.8 93.8 55 49 59.8
DAL 2007 100.0 92.8 61 67 81.7
MIA 2007 94.6 95.5 38 44 53.7
SAS 2007 98.5 90.1 64 58 70.7
NJN 2008 95.8 100.9 27 34 41.5
TOR 2008 100.2 97.3 49 41 50.0
DAL 2010 102.0 99.3 49 55 67.1
GSW 2010 108.8 112.4 32 26 31.7
MIN 2011 101.1 107.7 24 17 20.7
PHI 2012 93.6 89.4 43 35 53.0
BRK 2014 98.5 99.5 38 44 53.7
MIN 2014 106.9 104.3 48 40 48.8
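
As a sanity check, the projected wins can be reproduced directly from the formula. A minimal sketch, assuming the Scored/Allowed columns are per-game averages over an 82-game season:

  def morey_win_pct(pts_scored, pts_allowed, exponent=13.91):
      """Modified Pythagorean expectation for basketball."""
      return pts_scored ** exponent / (pts_scored ** exponent + pts_allowed ** exponent)

  # First row of the table (NJN 2000): 98.0 scored, 99.0 allowed per game
  print(round(82 * morey_win_pct(98.0, 99.0)))  # -> 38, matching Wins (proj)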


So how did these teams fare in close games? I've labelled a team/year as High if they won at least 6 more games than expected (8 teams from the list above), Low if they won at least 6 fewer games than expected (13 teams), and Normal otherwise. I then computed, for each group, the win percentage in closely contested games (final scores within 1, 2 and 3 points).
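
In code, the labelling rule amounts to a couple of comparisons (a sketch; the function name is mine):

  def luck_label(actual_wins, projected_wins, threshold=6):
      """Label a team-season by its actual-minus-projected win gap."""
      diff = actual_wins - projected_wins
      if diff >= threshold:
          return "High"   # won at least 6 more games than projected
      if diff <= -threshold:
          return "Low"    # won at least 6 fewer games than projected
      return "Normal"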

Final scores within 1 point:

Type # Wins # Games Win %
Normal 721 1439 50.1
Low 8 21 38.1
High 2 2 100.0

Final scores within 2 points:

Type # Wins # Games Win %
Normal 1760 3515 50.1
Low 20 52 38.5
High 10 13 76.9

Final scores within 3 points:

Type # Wins # Games Win %
Normal 2820 5617 50.2
Low 24 80 30.0
High 14 19 73.7

Our intuition was correct, and so were Daryl's closing comments: teams can indeed be qualified as lucky or unlucky, some winning almost 3 out of 4 close match-ups, others losing 2 out of 3 tight games. This intangible "luck" factor is sufficient to explain why certain teams have much better or worse records than their offense and defense would typically warrant. It doesn't take much to flip the outcome of an entire game.


As a quick aside, much has been said about the San Antonio Spurs this year and their drop from a potential 2nd seed to the 6th seed entering the Playoffs. Most articles focused on their loss on the final day of the regular season, which led to that seeding free-fall, but was excessive focus placed on that last game? Had they been particularly lucky or unlucky during the season? It turns out their record is a couple of games lower than what the modified Pythagorean theorem would have predicted, and that they weren't particularly lucky or unlucky in their close games, winning 2 of 5 games decided by 1 point, and 6 of 13 decided by 3 points or less.

Saturday, February 28, 2015

Morey's Law: How do points scored and points allowed tie to win percentage?



It all started in baseball, when Bill James found a very elegant formula linking a baseball team's winning percentage to the number of runs it scored and allowed:

expected win percentage =
  runs_scored ^ 2 / (runs_scored ^ 2 + runs_allowed ^ 2)

Because the variables are raised to the second power, the formula became known as the "Pythagorean expectation formula".

In 1994, Daryl Morey, one of the biggest proponents of analytics in basketball and now GM of the Houston Rockets, adapted the formula for basketball teams. The overall structure remains the same, but the power of 2 was replaced by 13.91. Here's an extract of Daryl's formula from the STATS Basketball Scoreboard:


Essentially the same formula as for baseball but with 13.91 as the power:

expected win percentage =
  pts_scored ^ 13.91 / (pts_scored ^ 13.91 + pts_allowed ^ 13.91)

In this post, I wanted to further explore this formula and answer questions such as: How accurate is it? It was based on data up to 1993-1994, so is it still accurate on today's data? Are there other, more accurate formulas out there?

To start off, I extracted all relevant statistics by team and by year for the past 15 complete seasons, going from 1999-2000 to 2013-2014.

Let's start by looking at how accurate Daryl's formula is over these last seasons:


Well, the formula still applies quite well, to say the least! Of course, the exact coefficient might be slightly off for recent seasons, so I took the more recent data and fit the same model. The fitted value for the exponent turned out to be 13.86. Despite all the rule changes over the past twenty-plus years (three free throws on three-point fouls, hand-checking, clear path...) and the fact that the early nineties are regarded as a completely different era of basketball (somewhat linked to those rule changes), the value is almost identical: less than a 0.4% difference!
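
For those curious about the mechanics of the fit, here's a sketch using scipy's curve_fit on synthetic stand-in data (the real input would be one row per team-season with per-game points scored, points allowed and actual win percentage):

  import numpy as np
  from scipy.optimize import curve_fit

  def pythagorean(X, k):
      """Expected win percentage from per-game points scored and allowed."""
      scored, allowed = X
      return scored ** k / (scored ** k + allowed ** k)

  # Synthetic stand-in for roughly 15 seasons x 30 teams of data
  rng = np.random.default_rng(0)
  scored = rng.uniform(90, 110, size=450)
  allowed = rng.uniform(90, 110, size=450)
  win_pct = pythagorean((scored, allowed), 13.91) + rng.normal(0, 0.03, size=450)

  (k_hat,), _ = curve_fit(pythagorean, (scored, allowed), win_pct, p0=[2.0])
  print(k_hat)  # close to 13.91 here by construction; 13.86 on the real data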

But back to the formula. It seems to perform remarkably well at fitting the data, but can we do better? There is room for additional flexibility: in Morey's formula, all three terms are raised to the same power. What if points scored and points allowed were allowed to be raised to different powers?

expected win percentage =
  pts_scored ^ a / (pts_scored ^ a + pts_allowed ^ b)

Or if all three terms could have different powers?

expected win percentage =
  pts_scored ^ a / (pts_scored ^ c + pts_allowed ^ b)

When fitting these new, more flexible models, it turns out that the fitted coefficients remain very close to 14. Naturally, with the additional flexibility we observe a decrease in residual sum of squares, but nothing extravagant either. We'll revisit this point later in the post.

But let's step back for a minute: what exactly does the exponent value correspond to? For points scored and allowed ranging from 90 to 110, I generated three charts displaying the expected win percentage for exponent values of 2, 14 and 50.


We notice that the exponent controls how slowly or quickly the surface goes to 0 and 1 as the difference between points scored and allowed increases. When points scored is 100 and points allowed is 95, the win percentages are 52% (exponent of 2), 67% (exponent of 14) and 93% (exponent of 50).
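
Those three percentages drop straight out of the formula; a quick check:

  def expected_win_pct(scored, allowed, k):
      return scored ** k / (scored ** k + allowed ** k)

  for k in (2, 14, 50):
      print(k, f"{100 * expected_win_pct(100, 95, k):.1f}%")
  # 2 -> 52.6%, 14 -> 67.2%, 50 -> 92.9%: the 52/67/93% quoted above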

But something else stands out in all the graphs: they appear invariant in one direction, along the main diagonal (going through the points (90, 90) and (110, 110)). In other words, a team allowing 90 points and scoring 93 has an expected winning percentage only very slightly different from that of a team scoring 110 and allowing 107. What truly matters is the delta between points scored and allowed, not the absolute value of these two numbers.

Let's do some very simple exploratory data analysis looking at actual win percentages against points scored and allowed:


As one would have expected, there is definitely some correlation there, especially for points allowed (which could be an additional argument for promoting a strong defense over a strong offense, but that is for another time).



But things get really interesting when we look at the difference between points scored and allowed:


You don't often come across a correlation of 0.97 just plotting two random variables against each other in your data set! It looks like someone cut a rectangular stencil out of cardboard, placed it over an empty plot, and asked their 3-year-old to go polka-dot-crazy within the rectangular region. Can this strong relationship be leveraged into an alternative to Morey's formula?

A simple linear model begs to be fit, but would it also make sense to add a quadratic or cubic term? A quadratic term (or any even power, for that matter) does not seem reasonable: a delta of 5 and a delta of -5 suggest VERY different types of performances, so only odd powers should be considered.
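
Since np.polyfit has no way to skip the even powers, an odd-powers-only fit is easiest with an explicit design matrix. A sketch on synthetic stand-in data (in the real analysis, diff would be each team-season's per-game point differential):

  import numpy as np

  # Synthetic stand-in: per-game point differential and actual win percentage
  rng = np.random.default_rng(1)
  diff = rng.uniform(-10, 10, size=450)
  win_pct = 0.5 + 0.03 * diff + rng.normal(0, 0.03, size=450)

  # Odd powers only: intercept, linear term, optional cubic term
  X1 = np.column_stack([np.ones_like(diff), diff])
  X3 = np.column_stack([np.ones_like(diff), diff, diff ** 3])
  coef1, *_ = np.linalg.lstsq(X1, win_pct, rcond=None)
  coef3, *_ = np.linalg.lstsq(X3, win_pct, rcond=None)

Here's the plot with the fits from a single linear term and with an additional cubic term: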


We've now come to the point where we have five models: the three "Pythagorean" models with various degrees of flexibility (which I'll refer to as the single/double/triple power models, based on how many coefficients are fit) and two linear models (with and without the cubic term). Can one be established as significantly superior to the others? Will Morey's formula hold?

Of course, the easiest way to compare them would be to look at the fits and compare residual sums of squares, but that approach always favors the more complex models and leads to the overfitting problems we constantly hear about. So how do we go about it? The way overfitting is dealt with in the abundant literature: cross-validation. The data is randomly split into training and testing datasets; each model is constructed on the training data but evaluated on test data it has never seen before.
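
The comparison loop itself is generic. A sketch, where fit and predict are placeholders standing in for each of the five candidate models:

  import numpy as np

  def cv_rss(fit, predict, X, y, n_splits=100, test_frac=0.2, seed=0):
      """Repeated random train/test splits; returns each split's test RSS."""
      rng = np.random.default_rng(seed)
      scores = []
      for _ in range(n_splits):
          test = rng.random(len(y)) < test_frac
          params = fit(X[~test], y[~test])            # fit on training rows only
          resid = y[test] - predict(X[test], params)  # evaluate on unseen rows
          scores.append(np.sum(resid ** 2))
      return np.array(scores)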

And the results are in!




Based on my random splits, it seems that while all the models perform very similarly, Morey's formula (the single power model) has a slight advantage. It didn't achieve the minimum RSS (and even yielded the maximum on occasion), but its median RSS was lower than that of all the other models, though not significantly.

So after all this work, we weren't able to come up with a more reliable and robust way to compute expected win percentages than a formula over 20 years old!

In an upcoming post we'll dig a little deeper into the data and try to understand the largest discrepancies. What did Daryl Morey mean in his original paper when referring to the 1993-1994 Chicago Bulls as a lucky team?


Monday, February 23, 2015

Shaqtin-a-bias?


All NBA fans know about Shaqtin-a-fool.
Once a week, Shaquille O'Neal hosts this short segment on the NBA on TNT show. Five humorous video clips are shown, with players definitely not at their best: erratic passes, obvious travels, missed wide-open dunks and layups, lost shoes...
The segment is also available on nba.com, and fans can vote for the best Shaqtin-a-fool moment.


For volume 4 episode 11 (episodes are referenced just like a TV series, with season and episode), and like over 50% of the voters, I had voted for the last video clip shown, which was that week's clear winner. A weird sensation I had been carrying from week to week suddenly materialized: it seemed to me that the last video clip was winning a disproportionate number of times.

Two explanations came to mind: either the video clips were not shown in random order during Shaq's segment but sorted according to expected preferences, or the human mind is biased by its short-term memory, not quite remembering the first clips and finding the last ones disproportionately funnier.

It was all the more obvious for this episode 11, where the poll results came out in the exact reverse of the order in which the clips were shown:



But before investigating the human brain and mind too deeply, I first had to see if my brain wasn't the one tricking me, and sought statistical confirmation that there was indeed a bias favoring the last video shown.

First things first, data was required. Unable to automatically run a script to pull the survey results from polldaddy.com, I manually went through the last 28 episodes (including some special episodes for the All-Star Game, the Playoffs and past eras), noting for each video the order in which it was shown ("Input Order") and its position in the survey results ("Output Ranking").

A quick first visual exploration of the data, linking Input Order to Output Ranking:


I added some jitter to keep the lines from overlaying each other and hiding the number of observations. It did seem that the majority of the lines lay on the steepest diagonal, indicating that the most common "transition" was a video shown in 5th position coming out first in the survey results. At least I wasn't imagining the whole thing!

Because diagonal lines are longer than horizontal ones, there could still be an optical illusion: we might be seeing more color from longer lines rather than more lines. So I re-generated the same graph but reversed the order of the inputs, so that the last video shown is now labelled 1, and the first video shown is 5.


No visual trick here: it definitely looks like the last video shown is the most likely to win the poll (horizontal lines going from 1 to 1).

Now for the statistical confirmation. The most suitable test here is a chi-square, comparing the observed counts with the counts expected under the null hypothesis that video order doesn't matter and every video is equally likely to end up in any position.

The first test I ran looked at the full data and all the Input Order - Output Ranking counts:

Output: 1 Output: 2 Output: 3 Output: 4 Output: 5
Input: 1 3 3 13 6 3
Input: 2 1 2 3 10 12
Input: 3 4 5 6 9 4
Input: 4 3 8 5 3 9
Input: 5 17 10 1 0 0

The chi-square test strongly rejected the null hypothesis: input order and output ranking are strongly linked.
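
For reference, here's how that first test can be run with scipy, using the counts from the table above:

  import numpy as np
  from scipy.stats import chi2_contingency

  # Rows: input order 1-5; columns: output ranking 1-5
  counts = np.array([[ 3,  3, 13,  6,  3],
                     [ 1,  2,  3, 10, 12],
                     [ 4,  5,  6,  9,  4],
                     [ 3,  8,  5,  3,  9],
                     [17, 10,  1,  0,  0]])
  chi2, p, dof, expected = chi2_contingency(counts)
  print(p)  # tiny p-value: order and ranking are far from independent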

The second test focused solely on the winner of the poll: in which position was the winner shown?

The table below summarizes the data:

Input: 1 Input: 2 Input: 3 Input: 4 Input: 5
Count 3 1 4 3 17

That's right: in over 60% of cases, the survey winner was shown in the last position! It's clear from the data that not all positions are created equal, and a second chi-square test confirmed this.
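
This second test is a one-line goodness-of-fit against the uniform expectation of 28/5 = 5.6 winners per position:

  from scipy.stats import chisquare

  winner_counts = [3, 1, 4, 3, 17]     # position the winner was shown in
  stat, p = chisquare(winner_counts)   # default null: uniform expected counts
  print(p)  # far below 0.05: the last slot wins much more often than chance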

So back to Shaq. Now that we've confirmed that there is a strong bias, can we try explaining the phenomenon?

My first idea (perhaps from having spent too much time doing analyses for marketing teams!) was that the videos were not shown randomly but pre-sorted according to expected viewer preference. It's a possibility, but a rather weak one. What would be the rationale? To keep people hooked as the clips get funnier and funnier? Sure, but recall that the whole Shaqtin-a-fool segment lasts 2-3 minutes tops; I'm not sure viewers really need to be kept hooked. Plus, until they see the last video, the audience has no way of knowing whether the best clips have already been shown.

So I'm actually leaning towards an unconscious bias. I think the same phenomenon would occur if you were asked to rank your best vacations. There might be some clear "great vacations" (honeymoon) and "bad vacations" (lost wallet, lost passport, got sick), but among equally enjoyable vacations, I believe the brain is tempted to rank the most recent one higher. The recency effect, the improved recall of the last elements of a list, is well documented, particularly when the items are presented auditorily (the so-called modality effect). I'd be willing to bet something similar is at play here.

However, even if the survey results are much more predictable now, I'm still going to continue watching Shaqtin-a-fool religiously. For pleasure... and more data.

Thursday, February 12, 2015

Are free throws game-changers?


"And another missed free throw!"
"That's the story of the game right there, they just can't get those easy points from the charity line."

I've heard discussions very similar to this one over and over throughout the years from basketball commentators. Although not truly meaning it (at least I think), the commentator is heavily implying that the outcome of the game would have been extremely different had a given team made all, or significantly more, of its free throws.

Don't expect a sophisticated analysis here; I was just curious to explore the correlation between the final score difference and the number of missed free throws.



So I wanted to investigate the following two questions (a sketch of the corresponding checks follows the list):
  1. If the losing team had made all its attempted free throws, would the outcome of the game have been different?
  2. If both the losing team AND the winning team had made all of their attempted free throws, would the outcome of the game have been different?
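
In their most naive form, ignoring any within-game dynamics, the two checks boil down to comparing the final margin against the missed free throws. A sketch:

  def flips_if_loser_perfect(margin, loser_missed_ft):
      """Q1: the losing team makes all its free throws; does the result flip?"""
      return loser_missed_ft > margin

  def flips_if_both_perfect(margin, loser_missed_ft, winner_missed_ft):
      """Q2: both teams are perfect from the line; does the result flip?"""
      return loser_missed_ft - winner_missed_ft > margin

  # The 1999 Portland-Lakers game mentioned at the end of this post:
  # Portland won by 15 and missed 1 free throw; the Lakers missed 17.
  print(flips_if_both_perfect(15, 17, 1))  # True: 17 - 1 = 16 > 15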

To answer these questions, I pulled all box scores for regular season games from the 1999-2000 season through the last complete one, 2013-2014, and for each game tracked the final score difference, as well as the number of missed free throws for both the winning and losing teams.

Here's a first quick visual of the relationship between the number of missed free throws for each team (losing team in the left-hand graph, winning team in the right-hand graph) and the final score difference:



Rather surprisingly, there does not appear to be any link between the number of missed free throws and the outcome of the game, whether close contest or huge blowout. It would have seemed natural to assume that the more free throws the losing team missed, the more likely they were to get blown out, with the opposite argument for the winning team.

How have these numbers evolved over time? Here's the evolution of the average score difference and missed free throws for those 15 seasons:



No completely obvious trends emerge from the graph, but if anything we can notice that:

  • the lines for the winning and losing team are extremely similar
  • average score difference has stayed flat or perhaps very slightly increased
  • number of missed free throws has decreased (this could be due to better shooting and/or fewer free throws attempted, and it's actually a little of both)

Now, of course, this analysis is as naive as they come. You can't just flip free throws from missed to made and expect the rest of the game to follow its original course. Players might gain confidence as they rack up easy points, and coaches might change strategies if what would have been a big lead is only a 2- or 3-point lead. The point of the exercise was to compare the range of final point differentials to the number of missed free throws, and it would seem that free throws only account for about half the final gap. This could be one reason teams, coaches and players don't put in crazy efforts to get every player shooting 99%: their time is probably better spent on other types of training.

That being said, I just had to mention the other day's game, which saw both teams shoot a combined 37% from the line (16 for 43: 8 for 25 for the Clippers, 8 for 18 for the Nets). To put things in perspective, Shaquille O'Neal, who was criticized his entire career for those shots, shot 53% over his career (despite elaborate strategies devised to boost that percentage). Considering the Clippers lost by a mere two points, 100-102, I'm sure they must be kicking themselves for their performance at the line.

Also worth noting is this game from 1999 between Portland and the Lakers. Portland won quite big, by 15, but missed only one free throw. The Lakers missed 17! Had both teams been perfect from the line, the Lakers could actually have won a game they lost by 15. This was the biggest potential outcome reversal I observed in the data under the both-teams-perfect scenario.



Thursday, February 5, 2015

Are we seeing All-Stars at the All-Star Game?


The starters for the Western and Eastern teams of the upcoming NBA All-Star game were announced on January 22nd. The selection was based solely on fan votes.

In the West we have the vote-leading player Steph Curry, along with Marc Gasol, Blake Griffin, Kobe Bryant and Anthony Davis. Their Eastern counterparts will be Pau Gasol (not sure how often two brothers have faced each other in an All Star Game...), LeBron James, Kyle Lowry, John Wall and Carmelo Anthony.

The selection raised quite a few eyebrows, to say the least. Kobe? Sure, he's an NBA legend, a future Hall-of-Famer and all, but look at the Lakers' record this season, and look at his abysmal 37.3% shooting. Carmelo is also somewhat of a surprise given how the Knicks are performing this year. Sure, the All-Star game is about the player, not the team, but his stats aren't eye-popping either. And then consider all the players who didn't get in: James Harden, Klay Thompson, the entire Atlanta Hawks roster... Even setting those cases aside and looking purely at voting volumes, Mark Cuban declared the voting system broken.


fivethirtyeight.com had a very interesting post on the topic, attempting to correlate players' performance with the number of votes received. Performance was measured in terms of Wins Above Replacement (WAR), the number of team wins attributable to that player (computed as the difference between the number of wins the team achieved with that player and those of a hypothetical team where the player is replaced by an average one). It does seem that above a certain threshold, high-impact players get the votes they deserve, but below that threshold it's all more or less random.

Now I think the real question is: what do we want from an All-Star game? Players naturally view it as an honor, a testimony to a great year they're having. But are fans voting for players deserving recognition? Or do they want pure, 100% showtime? Imagine a natural-born dunker, explosive, athletic and artistic at the rim. Even if that player had a below-average EFG%, below-average WAR, RPM, RAPM or any of the other advanced metrics used to measure player performance, wouldn't fans still want to see him in the All-Star game?


So while I'm not saying it's fair to the players, I can understand why a Kobe would get voted in, and why a Paul Millsap or a Kyle Korver wouldn't. If we really want to understand how fans vote, it would be interesting to see if we could find a metric that correlates better with player votes than WAR does. Or perhaps start by including WAR from past seasons as well? I'm sure that would give us a better understanding of why Kobe got voted in. But how about a combination of team wins and number of dunks in the season? Or number of fast-break points?

The debate does seem old and familiar, perhaps because it's so closely related to the one we have every single year about who should be MVP and how MVP should be defined: the player with the stellar stats, or the player who was most impactful to his team's success?



Thursday, January 29, 2015

One stat to rule them all? It would be a steal


It's been almost a year since Benjamin Morris wrote about The Hidden Value of the NBA Steal on fivethirtyeight.com, and a lot of criticism, to say the least, followed (two examples here and here).

The main criticism stemmed from the claim that "a steal is worth nine points", which caused many to throw their arms up in the air, wondering how a player could all of a sudden score nine points in a single try without being in NBA Jam.

My purpose is not to review the original article, the criticisms, nor Morris's four-part (!) response (kudos for tackling all the negative comments head-on). However, it is worth noting that since the steal article (Morris's third on fivethirtyeight, after two others on basketball), Morris has primarily tackled sports other than basketball (only 5 of his next 48 articles; spotting this is one advantage of writing this post so long after the fact).


Taking a step back, my aim was to see how valuable a steal really is for measuring the value of a player. If I had to draft or trade for either a player who gets 25 points a game with 1 steal, or one who has 16 points and 2 steals (to recycle Morris's example), whom should I go for?

There is no perfect gold standard for summarizing a player in a single metric, although there are multiple options of increasing sophistication. ESPN reports RPM and WAR, defined on the site as:

  • RPM: Player's estimated on-court impact on team performance, measured in net point differential per 100 offensive and defensive possessions. RPM takes into account teammates, opponents and additional factors
  • WAR: The estimated number of team wins attributable to each player, based on RPM

So are steals a good proxy for a player's "value", assuming RPM and WAR are reliable value metrics?

I generated the following graphs linking steals per game to each of the two metrics for the top 30 players in steals for the 2013-2014 season. The two graphs are extremely similar given the strong correlation between RPM and WAR.



I don't know about you, but I'm not seeing a strong correlation with steals.

This doesn't validate or invalidate Morris's analysis, but I thought it would be helpful to get some insight into whether steals are really as powerful a signal as the original article suggests.

I know I said I wouldn't comment on the back-and-forth between Morris and his critics, but one observation I didn't see anywhere concerns the fact that Morris focuses on steals per game, not per minute, not per possession. It's easier to get more steals playing more minutes, and a player might play more minutes because he's a good player to start off with, so even if we had found a correlation, it wouldn't have allowed us to reach any valuable conclusions.


Saturday, January 24, 2015

Unbe-klay-vable! (apologies, klay-verest pun I could think of)


Exactly nine years and two days ago, Kobe Bryant scored 81 points in a game.
Yesterday, Klay Thompson had a historic feat of his own: 52 points, which in itself is not jaw-dropping, but the way he recorded them was, thanks to 37 points in the third quarter alone (9 of 9 from 3-point land and 4 of 4 on 2-pointers).

If there ever was a definition of a player being hot, we witnessed it yesterday! Tracy McGrady did score 13 points in 35 seconds, but what Klay did is on another level.


But for the sake of a fun stat: what was the probability of Klay putting on this insane shooting display?

For the 2014-2015 season, Klay entered that third quarter having attempted 272 3-pointers and made 120 (44.1%). He had also attempted 386 2-pointers and converted 188 of them (48.7%).

Assuming all his third-quarter shots are independent of each other (a very reasonable assumption; the only thing that could invalidate it is if there were such a thing as a "catching fire" effect), the probability of Klay scoring a perfect 9 of 9 on 3-pointers and 4 of 4 on 2-pointers is 0.441^9 * 0.487^4 = 3.6e-5, or about 1 in 28,000. Basically, you would expect such a performance once every 342 seasons! Klay has a higher probability of being struck by lightning at some point in his life.
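
The arithmetic, as a quick check:

  p_three = 120 / 272  # 44.1% from three entering the quarter
  p_two = 188 / 386    # 48.7% from two

  p_perfect_quarter = p_three ** 9 * p_two ** 4
  print(p_perfect_quarter)             # ~3.6e-05, about 1 in 28,000
  print(1 / (82 * p_perfect_quarter))  # ~342: seasons between such quarters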

One question this mind-boggling performance also raises is whether it has cemented Klay Thompson's reputation for being as streaky as ever. I am sure Nobel prize winner Daniel Kahneman would have some thoughts about this.

Even if you've seen the highlights countless times already, it never hurts to review this once-in-342-seasons performance: