Thursday, February 5, 2015
Are we seeing All-Stars at the All-Star?
The starters for the Western and Eastern teams of the upcoming NBA All-Star game were announced on January 22nd. The selection was based solely on fan votes.
In the West we have the vote-leading player Steph Curry, along with Marc Gasol, Blake Griffin, Kobe Bryant and Anthony Davis. Their Eastern counterparts will be Pau Gasol (not sure how often two brothers have faced each other in an All Star Game...), LeBron James, Kyle Lowry, John Wall and Carmelo Anthony.
The selection did raise quite a few eyebrows to say the least. Kobe? Sure he's an NBA legend, future hall-of-famer and all, but look at the Lakers' record this season, look at his abysmal shooting percentage of 37.3%. Carmelo is also somewhat of a surprise given how the Knicks are performing this year. Sure, the All-Star Game is about the player, not the team, but his stats aren't eye-popping either. And then consider all the ones who didn't get in: James Harden, Klay Thompson, the entire Atlanta Hawks roster... Even setting those arguments aside and looking purely at voting volume, Mark Cuban declared the voting system broken.
fivethirtyeight.com had a very interesting post on the topic, attempting to correlate players' performance with the number of votes received. Performance was measured in terms of Wins Above Replacement (WAR), the number of team wins attributable to that player (computed as the difference between the number of wins the team got with that player playing, versus a hypothetical world where the player is replaced by an average player). It does seem that above a certain threshold, high-impact players get the votes they deserve, but under that threshold it's all more or less random.
Now I think the real question is: what do we want in an All Star game? Players naturally view it as an honor, a testimony of a great year they're having. But are fans voting for players deserving recognition? Or do they want pure 100% showtime? Imagine a natural born dunker, explosive, athletic and artistic at the rim. Even if that player had below average EFG%, below average WAR, RPM, RAPM or any of the other advanced metrics to measure player performance, wouldn't fans still want to see him in the All Star game?
So while I'm not saying it's fair to the players, I can understand why a Kobe would get voted in, and why a Paul Millsap or Kyle Korver wouldn't. If we really want to understand how fans vote, it would be interesting to see if we could find a metric that correlates better with player votes than WAR does. Or perhaps first start including WAR for past seasons as well? I'm sure that if we did that we would have a better understanding as to why Kobe got voted in. But how about a combination of team wins + number of dunks in the season? Or number of fast break points?
The debate does seem old and familiar, perhaps because it's so closely related to the one we have every single year about who should be MVP and how MVP is defined? The player with the stellar stats? The player who was most impactful on his team's success?
Thursday, January 29, 2015
One stat to rule them all? It would be a steal
It's been almost a year since Benjamin Morris wrote about The Hidden Value of the NBA Steal on fivethirtyeight.com, and, to say the least, a lot of criticism followed (two examples here and here).
The main criticism stemmed from the comment that "a steal is worth nine points", which caused many to throw their arms up in the air wondering how a player could all of a sudden score nine points in a single try without being in NBA Jam.
My purpose is not to review the original article, the criticisms, nor Morris's four-part (!) response (kudos for tackling all the negative comments head-on). However, it is to be noted that since the steal article (Morris's third on fivethirtyeight after two others on basketball), Morris has primarily been tackling sports other than basketball (only 5 basketball posts out of his next 48; one advantage of writing this post so long after the fact is being able to tally this).
Trying to take a step back, my goal was to see how valuable a steal really is for measuring the value of a player. If I had to draft/trade for either a player who gets 25 points a game and 1 steal, or one who has 16 points and 2 steals (to recycle Morris's example), who should I go for?
There is no perfect gold standard for summarizing a player into a single metric, although there are multiple options that get more and more sophisticated. ESPN reports RPM and WAR, defined on the site as:
- RPM: Player's estimated on-court impact on team performance, measured in net point differential per 100 offensive and defensive possessions. RPM takes into account teammates, opponents and additional factors
- WAR: The estimated number of team wins attributable to each player, based on RPM
So are steals a good proxy for a player's "value", assuming RPM and WAR are reliable value metrics?
I generated the following graphs linking steals per game with each of the two variables for the top 30 players in steals for the 2013-2014 season. The two graphs are extremely similar given the strong correlation between RPM and WAR.
I don't know about you, but I'm not seeing a strong correlation with steals.
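That eyeball impression can be quantified with a plain Pearson coefficient. A minimal sketch below; the numbers are hypothetical placeholders, not the actual top-30 figures from the 2013-2014 season:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical steals-per-game and WAR values for a handful of players
steals_per_game = [2.2, 1.9, 1.8, 1.7, 1.6, 1.5, 1.5, 1.4]
war             = [9.0, 3.5, 12.1, 2.0, 7.4, 1.1, 6.3, 4.8]

print(round(pearson_r(steals_per_game, war), 3))
```

A value near zero would match what the graphs suggest; a value near 1 would support the steal-as-proxy claim.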
This doesn't validate or invalidate Morris' analysis, but I thought it would be helpful to get some insight as to whether steals is really as powerful as the original paper would suggest.
I know I said I wouldn't comment on the back-and-forths between Morris and the critics, but one comment I didn't see anywhere concerns Morris's focus on steals per game, not per minute, not per possession. It's easier to get more steals if I play more minutes, and I might play more minutes if I'm a good player to start off with, so even if we had found a correlation it wouldn't have allowed us to reach any valuable conclusions.
Saturday, January 24, 2015
Unbe-klay-vable! (apologies, klay-verest pun I could think of)
Exactly nine years and two days ago, Kobe Bryant scored 81 points in a game.
Yesterday, Klay Thompson had a historic feat of his own: 52 points, which in itself is not jaw-dropping, but the way he recorded it was, thanks to 37 points in the third quarter alone (9/9 from 3-point land and 4/4 on 2-pointers).
If there ever was a definition of a player being hot, we witnessed it yesterday! Tracy McGrady did score 13 points in 35 seconds, but what Klay did is on another level.
But for the sake of a fun stat: what was the probability of Klay putting on this insane shooting display?
For the 2014-2015 season, Klay started the third quarter having attempted 272 3-pointers and made 120 (44.1%). He had also attempted 386 2-pointers and converted 188 of them (48.7%).
Assuming all his third-quarter shots are independent of each other (a very reasonable assumption; the only thing that could invalidate it is if there were such a thing as a "catching fire" effect), the probability of Klay scoring a perfect 9-of-9 on 3-pointers and 4-of-4 on 2-pointers is 0.441^9 * 0.487^4 = 3.6e-5! Or about 1 in 28,000. Basically, you would expect such a performance once every 342 seasons! Klay has a higher probability of getting struck by lightning once in his life.
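Under that independence assumption, the whole computation fits in a few lines, including the back-out to "seasons", which assumes one such 13-shot opportunity per game over an 82-game schedule:

```python
# Probability of a perfect 9-of-9 from three and 4-of-4 from two,
# treating every attempt as an independent coin flip at Klay's
# season rates entering the quarter.
p3 = 120 / 272   # 3P%: 44.1%
p2 = 188 / 386   # 2P%: 48.7%

p_quarter = p3 ** 9 * p2 ** 4
print(f"{p_quarter:.1e}")     # 3.6e-05, about 1 in 28,000

# Assume one such 13-shot opportunity per game, 82 games per season:
seasons = 1 / p_quarter / 82
print(round(seasons))         # ~342 seasons
```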
One question this mind-boggling performance also raises is whether it cements Klay Thompson's reputation as one of the league's streakiest shooters. I am sure Nobel prize winner Daniel Kahneman would have some thoughts about this.
Even if you've seen the highlights countless times already, never hurts to review this once-in-342-seasons performance:
Tuesday, January 6, 2015
Fifteen seconds remaining, down by one....Who you gonna foul?
I was following the Mavericks - Kings game earlier this year. Exciting game which went into overtime. With 50 seconds left and the Kings down by two, Rajon Rondo fouled Jason Thompson and in the process sent him to the free throw line. He made the first shot. But before he could attempt the second, the Mavs called a timeout. After the timeout, Jason went back to shoot his second free throw, and missed it.
Did the timeout have any impact on the missed shot? You couldn't make a free throw any more straightforward. Unlike a penalty kick in soccer, there's nothing your opponents can do to alter the shot. Except call a timeout? Legendary coach Phil Jackson was (in)famous for calling timeouts between opponents' free throws, but was this ploy effective at all? Thompson could have tied the game on his second free throw attempt, which would have lifted a considerable amount of pressure off his and his teammates' shoulders with less than a minute to go. With the timeout called, Thompson was left there stewing in these thoughts with mounting pressure.
That game made me want to investigate the timeout phenomenon, as well as other external factors that could influence the outcome of a free throw. I also wanted to follow up on an earlier post I made about measuring players' clutch performances via statistical models.
A few words on the data before jumping into the analysis. I focused on the most recent complete NBA season: 2013-2014. I pulled all the play-by-play data from nba.com, and pulled free throw season percentages for each player from espn.
Quite a bit of cleaning up was required, namely around players with the same last name on the same team, the worst example being the Morris twins in Phoenix, who also share the same first initial!
After cleaning everything up, I was left with just under 56K free throws taken in that season, ready to be analyzed!
I was primarily interested in the impact of free throws interrupted by timeouts, but also wanted to capture two additional factors: whether the shooter has homecourt advantage or not, and whether the situation is "clutch". There are countless definitions of "clutch time" available, some sparking heated debates. I have here defined it as "less than 2 minutes to play in the 4th quarter or in overtime, and less than 5 point differential between the teams' scores".
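For concreteness, that definition can be written as a small predicate applied to each play-by-play row. The field names below are hypothetical, not the actual nba.com columns:

```python
def is_clutch(period, seconds_remaining, score_margin):
    """Clutch time as defined in this post: under 2:00 to play in the
    4th quarter or overtime, and the score within 5 points."""
    late_period = period >= 4  # 4th quarter (4) or any overtime (5+)
    return late_period and seconds_remaining < 120 and abs(score_margin) < 5

print(is_clutch(period=4, seconds_remaining=90, score_margin=-3))   # True
print(is_clutch(period=2, seconds_remaining=30, score_margin=0))    # False
```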
Before jumping into the data and analysis, let's first do some visual explorations.
How many free throws are taken by quarter?
Not surprisingly, significantly more free throws are taken in the fourth quarter than the first. The game is on the line, the defense goes up a notch, and voluntary fouls are committed to regain ball possession and prevent the opponent from running down the clock.
We can even go down one granularity level and look at the number of free throws taken by minute played. Rather impressive to visualize the steady increase throughout each quarter, and the giant spike in the final minute of regulation with teams fouling on purpose in tight games.
We've looked at volume, let's now look at efficiency. How well do the home and road teams shoot the ball?
It appears that both teams shoot at very similar rates throughout the contest, with the home team always holding an edge, although it is not as significant as one might have expected given the distractions home fans typically aim at visiting shooters.
Both teams seem to do better in overtime, but we need to caution against the much smaller sample size there.
And now to the more interesting piece, how do teams execute in clutch time?
Quite surprisingly, the home team appears to perform no differently, whereas the road team gets a nice boost of almost 5%. The fact that we observe a boost might seem counterintuitive to some: under pressure, with fatigue from close to 48 minutes of gameplay, wouldn't it be more difficult to concentrate and sink the shot? However, a reverse argument could be made: especially when games are close, or when a team expects the other to intentionally foul, the coach might choose to place his best shooters on the floor. So teams aren't necessarily shooting better, they just have better shooters taking the shots. This however does not fully explain why the road team gets a boost and not the home team.
The following graph shows the first quartile, median and third quartile of season free throw percentage for the players taking shots in and out of clutch time. It is rather apparent that better shooters are on the floor in clutch moments.
Now that we have a better feel for the data, the analysis can begin. The data is extremely rich and offers multiple options from a statistical analysis point of view. We can let each player's baseline free throw shooting percentage be determined by the model fitting, or force these to be the players' season averages. But with different players taking very different numbers of free throws within a season, and strong dependency among the outcomes of all free throws taken by the same player, a hierarchical structure emerges and a mixed effects model could make sense.
I actually played around with the three options just mentioned, and was satisfied at how close the numerical outputs were to each other.
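As a rough sketch of the simplest of these options, a plain logistic regression of free-throw success on home / timeout / clutch indicators, with each player's season percentage folded in as a fixed offset, can be fit in a few lines. Everything below, data included, is simulated for illustration; the real analysis was fit on the 56K actual free throws:

```python
import math
import random

def fit_logit(rows, lr=1.0, iters=500):
    """rows: (made, home, timeout, clutch, season_pct) tuples.
    Fits coefficients for (home, timeout, clutch) by gradient ascent,
    using each player's season percentage as a logit offset."""
    w = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for made, home, timeout, clutch, pct in rows:
            offset = math.log(pct / (1 - pct))          # player baseline
            x = (home, timeout, clutch)
            z = offset + sum(wi * xi for wi, xi in zip(w, x))
            p = 1 / (1 + math.exp(-z))
            for j in range(3):
                grad[j] += (made - p) * x[j]
        w = [wi + lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

# Simulate shots with a known positive home effect and negative
# timeout / clutch effects, then check the fit recovers the signs.
random.seed(0)
rows = []
for _ in range(2000):
    home, timeout, clutch = (float(random.random() < 0.5) for _ in range(3))
    pct = random.uniform(0.65, 0.90)                    # player season FT%
    z = math.log(pct / (1 - pct)) + 0.5 * home - 0.8 * timeout - 0.6 * clutch
    made = float(random.random() < 1 / (1 + math.exp(-z)))
    rows.append((made, home, timeout, clutch, pct))

w_home, w_timeout, w_clutch = fit_logit(rows)
print(w_home > 0, w_timeout < 0, w_clutch < 0)
```

The mixed-effects variant replaces the fixed offsets with per-player random intercepts; the point here is only to show the structure of the fixed-effects specification.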
The conclusions would indicate that:
- homecourt does have a positive effect on shooters' success, although the effect was only borderline significant
- calling a timeout before the second (or third) free throw had a negative but insignificant impact
- clutch time had a negative and significant impact
Regarding timeouts, the fact that the effect was not significant could be due to the low sample size of these events (84 cases in 2013-2014 out of 56K free throws taken); more coaches should test this strategy so I can tell them whether it's effective!
As for clutch time, the conclusion seems to contradict the visual exploration where percentages were higher in clutch time. But recall that our explanation to this was that the coaches were putting better shooters on the court. The analysis would indicate that even if the best shooters are on the floor in the closing minutes, they are individually performing less well when the game is on the line than in the middle of the second quarter.
Now one might wonder if we could use the data to detect some of the league's best clutch free throw shooters. Those cold-blooded killers who can step it up an extra notch when all eyes are on them. The Durants, James, Bryants...
I added interaction terms for players with sufficient free throws (at least 20) both in and out of clutch time, to see which ones had the potential to elevate their game. And the results are... no one! Out of the 26 players meeting my criteria, none could significantly increase their free throw percentage. This could again be due to small sample sizes, but even so most players had negative coefficients. While none had significant positive coefficients, two had significant negative ones: Chris Paul and Ramon Sessions.
So back to the post's title, if you're playing the Clippers, fifteen seconds to go and down by one, do you foul Chris Paul?
Monday, December 15, 2014
Clutch or not clutch, that is the stat question
There has always been debate on what clutch is, how it is measured, who performs best in these moments, the list goes on...
I'm definitely not going to settle the debate once and for all in one blog post, but wanted to share a few thoughts and ideas instead.
When we talk about clutch in the NBA, a few names immediately come to mind: Jordan, Bryant, Bird, Miller and Horry. Countless lists can be found with the simplest search, each as subjective as the next ("that was an incredible play in that game!").
But can we actually measure it, and rank players by it?
nba.com has a whole section dedicated to clutch stats on its website. A great first step but amidst all the numbers it's hard to compare players.
SBNation also tackled the issue, clustering players into recipients, creators and scorers (not mutually exclusive) during clutch time (Is Kobe Bryant Actually Clutch? Looking At The NBA's Best Performers In Crunch Time). The article stresses the importance of efficiency by placing all performances in perspective using possessions per 48 minutes on the x-axis. Here are the results for the 2010-2011 season:
Efficiency is a trademark at SB Nation, and Kobe is their primary scapegoat given this viewpoint:
SBNation's perspective is interesting and allows swift comparisons across players, but I feel that it lacks some rigor and robustness around these numbers. How large are the sample sizes? Are the effects significant? Which players are shouldering the most pressure and confronting it head-on? The author underlines these issues himself:
But all of that said ... how reliable are these numbers? There's a school of thought that firmly believes that "clutch" is in the eye of the beholder. They contend that as fans, we see things that may not actually be there. We see Kobe hit a step-back 20-footer and credit his clutch ability, when perhaps we simply should have attributed it to the fact that he's amazing at basketball (in the 1st or 4th quarter).
There are rigorous methods of testing for statistical significance. Rather than dive into those, however, a glance at some yearly efficiency trends can be just as telling.
I also came across this very nice post, Measuring Clutch Play in the NBA, on the Inpredictable blog, which offers an interesting and elegant alternative. In a nutshell, the idea is to look at how each player's actions impacted his team's probability of winning the game, referred to as Win Probability Added (WPA). Made shots, rebounds and steals increase your team's probability, while missed shots and turnovers hurt it. Some adjustments are required to clean up the cumulative WPA for each player (essentially comparing the impact of the same play under normal circumstances), but in the end it provides an intuitive metric that makes sense and allows quick comparisons.
I do however have some slight concerns with this metric. The first is that, unless I misread, the metric is cumulative, so players with more minutes in the clutch have more opportunities to modify their team's WPA. The second is best illustrated with a small example: with a few seconds remaining, a made two-point shot has a very different impact on WPA depending on whether the team was down by 2 or down by 1. In the first case the game is tied and likely headed to overtime, with roughly 50/50 odds for each team; in the other, the shooter's team leads by 1 and has a good chance of winning the game. But is it fair to credit the player with very different WPA in the two cases? What really matters is that, under tremendous pressure, the player made the shot.
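A toy calculation makes the concern concrete. The win probabilities below are assumed round numbers for illustration, not Inpredictable's actual model outputs:

```python
def wpa(wp_before, wp_after):
    """Win Probability Added = swing in the team's win probability."""
    return wp_after - wp_before

# Down 2, seconds left: a make ties the game (assume ~50% in overtime).
print(round(wpa(wp_before=0.15, wp_after=0.50), 2))

# Down 1, same shot: a make takes the lead (assume ~85% to close it out).
print(round(wpa(wp_before=0.20, wp_after=0.85), 2))
```

The identical shot under identical pressure earns nearly twice the WPA in the second scenario, which is exactly the crediting asymmetry described above.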
This in turn leads to another question: what was the likelihood of that shot going in in the first place? How frequently does that player make that shot under normal circumstances without the game on the line? How frequently do other players make the shot? How much does clutch pressure reduce the average player's chance of making the shot, and was the player able to rise to the occasion and overcome the pressure?
According to Stephen Shea in his book Basketball Analytics, "90% of teams performed worse (in terms of shooting percentages) in the clutch than in non-clutch situations." Can this be modeled? How significant is the effect?
I will try to explore this path further, looking into statistical models that would offer some elements of response to these questions.
But looking at all that has been said, it seems the debate originates from the fact that "being clutch" is never well-defined. Suppose we could at any point during a game give a score from 0 to 100 for how good a player is. Suppose player A is at 90 throughout non-clutch time, but drops to 80 in clutch situations, whereas player B is at 60 in non-clutch situations, but steps his game up to 70 when the game is on the line. Which one is clutchier: the one with the highest absolute level, or the one stepping up his game and taking the pressure head-on? Answering this would already be a giant step in the right direction.
In the meantime, please enjoy this youtube compilation of clutch shots:
Tuesday, October 7, 2014
Are remakes in the producers' interests?
Two-bullet summary:
- Similarly to sequels, remakes perform significantly worse than originals from a rating perspective
- If you want to predict a remake's IMDB rating, a quick method is to multiply the original movie's rating by 0.84
In three previous posts (first, second, third) I looked at Hollywood's lack of creation and general risk-aversion by taking a closer look at the increasing number of sequels being produced despite the fact that their IMDB rating is significantly worse than the original installment.
We were able to confirm the expected result that sequels typically have worse ratings than the original (only 20% have a better rating), and the average rating drop is 0.9.
Those posts would not have been complete without looking at another obvious manifestation of limited creativity: remakes!
Before plunging into the data, a few comments:
- finding the right data was instrumental. Because remakes don't necessarily have the same title, and because movies may share a title without being remakes, the data needed to be carefully selected. I finally settled on a mapping from Wikipedia. I did however find some errors along the way, so bear in mind that the data is neither 100% accurate nor exhaustive;
- one of the greatest drops in ratings was for Hitchcock's classic Psycho (there should be a law against attempting remakes of such classics!), with Gus Van Sant's version getting a 4.6, compared to Hitchcock's 8.6;
- adapting Night of the Living Dead to 3D saw a drop from 8.0 to 3.1;
- the best rating improvement for a remake was for Reefer Madness, originally a 1936 propaganda film on the dangers of marijuana (3.6); the tongue-in-cheek 2005 musical remake with Kristen Bell got a 6.7;
- Ocean's Eleven was originally a 1960 movie with Frank Sinatra, but the exceptional casting for the version we all know with Damon, Clooney and Pitt led to a nice rating improvement (from 6.6 to 7.7)
Let's take a quick look at the data, comparing original rating to remake rating, the red line corresponds to y = x, meaning that any dot above the line corresponds to a remake that did better than the original, whereas anything under is when the original did better:
The distribution of the difference between remake and original is also quite telling:
The first obvious observation is that, as expected, remakes tend to do worse than the original movie. Only 14% do better (compared to 20% for sequels) and the average rating difference is -1.1 (compared to -0.9 for sequels).
The other observation is that the correlation is not as strong as we had seen for sequels. This makes sense, as sequels share many parameters with the original movie (actors, directors, writers). One reason parameters are much more similar for sequels than remakes is the timing between original and remake/sequel: 77% of sequels come less than 5 years after the original installment, whereas only 50% of remakes come within 25 years! Parameters are more similar and the fan base has remained mostly intact.
From a more statistical point of view, a paired t-test allowed us to determine that the rating decrease of -1.1 was statistically significant at the 95% level (+/- 0.1).
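For reference, the paired comparison boils down to a mean difference with a normal-approximation 95% confidence interval. A minimal sketch below; the differences are made-up placeholders, not the ~600 real remake-original pairs:

```python
import math

def mean_diff_ci(diffs, z=1.96):
    """Mean of paired differences and the half-width of its ~95% CI."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    half = z * math.sqrt(var / n)
    return mean, half

# Hypothetical remake-minus-original rating differences
diffs = [-1.5, -0.8, -2.0, 0.3, -1.4, -1.2, -0.9, -1.6]
mean, half = mean_diff_ci(diffs)
print(f"{mean:.2f} +/- {half:.2f}")
```

The effect is "significant" when the interval excludes zero, which is what the real -1.1 +/- 0.1 result shows.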
In terms of modeling, a simple linear model gave us some insight for prediction purposes. In case you want to make some predictions to impress your friends, your best guess to estimate a remake's rating is to multiply the original movie's rating by 0.84.
The original Carrie movie from 1976 had a rating of 7.4, whereas the remake that just came out has a current rating of 6.5 (the forecast would be 0.84 * 7.4 = 6.2). Given that movie ratings tend to drop a little after the first few weeks of release, that's a pretty good forecast we had there! The stat purists will argue that this result is somewhat biased, as Carrie was included in the original dataset...
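That rule of thumb fits in one line (the 0.84 slope comes from the post's fitted linear model; the function name is my own):

```python
def predict_remake_rating(original_rating, slope=0.84):
    """Back-of-envelope remake forecast: slope * original IMDB rating."""
    return slope * original_rating

# Carrie: forecast 6.2 vs actual remake rating of 6.5
print(round(predict_remake_rating(7.4), 1))
```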
Taking a step back, why does Hollywood continue making these movies despite anticipating a lower quality movie?
The answer is the same as for sequels: the risks are significantly reduced with remakes, you are almost guaranteed to bring back some fanatics of the original.
And fewer writers are required, as the script is already there! However, it appears that sequels are a safer bet: the fan base is more reliable. As we previously saw, release dates are much closer for sequels, and the movies share many more characteristics.
Thursday, August 21, 2014
Originals and Remakes: Who's copying who?
In the previous post ("Are remakes in the producers' interests?"), I compared the IMDB rating of remake movies to that of the original movie. We found that in the very large majority of cases the rating was significantly lower.
One aspect I did not look at was who-copied-whom from a country perspective. Do certain countries export their originals really well to other countries? Do certain countries have little imagination and import remake ideas from overseas movies?
Using the exact same database as for the previous post (approx. 600 pairs of original-remake movies), I created a database which detailed for each country the following metrics:
Top originals
Nothing very surprising with the US claiming the first place for original movies created, with 325 movies for which remakes were made. The next positions are more interesting, with France and India tied for second place with 36, closely followed by Japan with 30.
Top remakes
Again, nothing very surprising with the US claiming first place for number of remakes made, with 370. India is again in second position with 38, followed by the UK (14) and Japan (10). It is surprising to see France (6) at a distant position in this category given its second place in the previous one.
Top exporters
Who manages to have their originals get picked up abroad the most? The US is in first position again with 49, with France a relatively close second at 32. Japan (21) and the UK (14) are in third and fourth positions.
Top importers
Who are the biggest copiers? The US is way ahead of everyone with 94, with multiple countries tied at 2 (France, UK, Japan...). Recall that the UK, Japan and France were all among the top remake countries; the fact that they are low on the import list indicates that these countries tend to do their own in-house remakes instead of looking abroad for inspiration.
It is difficult to look at other metrics, especially ratios, as many countries have 0 in either category. We could filter to only include countries that have at least 10 movies produced, or at least 5 imported and 5 exported, but even so we would be keeping only a handful of movies.
France -> US Movie Relationship
France seemed to be an interesting example here given the high number of original movies produced, the fact that many of those were remade abroad, and France's tendency to import very few ideas. I therefore looked at the French-US relationship in matters of movies.
Wikipedia lists 24 original movies made in France for which a US remake was made. In 22 cases the remake had a worse rating, and in the other two cases there was some improvement. 2 out of 24 is about 8.3%, somewhat worse than the overall effect for all original-remake pairs, where we had seen improvement in 14% of the cases. Similarly, the average decline of 1.35 in IMDB rating is also somewhat worse than the average across all pairs, which we had found to be around 1.1.
The worst remake is without a doubt Les Diaboliques: the French classic with Simone Signoret holds a rating of 8.2, while the Sharon Stone remake got a 5.1. And who would have thought that Arnold Schwarzenegger would have his name associated with the best remake improvement: his 1994 True Lies (7.2) was much better than the original 1991 La Totale! (6.1).
What about the reverse US -> France effect?
Well it turns out that France only made two remakes of US movies, which leaves us with too few observations for strong extrapolations. However, the huge surprise is that in both cases the French remake had a better IMDB rating than the original American version. 1978's Fingers had a 6.9 while the 2005 The Beat That My Heart Skipped is currently rated at 7.3. As for Irma la Douce, it jumped from 7.3 to 8.0.
It's hard, if not impossible, to determine whether France is better at directing or better at pre-selecting the right scripts. What makes it even more head-scratching is that out of the 6 remakes France made, the two US originals are the only ones where the remake did better. The other four were French originals, and in all four cases the remake was worse.
This France-USA re-adaptation dynamic sheds some light on why the French were extremely disappointed to hear about Dany Boon signing off the rights to his record-breaking Bienvenue chez les Ch'tis to Will Smith for a Welcome to the Sticks remake. But as always, IMDB ratings are not the driving force at work here, and if Bienvenue chez les Ch'tis broke attendance and revenue records in France, it could prove to be a cash cow in the US without necessarily being a big threat at the Oscars.
Should more cross-country adaptation be encouraged?
The France example should make us pause. 6 remakes. 4 original French movies, all rated worse when remade. 2 original US movies, all rated better when remade.
Is this a general trend?
I split the data in two, one subset where the country for the original and remake are the same (~80% of the data), and another subset where they are not (~20% of the data).
Here are the distributions for the rating difference between remake and original:
The two distributions are very similar, but it still seems that the rating drop is not as bad when the country of origin is the one making the remake than when another country takes the remake into its own hands.
Given the proximities in distributions, a quick two-sample t-test was performed on the means and the difference turns out to be borderline significant with a p-value of 0.0542.
Arguments could go both ways as to whether the remake would have higher rating if done by the same country or another one: movies can be very tied to the national culture and only that country would be able to translate the hidden cultural elements into the remake to make it successful. But one could argue that the same country would be tempted to do something too similar which would not appeal to the public. A foreign director might be inspired and want to bring a new twist to the storyline bring a different culture into the picture.
Looking back at France that does much better adapting foreign movies unlike the rest of the world, we have here witnessed another beautiful case of the French exception!
One aspect I did not look at was who-copied-whom from a country perspective. Do certain countries export their originals really well to other countries? Do certain countries have little imagination and import remake ideas from overseas movies?
Using the exact same database as in the previous post (approx. 600 original-remake movie pairs), I compiled the following metrics for each country:
- number of original movies it created
- number of remake movies it created
- number of original movies it exported (country is listed for the original, not for the remake)
- number of remake movies it imported (country is not listed for the original, but for the remake)
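As a sketch of how these per-country tallies could be computed from the original-remake pairs (toy data and field names assumed for illustration, not the actual ~600-pair database):

```python
from collections import Counter

# Each pair is (original_country, remake_country); toy data for illustration only.
pairs = [
    ("France", "USA"), ("France", "France"), ("USA", "France"),
    ("Japan", "USA"), ("USA", "USA"), ("India", "India"),
]

originals = Counter(orig for orig, _ in pairs)                 # original movies per country
remakes = Counter(rem for _, rem in pairs)                     # remakes per country
exports = Counter(orig for orig, rem in pairs if orig != rem)  # originals picked up abroad
imports_ = Counter(rem for orig, rem in pairs if orig != rem)  # remakes of foreign originals

print(originals["France"], exports["France"], imports_["USA"])
```

A country with many remakes but few imports (as we will see for France, the UK and Japan) shows up here as a high `remakes` count paired with a low `imports_` count.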
Top originals
Nothing very surprising with the US claiming the first place for original movies created, with 325 movies for which remakes were made. The next positions are more interesting, with France and India tied for second place with 36, closely followed by Japan with 30.
Top remakes
Again, nothing very surprising: the US claims first place for number of remakes made, with 370. India is again in second position with 38, followed by the UK (14) and Japan (10). It is surprising to see France (6) so far behind in this category given its second place in the previous one.
Top exporters
Who manages to have their originals get picked up abroad the most? The US is in first position again with 49, with France in a relatively close second place with 32. Japan (21) and the UK (14) are in third and fourth positions.
Top importers
Who are the biggest copiers? The US is way ahead of everyone with 94, with multiple countries tied at 2 (France, UK, Japan...). Recall that the UK, Japan and France were all among the top remake countries; the fact that they are low on the import list indicates that these countries tend to do their own in-house remakes instead of looking abroad for inspiration.
It is difficult to look at other metrics, especially ratios, as many countries have 0 in one category or the other. We could filter to only include countries that have at least 10 movies produced, or at least 5 imported and 5 exported, but even so we would be keeping only a handful of movies.
France -> US Movie Relationship
France seemed like an interesting example here given the high number of original movies it produced, the fact that many of those were remade abroad, and its tendency to import very few ideas. I therefore looked at the French-US relationship in matters of movies.
Wikipedia lists 24 original movies made in France for which a US remake was made. In 22 cases the remake had a worse rating, and in the other two cases there was some improvement. 2 out of 24 is about 8.3%, somewhat worse than the overall effect across all original-remake pairs, where we had seen improvement in 14% of the cases. Similarly, the average IMDB rating decline of 1.35 is also somewhat worse than the average across all pairs, which we had found to be around 1.1.
The worst remake is without a doubt Les Diaboliques: the French classic starring Simone Signoret has a rating of 8.2, while the Sharon Stone remake sits at 5.1. And who would have thought that Arnold Schwarzenegger would have his name associated with the biggest remake improvement: his 1994 True Lies (7.2) was rated much better than the original 1991 La Totale! (6.1).
What about the reverse US -> France effect?
Well, it turns out that France made only two remakes of US movies, which leaves us with too few observations for strong extrapolations. However, the huge surprise is that in both cases the French remake had a better IMDB rating than the original American version. 1978's Fingers had 6.9, while the 2005 The Beat That My Heart Skipped is currently rated at 7.3. As for Irma la Douce, it jumped from 7.3 to 8.0.
It's hard, if not impossible, to determine whether France is better at directing or better at pre-selecting the right scripts. What makes it even more head-scratching is that out of the 6 remakes France made, the two US originals are the only ones where the remake did better. The other four were French originals, and in all four cases the remake was worse.
This France-USA re-adaptation dynamic sheds some light on why the French were extremely disappointed to hear about Dany Boon signing off the rights to his record-breaking Bienvenue chez les Ch'tis to Will Smith for a Welcome to the Sticks remake. But as always, IMDB ratings are not the driving force at work here, and if Bienvenue chez les Ch'tis broke attendance and revenue records in France, it could prove to be a cash cow in the US without being a big threat at the Oscars.
Should more cross-country adaptation be encouraged?
The France example should make us pause. 6 remakes. 4 original French movies, all rated worse when remade. 2 original US movies, all rated better when remade.
Is this a general trend?
I split the data in two, one subset where the country for the original and remake are the same (~80% of the data), and another subset where they are not (~20% of the data).
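The split itself is straightforward; a minimal sketch with made-up rating triples (the real data has ~600 of them), where `rating_diff` is the remake's rating minus the original's:

```python
# Toy (original_country, remake_country, rating_diff) triples for illustration only.
data = [
    ("USA", "USA", -1.0), ("France", "USA", -2.1), ("USA", "USA", -0.5),
    ("USA", "France", 0.4), ("Japan", "Japan", -1.2),
]

same = [d for o, r, d in data if o == r]   # same-country remakes (~80% in the real data)
cross = [d for o, r, d in data if o != r]  # cross-country remakes (~20%)

mean = lambda xs: sum(xs) / len(xs)
print(round(mean(same), 2), round(mean(cross), 2))
```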
Here are the distributions for the rating difference between remake and original:
The two distributions are very similar, but it still seems that the rating drop is not as bad when the country of origin makes the remake as when another country takes the remake into its own hands.
Given the proximity of the distributions, I ran a quick two-sample t-test on the means; the difference turns out to be borderline significant, with a p-value of 0.0542.
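For reference, the two-sample (Welch) t statistic can be computed by hand; here is a sketch on synthetic rating-difference samples, using a normal approximation for the two-sided p-value since the samples are large. The sample sizes and means below are made up to mimic the ~80/20 split, not the actual data:

```python
import math
from random import gauss, seed

def welch_t(a, b):
    """Welch's two-sample t statistic, with a normal-approximation two-sided p-value."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))          # standard error of the mean difference
    t = (ma - mb) / se
    # For large samples, t is approximately standard normal under the null hypothesis.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    return t, p

seed(0)
same = [gauss(-1.0, 1.0) for _ in range(480)]   # synthetic same-country rating drops
cross = [gauss(-1.3, 1.0) for _ in range(120)]  # synthetic cross-country rating drops
t, p = welch_t(same, cross)
print(f"t = {t:.3f}, p = {p:.4f}")
```

In practice a library routine such as scipy's `ttest_ind` (with `equal_var=False` for the Welch variant) would be the usual choice; the hand-rolled version just makes the formula explicit.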
Arguments could go both ways as to whether a remake would be rated higher if done by the same country or a different one: movies can be very tied to the national culture, and only that country may be able to translate the hidden cultural elements into a successful remake. But one could argue that the same country would be tempted to do something too similar, which would not appeal to the public, while a foreign director might be inspired to bring a new twist to the storyline by bringing a different culture into the picture.
Looking back at France, which unlike the rest of the world does much better when adapting foreign movies, we have witnessed another beautiful case of the French exception!