The Statisticator: January 2015

Thursday, January 29, 2015

One stat to rule them all? It would be a steal

It's been almost a year since Benjamin Morris wrote about The Hidden Value of the NBA Steal on fivethirtyeight.com, and a lot of criticism, to say the least, followed (two examples here and here).

The main criticism stemmed from the comment that "a steal is worth nine points", which caused many to throw their arms up in the air wondering how a player could all of a sudden score nine points in a single try without being in NBA Jam.

My purpose is not to review the original article, the criticisms, or Morris's four-part (!) response (kudos for tackling all the negative comments head-on). However, it should be noted that since the steal article (Morris' third on fivethirtyeight, after two others on basketball), Morris has primarily been tackling sports other than basketball: only 5 of his 48 articles have covered it. That is one advantage of writing this post so long after the fact.

Trying to take a step back, my attempt was to see how valuable indeed a steal is for measuring the value of a player. If I had to draft/trade for either a player who gets 25 points a game and 1 steal or one who has 16 points and 2 steals (to recycle Morris' example), who should I go for?

There is no perfect gold standard for summarizing a player into a single metric, although there are multiple options that get more and more sophisticated. ESPN reports RPM and WAR, defined on the site as:

RPM: Player's estimated on-court impact on team performance, measured in net point differential per 100 offensive and defensive possessions. RPM takes into account teammates, opponents and additional factors
WAR: The estimated number of team wins attributable to each player, based on RPM

So are steals a good proxy for a player's "value", assuming RPM and WAR are reliable value metrics?

I generated the following graphs linking steals per game with each of the two variables for the top 30 players in steals for the 2013-2014 season. The two graphs are extremely similar given the strong correlation between RPM and WAR.

I don't know about you, but I'm not seeing a strong correlation with steals.

This doesn't validate or invalidate Morris' analysis, but I thought it would be helpful to get some insight as to whether steals are really as powerful as the original article would suggest.

I know I said I wouldn't comment on the back-and-forth between Morris and the critics, but one point I didn't see raised anywhere is that Morris seems to focus on steals per game, not per minute, not per possession. It's easier to get more steals if I play more minutes, and I might play more minutes if I'm a good player to start off with, so even if we had found a correlation it wouldn't have allowed us to reach any valuable conclusions.
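To make the per-game versus per-minute point concrete, here is a minimal sketch that rescales steals to a per-36-minutes rate. The two stat lines are hypothetical and chosen purely for illustration, and the `per_36` helper is my own naming, not something taken from Morris' article.

```python
def per_36(steals_per_game: float, minutes_per_game: float) -> float:
    """Rescale a per-game steal average to a per-36-minutes rate."""
    return steals_per_game / minutes_per_game * 36


# HYPOTHETICAL stat lines, for illustration only.
starter = per_36(steals_per_game=2.0, minutes_per_game=38.0)
reserve = per_36(steals_per_game=1.2, minutes_per_game=18.0)

print(f"starter: {starter:.2f} steals per 36 minutes")
print(f"reserve: {reserve:.2f} steals per 36 minutes")
```

In this made-up case the starter leads comfortably in steals per game, yet the reserve steals the ball at a higher rate once playing time is accounted for.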

Saturday, January 24, 2015

Unbe-klay-vable! (apologies, klay-verest pun I could think of)

Exactly nine years and two days ago, Kobe Bryant scored 81 points in a game.
Yesterday, Klay Thompson had a historic feat of his own: 52 points, which in itself is not jaw-dropping, but the way he recorded it was, thanks to 37 points in the third quarter alone (9/9 from 3-point land and 4/4 for 2-pointers).

If there ever was a definition of a player being hot, we witnessed it yesterday! Tracy McGrady did score 13 points in 35 seconds, but what Klay did is on another level.

But for the sake of a fun stat: what was the probability of Klay putting on this insane shooting display?

For the 2014-2015 season, Klay started the third quarter having attempted 272 3-pointers and made 120 (44.1%). He had also attempted 386 2-pointers and converted 188 of them (48.7%).

Assuming all his third quarter shots are independent of each other (a very reasonable assumption; the only thing that could invalidate it is if there were such a thing as a "catching fire" effect), the probability of Klay making a perfect 9-of-9 3-pointers and 4-of-4 2-pointers is 0.441^9 * 0.487^4 = 3.6e-5! Or 1 in roughly 28,000. At one third quarter per game and 82 games per season, you would expect such a performance once every 342 seasons! Put differently, Klay would have a higher probability of getting struck by lightning once in his life.
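The arithmetic above is easy to reproduce. The sketch below uses the season totals quoted earlier and the same independence assumption; the 82-game season and the "one third quarter per game" framing are the assumptions behind the seasons figure.

```python
# Season totals before the third quarter, as quoted above.
three_made, three_att = 120, 272
two_made, two_att = 188, 386

p3 = three_made / three_att  # 3-point percentage
p2 = two_made / two_att      # 2-point percentage

# Independence assumption: multiply the per-shot probabilities.
p_perfect = p3 ** 9 * p2 ** 4
one_in = 1 / p_perfect

# ASSUMPTION: one third quarter per game, 82 games per regular season.
seasons = one_in / 82

print(f"P(9/9 threes and 4/4 twos) = {p_perfect:.2e}")
print(f"about 1 in {one_in:,.0f} third quarters, or once every {seasons:.0f} seasons")
```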

One question this mind-boggling performance also raises is whether it has cemented Klay Thompson's reputation for being as inconsistent as ever. I am sure Nobel Prize winner Daniel Kahneman would have some thoughts about this.

Even if you've seen the highlights countless times already, it never hurts to review this once-in-342-seasons performance:

Tuesday, January 6, 2015

Fifteen seconds remaining, down by one....Who you gonna foul?

I was following the Mavericks - Kings game earlier this year. Exciting game which went into overtime. With 50 seconds left and the Kings down by two, Rajon Rondo fouled Jason Thompson and in the process sent him to the free throw line. He made the first shot. But before he could attempt the second, the Mavs called a timeout. After the timeout, Jason went back to shoot his second free throw, and missed it.

Did the timeout have any impact on the missed shot? You couldn't make a free throw any more straightforward. Unlike a penalty kick in soccer, there's nothing your opponents can do to alter that shot. Except call a timeout? Legendary coach Phil Jackson was (in)famous for calling timeouts between opponents' free throws, but was this ploy effective at all? Thompson could have tied the game on his second free throw attempt, which would have lifted a considerable amount of pressure off his and his teammates' shoulders with less than a minute to go. With the timeout called, Thompson was left there stewing in these thoughts with mounting pressure.

That game made me want to investigate the timeout phenomenon, as well as other external factors that could influence the outcome of a free throw. I also wanted to follow up on an earlier post I made about measuring players' clutch performances via statistical models.

A few words on the data before jumping into the analysis. I focused on the most recent complete NBA season: 2013-2014. I pulled all the play-by-play data from nba.com, and pulled free throw season percentages for each player from ESPN.
Quite a bit of cleaning up was required, namely around players with the same last name on the same team, the worst example being the Morris twins in Phoenix, who also share the same first initial!
After cleaning everything up, I was left with just under 56K free throws taken in that season, ready to be analyzed!

I was primarily interested in the impact of free throws interrupted by timeouts, but also wanted to capture two additional factors: whether the shooter has homecourt advantage or not, and whether the situation is "clutch". There are countless definitions of "clutch time" available, some sparking heated debates. I have here defined it as "less than 2 minutes to play in the 4th quarter or in overtime, and less than 5 point differential between the teams' scores".
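As a minimal sketch, the clutch definition above can be turned into a flag on each free throw. The function name, argument names, and the period encoding (1 to 4 for quarters, 5 and up for overtime) are illustrative assumptions, not the actual column layout of the nba.com play-by-play data. I also read the two-minute cutoff as applying to overtime periods as well.

```python
def is_clutch(period: int, seconds_left: float, home_score: int, away_score: int) -> bool:
    """Flag a free throw as 'clutch' under the definition in the text.

    period: 1-4 for regulation quarters, 5+ for overtime (ASSUMED encoding).
    seconds_left: seconds remaining in the current period.
    """
    late_period = period >= 4                    # 4th quarter or overtime
    under_two_minutes = seconds_left < 120       # less than 2 minutes to play
    close_game = abs(home_score - away_score) < 5  # less than 5 point differential
    return late_period and under_two_minutes and close_game


# Example: 50 seconds left in overtime, shooter's team down by two.
print(is_clutch(period=5, seconds_left=50, home_score=102, away_score=100))
```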

Before jumping into the data and analysis, let's first do some visual explorations.

How many free throws are taken by quarter?

Not surprisingly, significantly more free throws are taken in the fourth quarter than the first. The game is on the line, the defense goes up a notch, and voluntary fouls are committed to regain ball possession and prevent the opponent from running down the clock.

We can even go down one granularity level and look at the number of free throws made by minute played. It is rather impressive to visualize the steady increase throughout each quarter, and the giant spike in the final minute of regulation with teams fouling on purpose in tight games.

We've looked at volume, let's now look at efficiency. How well do the home and road teams shoot the ball?

It appears that both teams shoot at very similar rates throughout the contest, with the home team always having an advantage, although it is not as significant as one might have expected given the distractions often provided by the home fans.

Both teams seem to do better in overtime, but we need to caution against the much smaller sample size there.

And now to the more interesting piece, how do teams execute in clutch time?

Quite surprisingly, the home team appears to be performing no differently, whereas the road team gets a nice boost of almost 5%. The fact that we observe a boost might seem counterintuitive to some: under pressure, with fatigue from close to 48 minutes of gameplay, wouldn't it be more difficult to concentrate and sink the shot? However, a reverse argument could be made that, especially when games are close or when a team expects the other one to intentionally foul, the coach might choose to place his best shooters on the floor. So teams aren't necessarily shooting better, just having better shooters take the shots. This, however, does not fully explain why the road team gets a boost and not the home team.

The following graph shows the first quartile, median and third quartile for season free throw percentage of the players taking shots in and out of clutch. It is rather apparent that better shooters are on the floor in clutch moments.

Now that we have a better feel for the data, the analysis can begin. The data is extremely rich and offers multiple options from a statistical analysis point of view. We can let the baseline free throw shooting percentage for each player be determined by the model fitting, or force these baselines to be the players' season averages. But with different players taking very different numbers of free throws within a season, and strong dependence among the free throws taken by the same player, a hierarchical structure emerges and a mixed effects model could make sense.

I actually played around with the three options just mentioned, and was satisfied at how close the numerical outputs were to each other.
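As a rough sketch only, and assuming a logistic link, the three options could be written as follows. The symbols are illustrative notation rather than an exact specification: y_ij indicates whether player i made free throw j, and p-bar_i is that player's season free throw percentage.

```latex
% Illustrative notation; the logistic link is an assumption.
\begin{align*}
\text{shared form:}\quad
  & \operatorname{logit} P(y_{ij}=1)
    = \alpha_i + \beta_1\,\mathrm{home}_{ij}
      + \beta_2\,\mathrm{timeout}_{ij}
      + \beta_3\,\mathrm{clutch}_{ij} \\
\text{(1) fitted baselines:}\quad
  & \alpha_i \text{ estimated freely for each player} \\
\text{(2) season averages:}\quad
  & \alpha_i = \operatorname{logit}(\bar{p}_i) \text{ held fixed} \\
\text{(3) mixed effects:}\quad
  & \alpha_i \sim \mathcal{N}(\mu, \sigma^2)
\end{align*}
```

In all three versions the coefficients of interest are the same; only the treatment of the player baselines changes.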

The conclusions would indicate that:

homecourt does have a positive effect on shooters' success, although the effect was only borderline significant
calling a timeout before the second (or third) free throw had a negative but insignificant impact
clutch time had a negative and significant impact

Regarding timeouts, the fact that the effect was not significant could be due to the low sample size of these events (84 cases in 2013-2014 out of 56K free throws taken). More coaches should test this strategy so I can tell them if it's effective or not!
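To get a feel for how little 84 attempts can tell us, here is a back-of-the-envelope sketch of the uncertainty on a success rate estimated from that many shots. The 75% baseline is an assumed ballpark free throw rate, not a figure from the data, and the interval uses the plain normal approximation.

```python
import math

n_timeout = 84    # free throws interrupted by a timeout in 2013-2014 (from the text)
p_assumed = 0.75  # ASSUMPTION: ballpark free throw percentage, for illustration

# Standard error of a proportion estimated from n independent attempts.
se = math.sqrt(p_assumed * (1 - p_assumed) / n_timeout)
half_width = 1.96 * se  # normal-approximation 95% interval

print(f"standard error: {se:.3f}")
print(f"95% interval: +/- {100 * half_width:.1f} percentage points")
```

With an interval that wide, only a very large timeout effect would stand out from the noise.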

As for clutch time, the conclusion seems to contradict the visual exploration, where percentages were higher in clutch time. But recall that our explanation for this was that the coaches were putting better shooters on the court. The analysis would indicate that even if the best shooters are on the floor in the closing minutes, they are individually performing less well when the game is on the line than in the middle of the second quarter.
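The apparent contradiction is a composition effect, and a toy example makes it visible. The numbers below are entirely hypothetical and were picked only to illustrate the mechanism: every shooter gets worse in the clutch, yet the pooled percentage goes up because the better shooter takes most of the clutch attempts.

```python
# HYPOTHETICAL numbers, for illustration only: (made, attempted)
shots = {
    "good shooter": {"regular": (425, 500), "clutch": (66, 80)},
    "poor shooter": {"regular": (325, 500), "clutch": (12, 20)},
}


def pct(made, attempted):
    """Free throw percentage as a fraction."""
    return made / attempted


def pooled(situation):
    """Percentage over all players combined for one situation."""
    made = sum(p[situation][0] for p in shots.values())
    att = sum(p[situation][1] for p in shots.values())
    return pct(made, att)


for name, p in shots.items():
    print(f"{name}: {pct(*p['regular']):.3f} regular vs {pct(*p['clutch']):.3f} clutch")
print(f"pooled: {pooled('regular'):.3f} regular vs {pooled('clutch'):.3f} clutch")
```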

Now one might wonder if we could use the data to detect some of the league's best clutch free throw shooters. Those cold-blooded killers who can step it up an extra notch when all eyes are on them. The Durants, James, Bryants...

I added some interaction terms for players with sufficient free throws (20) both in and out of clutch time, to see which ones had the potential to elevate their game. And the results are... no one! Out of the 26 players meeting my criteria, none could significantly increase their free throw percentage. This could again be due to small sample size, but even so most players had negative coefficients. While none had significant positive coefficients, two had significant negative coefficients: Chris Paul and Ramon Sessions.

So back to the post's title, if you're playing the Clippers, fifteen seconds to go and down by one, do you foul Chris Paul?