Showing posts with label victory. Show all posts

Thursday, June 21, 2012

Homecourt and rest time advantage

In a previous post, I looked at the true impact of homecourt advantage in the NBA, for the league in general and for each individual team. The model was simple, only considering whether the game was at home or away.

The main take-away was that playing at home bumped your probability of winning by almost 20 percentage points, from 40% to 60%. Quite a significant jump, although not every team observed the same jump.

I did, however, feel that the model was a little simplistic in ignoring another phenomenon that could affect a team's performance: rest time between games. Especially over the condensed 2011-2012 season, with certain teams playing back-to-back-to-back games, one can definitely wonder how rest days come into play. If a team is playing on the road, can the fact that it has had three days of rest, as opposed to its opponent's back-to-back games, mitigate the opponent's homecourt advantage?

The data and methodology are almost identical to the post I mentioned earlier: I looked at all 2009-2010 and 2010-2011 games, and for each match-up looked at which team played at home and how many days of rest each team had.

Since we are now looking at multiple variables instead of just the homecourt impact, I will only provide the breakdown of results for the league in general; providing them for each team would just take up too much space.


Impact of rest days

The following table provides the victory probability based on where the game is played and the number of rest days for both teams.

Team A at home | Rest days (Team A) | Rest days (Team B) | Win probability (Team A)
Yes | 1 | 1 | 59.1%
No | 1 | 1 | 40.9%
Yes | 2 | 1 | 65.2%
No | 2 | 1 | 47.4%
Yes | 3+ | 1 | 62.6%
No | 3+ | 1 | 44.6%
Yes | 1 | 2 | 52.6%
No | 1 | 2 | 34.8%
Yes | 2 | 2 | 59.1%
No | 2 | 2 | 40.9%
Yes | 3+ | 2 | 56.4%
No | 3+ | 2 | 38.3%
Yes | 1 | 3+ | 55.4%
No | 1 | 3+ | 37.4%
Yes | 2 | 3+ | 61.7%
No | 2 | 3+ | 43.6%
Yes | 3+ | 3+ | 59.1%
No | 3+ | 3+ | 40.9%


Some interesting highlights are that:
  • regardless of the number of rest days each team has had, the homecourt advantage is always around 18 percentage points (between 17.8 and 18.2 in the table)
  • the homecourt effect is much stronger than the rest-day effect: even in the best-case scenario (two days of rest against an opponent on one), the win probability on the road is 47.4%, an uplift of about 6.5 percentage points over the 40.9% with equal rest, as opposed to the almost 20 points we saw in the previous post for homecourt advantage.
  • resting two days improves the probability of victory compared to one day only, and three or more days is also more beneficial than one day, but two days is actually preferable to three or more. This debate comes up often, especially at playoff time, when one team comes out of a Game 7 to meet a team that finished a sweep over a week earlier. Is too much rest a bad thing? From this data, it does appear that two days provides the best balance between hitting your stride while you're hot and resting your sore legs.
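
The arithmetic behind these highlights can be checked directly against the table. The following is only a sketch: the numbers are copied from the table above, while the dictionary layout and the function names (`home_gaps`, `best_road_uplift`) are my own illustrative choices, not part of the original analysis.

```python
# Win probabilities (in %) copied from the table above, keyed by
# (team A at home, team A rest days, team B rest days).
WIN_PROB = {
    (True, "1", "1"): 59.1, (False, "1", "1"): 40.9,
    (True, "2", "1"): 65.2, (False, "2", "1"): 47.4,
    (True, "3+", "1"): 62.6, (False, "3+", "1"): 44.6,
    (True, "1", "2"): 52.6, (False, "1", "2"): 34.8,
    (True, "2", "2"): 59.1, (False, "2", "2"): 40.9,
    (True, "3+", "2"): 56.4, (False, "3+", "2"): 38.3,
    (True, "1", "3+"): 55.4, (False, "1", "3+"): 37.4,
    (True, "2", "3+"): 61.7, (False, "2", "3+"): 43.6,
    (True, "3+", "3+"): 59.1, (False, "3+", "3+"): 40.9,
}


def home_gaps(table):
    """Home-minus-road win probability for each rest-day combination."""
    return {
        (a, b): round(table[(True, a, b)] - table[(False, a, b)], 1)
        for (home, a, b) in table
        if home
    }


def best_road_uplift(table, baseline=("1", "1")):
    """Best road win probability minus the road probability at the baseline rest days."""
    best = max(p for (home, _, _), p in table.items() if not home)
    return round(best - table[(False, *baseline)], 1)


gaps = home_gaps(WIN_PROB)
print(min(gaps.values()), max(gaps.values()))  # 17.8 18.2
print(best_road_uplift(WIN_PROB))              # 6.5
```

The homecourt gap barely moves across rest-day combinations, while the best rest-day scenario on the road is worth about a third of that gap.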

Teams' optimal rest days

What is true for the league isn't necessarily true for individual teams. I wanted to check whether all teams preferred to rest 2 days instead of 1 or 3+ days. Were younger teams eager to have back-to-back games? Were older teams dreading tight schedules?

Team | Significant | Optimal rest days
NBA | Yes | 2
ATL | Yes | 2
BOS | No | 3
CHA | Yes | 2
CHI | Yes | 2
CLE | No | 2
DAL | No | 3
DEN | Yes | 3
DET | Yes | 3
GSW | Yes | 1
HOU | No | 2
IND | Yes | 2
LAC | Yes | 2
LAL | Yes | 1
MEM | Yes | 2
MIA | No | 2
MIL | Yes | 1
MIN | Yes | 1
NJN | Yes | 2
NOH | Yes | 3
NYK | No | 3
OKC | No | 3
ORL | No | 2
PHI | No | 2
PHO | Yes | 3
POR | Yes | 1
SAC | No | 1
SAS | No | 2
TOR | Yes | 3
UTA | No | 3
WAS | Yes | 3


Upon close inspection, there does not seem to be any strong correlation between a team's age and its preferred number of rest days. Sure, Boston is an old team preferring three or more days and Golden State is one of the youngest teams, performing best on back-to-back games, but the Lakers are an old team that also prefers back-to-back games, and the Wizards are a young team with their best odds after 3+ days of rest.

To conclude, while rest days do influence performance in different ways for different teams, homecourt advantage remains the most impactful variable for predicting the outcome of a game.



Tuesday, May 15, 2012

2012 NBA Playoffs: Updated forecasts


What a first round this has been!

Things were wrapped up rather quickly in the East, including the elimination of the #1 team, the Chicago Bulls, which was surprising, at least until we saw the following video:




Meanwhile, the West was really the wild wild west and gave us some thrilling comebacks and two stressful game sevens.

Chicago was the favorite to win the Championship after the first two games of the playoffs, with an estimated probability of victory of 17.9%. Its elimination has freed up some room, but for whom?

Oklahoma City, San Antonio and Miami were the runners-up, and while the names of the next three teams haven't changed, their order has:

NBA team | Champion probability
MIA | 21.5%
OKC | 21.2%
SAS | 19.3%
BOS | 10.2%
LAC | 7.9%
IND | 7.3%
PHI | 6.6%
LAL | 6%


However, the results are slightly biased for now, in the sense that Miami and Oklahoma City have won their round 2 openers whereas San Antonio still hasn't played Game 1 against the Clippers. If San Antonio were to win, it would jump right back to the first spot with a probability of 24.6% of clinching the Larry O'Brien trophy, more than 3 percentage points ahead of Miami and Oklahoma City.

More updates at the end of round 2!

Monday, April 23, 2012

NBA player rankings


So in addition to boardgames, I am also a big NBA fan, and luckily the NBA and stats mix really well.
A topic that has often been covered is how to rank players. Which player has the greatest impact? Who is the greatest player of all time (well, that one's easy ;-) )?


The debates rage because the questions are so vague and open to interpretation. What does it mean for a player to be a better basketball player than another? Does it mean having better stats? If I score more, rebound more, assist more, steal more and turn the ball over less, am I clearly better than you? Another approach is to compare win/loss records when the player plays or sits out, although for most players it will be difficult to have a good sample size for the "sitting out" observations. A newer metric that solves this "sitting out" problem is the plus/minus statistic, which keeps track of the score margin when a player enters and leaves the game. So say a player enters the game with the score tied at 10, and leaves it with his team ahead by 5. That's +5 for him. He re-enters the game with his team ahead by 10, and leaves it (without coming back) with his team only ahead by 2. That's -8. With the earlier +5, that's an overall -3 for that player in that game.
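
The plus/minus bookkeeping in that example can be written as a few lines of code. This is just a sketch of the arithmetic; the function name `plus_minus` and the representation of a stint as an (entry margin, exit margin) pair are my own assumptions for illustration.

```python
def plus_minus(stints):
    """Sum the change in score margin over each stint on the floor.

    Each stint is (margin_on_entry, margin_on_exit), where the margin is
    the player's team score minus the opponent's score.
    """
    return sum(exit_margin - entry_margin for entry_margin, exit_margin in stints)


# The example from the text: enters tied (margin 0) and leaves up 5,
# then re-enters up 10 and leaves up 2.
stints = [(0, 5), (10, 2)]
print(plus_minus(stints))  # -3
```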

Today I wanted to look at a different, more statistical approach. I have no clue where it will lead me, but after some trial and error and tweaking here and there I hope to come up with a new and interesting way to rank players.

Ultimately, what we care most about is wins. Sure, it's great to score 100 in a game, but if you lose that game that's just wasted effort. So the idea is to find a relationship between a player's efforts and the impact they have on the game. In other words, how do a player's stats in a game change the probability of winning the game?


In terms of the data, I looked at the past six seasons (not including 2011-2012), and for each player looked at his stats with the game outcome for all games played. I only considered players with at least 50 wins and 50 losses, playoffs not included.

As our variable of interest is a probability (of winning the game), we naturally turn towards a logistic regression. We are not directly modeling the probability as a linear combination of the covariates but rather the log odds: log(P(win) / (1 - P(win))). The interpretation of the coefficients will not be entirely straightforward but will still allow us to rank players. Which player has the greatest coefficient, and has the greatest impact on the log odds and thus the probability of winning the game?
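
To make the log-odds interpretation a little more concrete, here is a small sketch of how a coefficient translates back into a win probability. The helper names and the coefficient value (0.05 per point) are purely hypothetical, chosen for illustration; they are not fitted values from this analysis.

```python
import math


def log_odds(p):
    """Log odds of a probability p, i.e. log(p / (1 - p))."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Map log odds z back to a probability."""
    return 1 / (1 + math.exp(-z))


def win_prob_after(p_start, beta, delta_x):
    """Win probability after the covariate increases by delta_x,
    given a logistic-regression coefficient beta on that covariate."""
    return inv_logit(log_odds(p_start) + beta * delta_x)


# Hypothetical coefficient: +0.05 log odds per extra point scored.
# Starting from a 50% win probability, 10 extra points give:
print(round(win_prob_after(0.5, 0.05, 10), 3))  # 0.622
```

Note that the same coefficient moves the probability by different amounts depending on the starting point, which is why the coefficients are easier to rank than to interpret.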

Well it depends on our covariates. Since we do want to find an easy way to rank, it's best to only consider one covariate.

Points

Let's naively only consider points scored. How does scoring an extra point improve the log odds?
The top 5 impactful players are (in order): Calvin Booth, Greg Ostertag, Antonio Davis, Anderson Varejao and Bruce Bowen.
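
For readers who want to see the mechanics, here is a minimal sketch of fitting one single-covariate logistic regression per player and ranking players by the coefficient. Everything in it is an assumption for illustration: the game logs are made up (they are not the data used in this post), and the Newton-Raphson fitter `fit_logistic` is a hand-rolled stand-in for whatever statistical package one would normally use.

```python
import math


def fit_logistic(xs, wins, iterations=25):
    """Fit log-odds(win) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # entries of X^T W X (negative Hessian)
        for x, y in zip(xs, wins):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # degenerate data, stop rather than divide by ~0
        # Newton step: solve the 2x2 system (X^T W X) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up game logs: (points per game, 1 = win / 0 = loss).
# The "star" scores 5x as much (plus 10) with the same win pattern.
game_logs = {
    "role_player": ([0, 0, 2, 2, 4, 4, 6, 6], [0, 0, 0, 1, 1, 0, 1, 1]),
    "star": ([10, 10, 20, 20, 30, 30, 40, 40], [0, 0, 0, 1, 1, 0, 1, 1]),
}

slopes = {name: fit_logistic(xs, ys)[1] for name, (xs, ys) in game_logs.items()}
ranking = sorted(slopes, key=slopes.get, reverse=True)
print(ranking)  # ['role_player', 'star']
```

In this toy setup the two players have identical win patterns, yet the low scorer gets a per-point coefficient five times larger, simply because his points are on a smaller scale. That property of a per-point coefficient is worth keeping in mind when reading a ranking like the one above.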

All metrics

Points might be too restrictive, since a player can have an impact without scoring. So let's consider (points + rebounds + steals + assists + blocks - turnovers), referred to from here on as "all metrics", as the covariate.
The top 5 impactful players are (in order): Bruce Bowen, Calvin Booth, Eddie Griffin, Kevin Durant, Antonio Davis.
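
As a small illustration of the "all metrics" covariate, the composite is just a signed sum over a box-score line. The function name and the sample stat line below are hypothetical.

```python
def all_metrics(points, rebounds, steals, assists, blocks, turnovers):
    """Composite covariate: everything positive added, turnovers subtracted."""
    return points + rebounds + steals + assists + blocks - turnovers


# Hypothetical single-game stat line.
print(all_metrics(points=20, rebounds=8, steals=2, assists=5, blocks=1, turnovers=3))  # 33
```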

Minutes played

If we suspect that a player's impact is difficult to track with simple metrics only, we can take minutes played as a proxy for everything observed and not observed (good defense, good picks...).
In this last case, the top 5 impactful players are (in order): Kevin Durant, Othella Harrington, Gerald Wallace, Eddie Griffin, Zach Randolph.

Where are the superstars?

It's interesting to see the same names come up, and to notice that aside from Kevin Durant, none of the players have superstar status.

Talking of superstars, where are they in the rankings?

Out of the 479 players considered, here is how some superstars ranked respectively for points, all metrics, and minutes played:
Kobe Bryant: 363, 333, 479
LeBron James: 247, 101, 475
Kevin Garnett: 396, 416, 478

Wow.

Caveats

As I was mentioning, I am discovering these results in almost real time with you, and am still a little unclear on how to interpret them myself. There are a lot of things that could hurt the analysis, notably the fact that the coefficients are here interpreted as "change in log odds for an additional unit increase in the covariate". But an additional point for Kobe isn't exactly the same thing as an additional point for Eddie Griffin.

There is also a case pointed out in "Superfreakonomics" about ranking good surgeons. Looking at patient death rate, for instance, can be misleading because of selection bias. Patients in more critical condition will go to see the better surgeon, and precisely because of their condition they raise that surgeon's death rate. Bad doctors who only see healthy patients will have impeccable track records. Similarly in basketball, it could be argued that when the game is on the line you will go to your superstars, who will have to play exceptionally well to win the game, whereas you might put your whole bench in when the game has already been won for a while.

There is definitely room for improvement, but I will continue to explore this approach to try to identify lesser known players that have strong yet unnoticeable impacts on the game.
Stay tuned!