Wednesday, May 29, 2013

If you're the San Antonio Spurs...

I don't know about you, but I've found these NBA Playoffs to be more exciting than the last few editions. Aside from Indiana-Atlanta (no offense), all first round match-ups had some backstory to them.
It got even better in the second round, with the Bulls shocking the basketball world by stealing Game 1 against Miami, and Stephen Curry catching fire NBA-Jam-style in San Antonio. And then we got to the conference finals, with buzzer-beaters to win games, crazy rallies and every other game going to overtime. The drawback, of course, is that I cut my life expectancy by three years due to the adrenaline rushes.

Okay, enough of the NBA advertising (I swear I'm not getting a dime from David Stern), what do the stats whisper as of today, May 29th? Well, the Spurs have some waiting to do until they figure out who their Eastern opponent is going to be...

Indiana or Miami in the Finals??

Tough choice if you've watched the last few games, but the stats are oblivious to feelings; only hard facts matter, and the Heat remain a strong favorite, with almost 66% probability. Here's the probability breakdown:

Winner Number of Games Probability
Miami 6 32.5%
Indiana 6 17.3%
Miami 7 33.1%
Indiana 7 17.1%

Spurs' 5th trophy?

As of today, here is each team's probability of bringing the Larry O'Brien trophy home:

NBA team Champion Probability
SAS 47.5%
MIA 39.6%
IND 12.9%

If Miami wins the series against Indiana, San Antonio's probability of winning it all drops to 39.8%:

Winner Number of Games Probability
San Antonio 4 4.4%
Miami 4 7.8%
San Antonio 5 8.2%
Miami 5 17.6%
San Antonio 6 15.2%
Miami 6 15.5%
San Antonio 7 12.0%
Miami 7 19.3%

If Indiana wins the series against Miami, San Antonio's probability surges back to 62.5%:

Winner Number of Games Probability
San Antonio 4 7.9%
Indiana 4 4.0%
San Antonio 5 18.9%
Indiana 5 7.3%
San Antonio 6 15.2%
Indiana 6 15.2%
San Antonio 7 20.4%
Indiana 7 11.1%
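
The tables above fit together through the law of total probability. Here is a minimal sketch of that sanity check in Python; it only reuses the numbers printed above, the variable and function names are mine, and it is not the model that produced the probabilities:

```python
# Eastern Conference Finals outcomes (from the first table), in percent.
east = {"MIA": 32.5 + 33.1, "IND": 17.3 + 17.1}

# Finals outcomes by series length (4, 5, 6, 7 games), conditional on the Eastern winner.
vs_miami = {"SAS": [4.4, 8.2, 15.2, 12.0], "MIA": [7.8, 17.6, 15.5, 19.3]}
vs_indiana = {"SAS": [7.9, 18.9, 15.2, 20.4], "IND": [4.0, 7.3, 15.2, 11.1]}

def series_win_pct(table):
    """Sum the per-length probabilities to get each team's chance of winning the series."""
    return {team: sum(rows) for team, rows in table.items()}

p_vs_mia = series_win_pct(vs_miami)
p_vs_ind = series_win_pct(vs_indiana)

# Law of total probability: weight each conditional result by the chance of that matchup.
title = {
    "SAS": (east["MIA"] * p_vs_mia["SAS"] + east["IND"] * p_vs_ind["SAS"]) / 100,
    "MIA": east["MIA"] * p_vs_mia["MIA"] / 100,
    "IND": east["IND"] * p_vs_ind["IND"] / 100,
}

for team, pct in title.items():
    print(f"{team}: {pct:.1f}%")
```

The recombined values land within about a tenth of a percentage point of the championship table; the small gaps come from the rounding of the published figures.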

The big probability swing is of course due to the historical performance of both teams over the regular season and playoffs, but also to the change in home court advantage: San Antonio has home court against Indiana, but not against Miami.

So while San Antonio will not officially voice any preference (and might even suggest Miami, 'cause you're only considered the best if you beat the best), I can't help but feel that deep down they've got to hope for the inexperienced Pacers to end up against them on basketball's greatest stage.

Sunday, May 19, 2013

Hollywood movie trends

One night in 1996, a twenty-three-year-old Larry Page attempted to work out a way of downloading the Internet.

My objective the past few months was slightly less ambitious (and for that reason will probably not make me a multi-billionaire 20 years from now). I decided to download IMDB.

If you like numbers and stats (and movies!), IMDB is rather awesome and has the potential for endless statistical analyses. I already wrote a post a while back on the collaboration between Johnny Depp and Tim Burton and attempted to answer who had benefited the most from it. I have many other analyses in mind, but before looking at those, I thought I would first take a step back and look at the evolution of the movie industry almost from the very beginning.

I would first like to point out that as exhaustive as I would like my analyses to be, there are limitations. First of all, there most likely is a bias as to which movies make it into IMDB. There is also a bias as to who rates the movies and how. I watch quite a few movies and systematically notice that almost all French movies (even the really good ones) get pretty bad scores. Now that I think about it, this could be an analysis in itself!

The first thing I looked at was the distribution of ratings over time (gray boxplots) as well as the number of movies (red line) since the end of the 1800s.

From a size perspective, it is striking both how the number of movies per year has exploded hockey-stick style and how abruptly it saturated around 2005. There is a possibility that I was not able to pull all movies for the later years (2012 data seems particularly suspicious), but there still remains rather strong evidence that within a period of 15 to 20 years we have reached saturation in the number of movies coming out, at around 10K a year. You'd still have to watch 28 a day if you wanted to see them all!

As for the distribution of IMDB scores, the median (black dot in the middle of the gray boxes) has remained remarkably stable over time, hovering around 6. The spread has increased slightly, as have the extremes, but this is also a consequence of the increase in the number of movies produced: the more that come out, the more likely we are to get very bad ones as well as very good ones!
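
For anyone who wants to reproduce this kind of per-year summary, here is a minimal sketch in Python. The (year, rating) pairs are made up for illustration and the names are mine; it only shows how the median, spread and extremes behind each boxplot could be computed, not how the figure was actually built:

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical (year, rating) pairs standing in for the downloaded data.
ratings = [
    (1950, 6.1), (1950, 7.0), (1950, 5.2), (1950, 6.4),
    (2005, 3.1), (2005, 6.0), (2005, 8.9), (2005, 6.2), (2005, 5.5),
]

# Group the ratings by release year.
by_year = defaultdict(list)
for year, score in ratings:
    by_year[year].append(score)

# One summary row per year: the numbers a boxplot would display.
summary = {}
for year, scores in sorted(by_year.items()):
    q1, _q2, q3 = quantiles(scores, n=4)  # quartile cut points
    summary[year] = {
        "count": len(scores),
        "median": median(scores),
        "iqr": q3 - q1,
        "range": max(scores) - min(scores),
    }

for year, row in summary.items():
    print(year, row)
```

In this toy sample the year with more movies also shows the wider range, which is the effect described above.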

Let's now take a quick look at the proportion of movies of the different genres over time. Again, IMDB is a little tricky here as a movie can have multiple genres. Here I associated each movie with its primary genre. I also ran the analysis taking multiple genres into account (so that the original Star Wars was triple-counted as Action, Adventure and Fantasy). It turns out that the results were extremely similar. Out of the 28 unique genres, I have only represented those with sufficient fractions and with the most interesting trends.
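
To make the two counting schemes concrete, here is a minimal sketch in Python. The tiny movie list is made up, and it assumes the first listed genre is the primary one; it illustrates the idea rather than the code behind the figures:

```python
from collections import Counter

# Hypothetical sample: (title, year, genres in listed order).
movies = [
    ("Movie A", 1977, ["Action", "Adventure", "Fantasy"]),
    ("Movie B", 1977, ["Documentary"]),
    ("Movie C", 1977, ["Action", "War"]),
    ("Movie D", 1977, ["Short", "Documentary"]),
]

def genre_shares(movies, primary_only=True):
    """Return each genre's share of the movies.

    primary_only=True counts only the first listed genre of each movie;
    primary_only=False counts a movie once for every genre it carries,
    so the shares can add up to more than 1.
    """
    counts = Counter()
    for _title, _year, genres in movies:
        counts.update(genres[:1] if primary_only else genres)
    return {genre: n / len(movies) for genre, n in counts.items()}

primary = genre_shares(movies, primary_only=True)
multi = genre_shares(movies, primary_only=False)

print("primary genre only:", primary)
print("all genres counted:", multi)
```

With primary-genre counting the shares sum to 1 for each year, which makes the curves directly comparable; with multi-genre counting a movie like Movie A contributes to three genres at once.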

The most remarkable trend is that of short movies. While almost 50% of the 1910 movies were short features, that number stabilized around 5% for almost 40 years until the 1990s, before increasing suddenly to almost 1 out of 3 movies nowadays. The trend for Documentaries has been very similar, but has stabilized at about 1 in 6 movies in recent years.

Westerns were popular back in the day, but have been almost unheard of since the 1970s. Instead, the 1970s mark the start of the simultaneous 30-year-long golden age of both Action and Adult movies. I have to admit that I was surprised by the decline in Adult movies since the late twentieth century, as I figured the oldest job in the world would continue to be the oldest inspiration in the world. Might be worth further investigation in another post (I do expect quite a few hits on that one...).

Here are some other interesting findings I made along the way:

War movies: Popular during war times

While most genres have trends similar to the one described for the first figure, with an explosion in the number of movies around 1990 that has since stabilized, War movies are a nice exception:

War movies come during and after wars: look at World War 2, Vietnam, a small spike around the first Gulf War, and a big surge right after 2001.

Horror movies: Stop making these!!!

In terms of IMDB ratings, most genres follow the general trend of a long-term stabilization around 6. Horror movies have quite a different story to tell:

Although the number of Horror movies being released follows the same S-curve we've seen for the general trend, ratings for Horror have essentially dropped continuously since the 1920s!

Film what?

While almost all genres still exist today (and are actually released at an increased pace), the Film Noir genre is quite particular:
Its golden age lasted less than 20 years around the 1950s, and no Film Noirs have been made since then. Quite unfortunate when we see that certain gems like "The Third Man" with Orson Welles fall in this category!

Rated SM R for super mature

We observed earlier some surprising tendencies in the Adult movie category. Here's a deep dive... I mean a closer look:

I have some doubts about the reliability of these numbers, and don't think that the number of Adult movies released per year is less than 500 (especially given that the revenue from videos is estimated to be in the $0.5 billion to $1.8 billion range). It really depends on whether IMDB performs some filtering as to which movies get added to the database.
That being said, even if the absolute number of movies released is biased, the distribution of the ratings is more trustworthy, and the "n" shape is quite interesting: ratings seem to have saturated at around 6.5 but then suddenly plummeted over the past few years.

So that was a quick overview of what can be pulled from IMDB. As mentioned at the beginning of the post, expect many more analyses to follow!