One night in 1996, a twenty-three-year-old Larry Page attempted to work out a way of downloading the Internet.
My objective over the past few months was slightly less ambitious (and for that reason will probably not make me a multi-billionaire 20 years from now). I decided to download IMDB.
If you like numbers and stats (and movies!), IMDB is rather awesome and has the potential for endless statistical analyses. I already wrote a post a while back on the collaboration between Johnny Depp and Tim Burton and attempted to answer who had benefited the most from the collaboration. I have many other analyses in mind, but before looking at those, I thought I would first take a step back and look at the evolution of the movie industry almost from the very beginning.
I would first like to point out that as exhaustive as I would like my analyses to be, there are limitations. First of all, there most likely is a bias as to which movies make it into IMDB or not. There is also a bias as to who rates the movies and how. I watch quite a few movies and systematically notice that almost all French movies (even the really good ones) get pretty bad scores. Now that I think about it, this could be an analysis in itself!
The first thing I looked at was the distribution of ratings over time (gray boxplots) as well as the number of movies (red line) since the end of the 1800s.
From a size perspective, two things are striking: how the number of movies per year has exploded hockey-stick style, and how abruptly it saturated around 2005. There is a possibility that I was not able to pull all movies for the later years (2012 data seems particularly suspicious), but there still remains rather strong evidence that over a period of 15 to 20 years we have reached saturation in the number of movies coming out, at around 10K a year. You'd still have to watch 28 a day if you wanted to see them all!
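For anyone who wants to check the 28-a-day figure, here is the back-of-the-envelope arithmetic as a tiny Python sketch. The 10,000 is just the rounded saturation level quoted above, not an exact count.

```python
import math

# Rounded saturation level from the text, not an exact count
movies_per_year = 10_000
days_per_year = 365

per_day = movies_per_year / days_per_year
print(f"{per_day:.1f} movies a day on average")

# Round up: you cannot watch a fraction of a movie and still "see them all"
needed_per_day = math.ceil(per_day)
print(f"{needed_per_day} a day to keep up")
```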
As for the distribution of IMDB scores, the median (black dot in the middle of the gray boxes) has remained remarkably stable over time, hovering around 6. The spread has increased slightly, as have the extremes, but this is also a consequence of the increase in the number of movies produced: the more that come out, the more likely we are to get very bad ones as well as very good ones!
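The numbers behind a chart like this boil down to a per-year summary: how many movies, and where the quartiles of their ratings sit. Below is a minimal sketch of that computation, assuming the data is available as (year, rating) pairs. The function name `yearly_summary` and the toy ratings are made up for illustration; this is not the code that produced the figure.

```python
from collections import defaultdict
from statistics import median, quantiles

def yearly_summary(movies):
    """Map each year to (count, q1, median, q3) for (year, rating) pairs."""
    by_year = defaultdict(list)
    for year, rating in movies:
        by_year[year].append(rating)

    summary = {}
    for year, ratings in sorted(by_year.items()):
        if len(ratings) >= 2:
            # Quartiles are the box edges of a boxplot
            q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
        else:
            # A single movie has no spread to speak of
            q1 = q3 = ratings[0]
        summary[year] = (len(ratings), q1, median(ratings), q3)
    return summary

# Toy data, invented for the example (not real IMDB ratings)
sample = [
    (1950, 5.0), (1950, 6.0), (1950, 7.5),
    (2005, 3.0), (2005, 6.0), (2005, 7.0), (2005, 8.5),
]
summary = yearly_summary(sample)
for year, (count, q1, med, q3) in summary.items():
    print(year, count, q1, med, q3)
```

The count per year corresponds to the red line, and the three quartile values to the gray boxes.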
Let's now take a quick look at the proportion of movies of the different genres over time. Again, IMDB is a little tricky here as a movie can have multiple genres. Here I associated each movie with its primary genre. I did also run the analysis taking multiple genres into account (so that the original Star Wars was triple-counted as Action, Adventure and Fantasy). It turns out that the results were extremely similar. Out of the 28 unique genres, I have only represented those with sufficient fractions and with the most interesting trends.
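To make the two counting schemes concrete, here is a small sketch that computes genre shares both ways. It assumes each movie comes with an ordered list of genres and that the first one listed is the primary genre; that ordering rule, the function name `genre_fractions` and the toy movies are my assumptions for the example. In the multi-genre mode the shares are fractions of all genre labels, which is one reasonable reading of "triple-counted".

```python
from collections import Counter

def genre_fractions(movies, primary_only=True):
    """Return genre -> share, counting one label per movie or all of them."""
    counts = Counter()
    for genres in movies:
        if not genres:
            continue  # skip movies with no genre information
        # Assumption: the first listed genre is the primary one
        labels = genres[:1] if primary_only else genres
        counts.update(labels)
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.items()}

# Toy data, invented for the example
toy_movies = [
    ["Action", "Adventure", "Fantasy"],
    ["Drama"],
    ["Short", "Comedy"],
    ["Drama", "War"],
]

primary = genre_fractions(toy_movies, primary_only=True)
multi = genre_fractions(toy_movies, primary_only=False)
print(primary)
print(multi)
```

To get a trend over time, the same function would be applied separately to the movies of each year.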
The most remarkable trend is that of short movies. While almost 50% of the 1910 movies were short features, that number stabilized around 5% for almost 40 years until the 1990s, before increasing suddenly to almost 1 out of 3 movies nowadays. The trend for Documentaries has been very similar, but stabilized at about 1 in 6 movies in recent years.
Westerns were popular back in the day, but are almost unheard of as of the 1970s. Instead the 1970s mark the start of the simultaneous 30-year-long golden age of both Action and Adult movies. I have to admit that I was surprised by the decline in Adult movies since the late twentieth century, as I figured the oldest job in the world would continue to be the oldest inspiration in the world. Might be worth further investigation in another post (I do expect quite a few hits on that one...).
Here are some other interesting findings I made along the way:
War movies: Popular during war times
While most genres have trends similar to the one described for the first figure, with an explosion in the number of movies around 1990 that has since stabilized, War movies display a nice exception:
War movies come during and after wars: look at World War 2, Vietnam, a small spike around the first Gulf War, and a big surge right after 2001.
Horror movies: Stop making these!!!
In terms of IMDB ratings, most genres follow the general trend of a long-term stabilization around 6. Horror movies have quite a different story to tell:
Although the number of Horror movies being released follows the same S-curve we've seen for the general trend, ratings for Horror have essentially dropped continuously since the 1920s!
Film what?
While almost all genres still exist today (and are actually released at an increased pace), the Film Noir genre is quite particular:
Its golden age lasted less than 20 years around the 1950s, and no Film Noirs have been made since then. Quite unfortunate when we see that certain gems like "The Third Man" with Orson Welles fall in this category!
Rated SM R for super mature
We observed earlier some surprising tendencies in the Adult movie category. Here's a deep dive... I mean, a closer look:
I have some doubts about the reliability of these numbers, and don't think that the number of Adult movies released per year is less than 500 (especially given that the revenue from videos is estimated to be in the $0.5 billion to $1.8 billion range). It really depends on whether IMDB performs some filtering as to which movies get added to the database.
That being said, even if the absolute number of movies released is biased, the distribution of the ratings is more trustworthy, and the "n" shape is quite interesting: ratings seem to have plateaued at around 6.5 but then suddenly plummeted over the past few years.
So that was a quick overview of what can be pulled from IMDB. As mentioned at the beginning of the post, expect many more analyses to follow!