The Statisticator: June 2013

Thursday, June 20, 2013

Hollywood's lack of originality: "Let's make a sequel!" (Part 2, yes I realize the irony)

As indicated by the title, this is part 2 of the analysis of movie sequels. In the previous post I described the IMDB data used for the analysis and shared some preliminary statistics on the distribution of number of movie installments in movie series.

In this post I will focus on comparing IMDB ratings for the original movie and its sequel.

The next post will look at series with 3 or more installments.

Number 2

I will first focus on the second installment. Analysis of installments 3+ will be dealt with in the next post.

First of all, how much time goes by before the second installment comes out?

Here's a quick look at the distribution:

Yes, 37 years separate the two installments of The Wicker Man series. It's actually a trilogy in the works with the third installment due in 2014. The first one came out in 1973, and the second in 2010!

A couple of outliers aside, the vast majority of sequels come soon after the original movie: in 77% of cases less than 5 years separate the two, and in 92% of cases less than 10 years separate the two.
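For readers who want to reproduce this kind of summary, here is a minimal sketch of how such shares could be computed. The gap values and the helper name share_within are made up for illustration; they are not the IMDB data used in this post.

```python
def share_within(gaps_in_years, limit):
    """Fraction of gaps that are strictly less than `limit` years."""
    return sum(1 for gap in gaps_in_years if gap < limit) / len(gaps_in_years)


# Hypothetical gaps (in years) between a first movie and its sequel
gaps = [1, 2, 2, 3, 3, 4, 6, 8, 12, 37]

print(f"under 5 years:  {share_within(gaps, 5):.0%}")
print(f"under 10 years: {share_within(gaps, 10):.0%}")
```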

Now for the meat of the analysis: how does the second movie's rating relate to the first one's? As done in previous posts, I will focus on the IMDB rating. Even if there are potentially many biases with this metric (die-hard fans, foreign movies not rated as well as US movies...), I was hoping that by looking at differences in ratings between sequels most of these biases would cancel each other out.

The following graph plots the second installment's IMDB rating against the first installment's IMDB rating. Dots above the diagonal indicate sequels that did better, dots under indicate those that did worse.

As expected, most dots fall under the diagonal: sequels more likely exist for profit reasons than for creating all-time classics.

A few fun facts you can re-use at your next dinner party:

Only 19.7% of second installments did just as well as (4.6%) or better than (15.1%) the first.
Some of the worst decreases in ratings go to The Mask (Jim Carrey's original 6.7 movie plummets to 2.1 with Son of the Mask which he wisely stayed away from) and The Exorcist (the original grandiose 1973 classic went from 8.1 to 3.6 in just four years).
One of the best increases in ratings goes to Captain America (the last installment that came out in 2011 with a rating of 6.8 is actually considered the sequel to the original 1990 movie that has a rating of 2.9).
On average, sequels have an IMDB score 0.9 less than the first movie.
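The shares and the average change quoted above are simple to compute from (original, sequel) rating pairs. The sketch below uses hypothetical pairs and a hypothetical helper, compare_sequels, purely to show the arithmetic.

```python
def compare_sequels(pairs):
    """Summarize (original, sequel) rating pairs.

    Returns the shares of sequels rated better, the same, and worse,
    plus the mean change (sequel minus original).
    """
    n = len(pairs)
    better = sum(1 for original, sequel in pairs if sequel > original)
    same = sum(1 for original, sequel in pairs if sequel == original)
    worse = n - better - same
    mean_change = sum(sequel - original for original, sequel in pairs) / n
    return better / n, same / n, worse / n, mean_change


# Hypothetical (original, sequel) ratings, not taken from the IMDB data
pairs = [(7.0, 6.0), (6.5, 6.5), (5.0, 6.0), (8.0, 6.0)]
better, same, worse, mean_change = compare_sequels(pairs)
print(f"better: {better:.0%}, same: {same:.0%}, worse: {worse:.0%}")
print(f"mean change: {mean_change:+.1f}")
```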

Simple linear models can be run using the data; the main question we can ask ourselves is whether to include an intercept:
Sequel rating = alpha * Original rating (+ intercept)

Actually, a quick graph reveals that the intercept has little impact on the fit itself:

These models suggest a better description than the flat 0.9 average decrease between the first two installments: the sequel's rating is either 0.9 + 0.72 * Original rating or simply 0.85 * Original rating, depending on which model you prefer. In both cases we still conclude that the sequel usually does worse.
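As a rough illustration of the two models, here is a small least-squares sketch in plain Python. The ratings are made up for the example, and the function names are my own; this is not the code or the data behind the coefficients quoted above.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_through_origin(x, y):
    """Least squares for y = slope * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Made-up ratings for illustration only, not the IMDB data from this post
original = [8.1, 7.4, 6.9, 6.2, 5.8, 5.1, 4.3]
sequel = [6.9, 6.0, 6.1, 5.0, 5.3, 4.4, 4.1]

slope, intercept = fit_with_intercept(original, sequel)
alpha = fit_through_origin(original, sequel)
print(f"with intercept:    sequel = {intercept:.2f} + {slope:.2f} * original")
print(f"without intercept: sequel = {alpha:.2f} * original")
```

In the no-intercept model, a slope below 1 means the predicted sequel rating is lower than the original at every rating level. With an intercept, the prediction only drops below the original above a certain rating.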

But does it do significantly worse? Significance of the decrease can be assessed with a paired t-test. Quick stat reminder: you absolutely want a paired t-test here. If you were to do a naive t-test between two independent populations, you would reach the counter-intuitive conclusion that sequels do just as well, because the between-movie spread of ratings is much greater than the within-movie spread. In other words, suppose all movie ratings are uniformly distributed between 1 and 10, and all sequels systematically have a rating 0.3 less, making the sequel range go from 0.7 to 9.7. An independent-samples t-test would be unable to pick up the systematic downward shift (except with a huge sample size). A paired test is necessary because the two measurements are dependent (not simply because we want to measure significance!).
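The thought experiment can be simulated in a few lines. This is only a sketch under stated assumptions: 200 simulated movies, ratings drawn uniformly between 1 and 10, and a 0.3 drop plus a little noise (a perfectly constant drop would give the paired differences zero variance). It compares the two t statistics directly rather than computing p-values, and the function names are my own.

```python
import math
import random
import statistics


def welch_t(a, b):
    """t statistic treating a and b as two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def paired_t(a, b):
    """t statistic computed on the within-pair differences a[i] - b[i]."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


rng = random.Random(0)  # fixed seed so the run is repeatable
originals = [rng.uniform(1, 10) for _ in range(200)]
# Assumption: each sequel loses 0.3 points plus small noise (sd 0.2)
sequels = [o - 0.3 + rng.gauss(0, 0.2) for o in originals]

t_independent = welch_t(originals, sequels)
t_paired = paired_t(originals, sequels)
print(f"independent-samples t: {t_independent:.2f}")
print(f"paired t:              {t_paired:.2f}")
```

Both statistics share the same numerator (the mean drop); only the denominator differs. The independent version divides by a standard error dominated by the wide spread of ratings across movies, while the paired version divides by the much smaller spread of the per-movie differences, which is why it should come out far larger here.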

In our case, the paired t-test returns that the observed decrease in IMDB ratings is significant and actually very highly so.

In the next post we will look at series with three or more installments and view how ratings evolve for these longer series.

Saturday, June 15, 2013

Hollywood's lack of originality: "Let's make a sequel!" (Part 1, yes I realize the irony)

Terminator 2, Rocky 3, Alien 4, Scary Movie 5, Fast and Furious 6... When does it stop?

When a first movie works well, we systematically expect a sequel to be released.

This can get quite annoying for the majority of movie watchers who are not part of the aficionados who will see the sixth installment of a series they did not care much about after the first installment. While making a follow-up movie sounds like easy money, coming up with a good movie is very hard when you think about it. Of course you potentially have good characters with whom the audience connected well in the first movie. But after that you no longer have the element of surprise, you don't have all the interesting scenes that introduce the characters, and you have to go completely over the top to surpass the intrigue, suspense and action of the first movie. And this often fails. Everybody "knows" that ratings for follow-ups are worse than for the original, but how often is that actually true? If they were always significantly worse, would producers continue to produce them? Are they significantly worse or just a tad?

In this and following posts I want to take a second look (!) at movie sequels.

Data

A few words on the data. I pulled the list of movie series from Wikipedia, at the following links: http://en.wikipedia.org/wiki/List_of_film_series_with_[one, two, three...]_entries

The lists were not 100% accurate, but I figured they did a rather decent job.

Then I merged all the series with my separately downloaded IMDB data. Again, not 100% perfect in the matching and merging procedure, and there were some discrepancies in title names and release dates, but overall I was able to keep drop-outs to a minimum.

Some clean-up was then executed, removing movies that went straight to video or TV, as well as incomplete series. Sequels are not a recent phenomenon, as can be seen from the four-installment Wizard of Oz series in 1910, or even the six-installment silent Sherlock Holmes series from 1908 to 1910! However, I did not want to go too far back in time, given the many biases around old movies, and wanted to focus on the more recent "sequel effect". I thus looked only at series where the first installment was released after 1970.

This still left me with a little over 600 series and over 1500 different movies. Not too shabby to get a pretty decent idea of the sequel dynamics!

Series length

Before anything, let us take a quick look at the current status of series lengths. More than the actual numbers themselves, it is primarily the distribution of series length that interests us. How many series stop after the second installment? How many go on to make an 8th? While pulling the data I arbitrarily cut off at 10 installments, so super long series such as James Bond are not accounted for here.

As one might have expected, series are usually quite short: series with exactly two installments account for over half of all series (53%), and series with exactly three installments account for another 25%.

In the next post, we will look at sequel ratings, and whether there is indeed a drop-off compared to the original movie.
In the post following that we will look at series with three or more installments and view how ratings evolve for these longer series.