Thursday, June 20, 2013

Hollywood's lack of originality: "Let's make a sequel!" (Part 2, yes I realize the irony)

As indicated by the title, this is part 2 of the analysis of movie sequels. In the previous post I described the IMDB data used for the analysis and shared some preliminary statistics on the distribution of number of movie installments in movie series.

In this post I will focus on comparing IMDB ratings for the original movie and its sequel.

The next post will look at series with 3 or more installments.

Number 2

I will first focus on the second installment. Series with three or more installments are left for the next post.

First of all, how much time goes by before the second installment comes out?

Here's a quick look at the distribution:

Yes, 37 years separate the two installments of The Wicker Man series. It's actually a trilogy in the works with the third installment due in 2014. The first one came out in 1973, and the second in 2010!

A couple of outliers aside, the vast majority of sequels come soon after the original movie: in 77% of cases less than 5 years separate the two, and in 92% of cases less than 10 years.

Now for the meat of the analysis: how does the second movie's rating relate to the first one's? As done in previous posts, I will focus on the IMDB rating. Even if there are potentially many biases with this metric (die-hard fans, foreign movies not rated as well as US movies...), I was hoping that by looking at differences in ratings between sequels most of these biases would cancel each other out.

The following graph plots the second installment's IMDB rating against the first installment's IMDB rating. Dots above the diagonal indicate sequels that did better, dots under indicate those that did worse.

As expected, most sequels land below the diagonal: they more likely exist for profit reasons than to create all-time classics.

A few fun facts you can re-use at your next dinner party:
  • Only 19.7% of second installments did just as well as (4.6%) or better than (15.1%) the first.
  • Some of the worst decreases in ratings go to The Mask (Jim Carrey's original 6.7 movie plummets to 2.1 with Son of the Mask which he wisely stayed away from) and The Exorcist (the original grandiose 1973 classic went from 8.1 to 3.6 in just four years).
  • One of the best increases in ratings goes to Captain America (the last installment that came out in 2011 with a rating of 6.8 is actually considered the sequel to the original 1990 movie that has a rating of 2.9).
  • On average, sequels have an IMDB score 0.9 less than the first movie.
Simple linear models can be fit to the data; the main question is whether to include an intercept:
Sequel rating = alpha * Original rating (+ intercept)
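
For readers who want to try this at home, here is a minimal sketch of how the two variants of the model could be fit by ordinary least squares. The rating pairs are made up for illustration (they are not the IMDB data used in this post), and the function names are my own.

```python
# Hypothetical (original rating, sequel rating) pairs, NOT the IMDB data from the post.
pairs = [(8.0, 6.9), (7.0, 5.8), (6.0, 5.4), (5.0, 4.3), (4.0, 3.9), (3.0, 3.0)]


def fit_with_intercept(pairs):
    """Least-squares fit of: sequel = alpha * original + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    alpha = sxy / sxx
    intercept = mean_y - alpha * mean_x
    return alpha, intercept


def fit_through_origin(pairs):
    """Least-squares fit of: sequel = alpha * original (no intercept)."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)


alpha, intercept = fit_with_intercept(pairs)
alpha_only = fit_through_origin(pairs)
print(f"with intercept:    sequel = {intercept:.2f} + {alpha:.2f} * original")
print(f"without intercept: sequel = {alpha_only:.2f} * original")
```

Dropping the intercept forces the line through (0, 0), so the slope has to absorb whatever the intercept would otherwise have explained.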

Actually, a quick graph reveals that the intercept has little impact on the fit itself:

These models suggest a better description than a flat 0.9 decrease between the first two installments: the sequel's rating is either 0.9 + 0.72 * Original rating or simply 0.85 * Original rating, depending on which model you prefer. In both cases we still conclude that the sequel usually does worse.
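
To get a feel for how close these rules are, the small sketch below evaluates all three for a few original ratings. The coefficients are the ones quoted above; the function names and the sample ratings are only there for illustration.

```python
def constant_shift(original):
    """Flat rule: the sequel loses 0.9 points on average."""
    return original - 0.9


def with_intercept(original):
    """Linear model with intercept, coefficients as quoted in the post."""
    return 0.9 + 0.72 * original


def through_origin(original):
    """Linear model without intercept, coefficient as quoted in the post."""
    return 0.85 * original


# Illustrative original ratings (arbitrary choices, not taken from the data).
for original in (3.0, 5.0, 7.0, 9.0):
    print(
        f"original {original:.1f}: "
        f"flat {constant_shift(original):.2f}, "
        f"intercept {with_intercept(original):.2f}, "
        f"no intercept {through_origin(original):.2f}"
    )
```

The two linear models give the same prediction when 0.9 + 0.72x = 0.85x, that is around x = 6.9, and they stay close over the range where most ratings sit. Note that the model with intercept only predicts a drop for originals rated above 0.9 / 0.28, about 3.2; below that it predicts a small improvement.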

But does it do significantly worse? Significance of the decrease can be assessed with a paired t-test. Quick stat reminder: you absolutely want a paired t-test here. If you were to run a naive t-test between two independent populations, you would reach the counter-intuitive conclusion that sequels do just as well, because the between-movie spread of ratings is much greater than the within-movie spread. In other words, suppose all movies have ratings uniformly distributed between 1 and 10, and all sequels systematically have a rating 0.3 lower, making the sequel range go from 0.7 to 9.7. An unpaired t-test would be unable to pick up the systematic downward shift (except with a huge sample size). A paired test is necessary because the two measurements are dependent (not simply because we want to measure significance!).
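
The thought experiment above can be simulated in a few lines. This is only a sketch on simulated ratings, not the IMDB data: the sample size, the random seed and the small Gaussian noise added to the 0.3 drop are my own assumptions (without noise every difference would be identical and the paired statistic would blow up). It compares the t statistics only and does not compute p-values.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Unpaired (Welch) t statistic: treats a and b as independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def paired_t(a, b):
    """Paired t statistic: works on the per-pair differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


random.seed(0)  # fixed seed so the run is repeatable
n = 200  # hypothetical number of movie series
originals = [random.uniform(1, 10) for _ in range(n)]
# Assumption: a systematic 0.3 drop plus Gaussian noise with standard deviation 0.5.
sequels = [r - 0.3 + random.gauss(0, 0.5) for r in originals]

t_unpaired = welch_t(originals, sequels)
t_paired = paired_t(originals, sequels)
print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

Both statistics share the same numerator (the mean difference); only the denominator changes. The unpaired version divides by the spread of ratings across movies, which is large, while the paired version divides by the spread of the differences, which is small.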

In our case, the paired t-test shows that the observed decrease in IMDB ratings is significant, and very highly so.

In the next post we will look at series with three or more installments and view how ratings evolve for these longer series.
