Saturday, December 5, 2015

Is this movie any good? Figuring out which movie rating to trust

In October, FiveThirtyEight published a post cautioning us when looking at movie reviews "Be Suspicious Of Online Movie Ratings, Especially Fandango’s".

To summarize the post as concisely as possible, Walt Hickey describes how Fandango inflates movie scores by rounding up, providing the example of Ted 2 which, despite having an actual score of 4.1 in the page source, displays 4 and a half stars on the actual page. The trend is true across all movies, a movie never had a lower score displayed, and in almost 50% of cases displayed a higher score than expected.

So if Fandango ratings aren't reliable, it could be worth turning to another major source for movie ratings: the Internet Movie Database IMDB.

Pulling all IMDB data, I identified just over 11K movies that had both an IMDB rating (provided by logged-in users) and a Metascore (provided by, aggregating reviews from top critics and publications). The corresponding scatterplot indicates a strong correlation between the two:

In these instances, it is always amusing to deep-dive into some of the most extreme outliers.
To allow for fair comparisons across both scales (0-100 for Metascore, 0-10 for IMDB), we mapped both scales to the full extent of a 0-100 scale, by subtracting the minimum value, dividing by the observed range, and multiplying by 100.
So if all IMDB ratings are between 1 and 9, a movie of score 7 will be mapped to 75 ((7 - 1) / (9 - 1) * 100).

Here are the movies with highest discrepancies in favor of Metascore:
Movie Title IMDB Rating Metascore Genre IMDB (norm) Metascore (norm) Delta
Justin Bieber: Never Say Never 1.6 52 Documentary,Music 7.2 51.5 -44.3
Hannah Montana, Miley Cyrus: Best of Both Worlds Concert 2.3 59 Documentary,Music 15.7 58.6 -42.9
Justifiable Homicide 2.9 62 Documentary 22.9 61.6 -38.7
G.I. Jesus 2.5 57 Drama,Fantasy 18.1 56.6 -38.5
How She Move 3.2 63 Drama 26.5 62.6 -36.1
Quattro Noza 3.0 60 Action,Drama 24.1 59.6 -35.5
Jonas Brothers: The 3D Concert Experience 2.1 45 Documentary,Music 13.3 44.4 -31.2
Justin Bieber's Believe 1.6 39 Documentary,Music 7.2 38.4 -31.2
Sol LeWitt 5.3 83 Biography,Documentary 51.8 82.8 -31.0
La folie Almayer 6.2 92 Drama 62.7 91.9 -29.3

And here are the movies with highest discrepancies in favor of IMDB's rating:
Movie Title IMDB Rating Metascore Genre IMDB (norm) Metascore (norm) Delta
To Age or Not to Age 7.5 15 Documentary 78.3 14.1 64.2
Among Ravens 6.8 8 Comedy,Drama 69.9 7.1 62.8
Speciesism: The Movie 8.2 27 Documentary 86.7 26.3 60.5
Miriam 7.2 18 Drama 74.7 17.2 57.5
To Save a Life 7.2 19 Drama 74.7 18.2 56.5
Followers 6.9 16 Drama 71.1 15.2 55.9
Walter: Lessons from the World's Oldest People 6.4 11 Documentary,Biography 65.1 10.1 55.0
Red Hook Black 6.5 13 Drama 66.3 12.1 54.1
The Culture High 8.5 37 Documentary,News 90.4 36.4 54.0
Burzynski 7.4 24 Documentary 77.1 23.2 53.9

Quickly glancing through the table we see that documentaries are often the cause of the biggest discrepancies one way or another. IMDB is extremely harsh with music-related documentaries, Justin Bieber has two "movies" in the top 10 discrepancies. Further deep-diving in those movies surfaces some interesting insights: For those movies, approximately 95% of voters gave it either a 0 or 10, and the remaining 5% a score between 1-9. Talk about "you either love him or hate him!". Breaking votes by age and gender, provides some additional background for the rating: independently of age category, no male group gave Justin a higher than 1.8 average, whereas no female group gave less than a 2.5. As expected, females under 18 gave the highest average score 5.3 and 7.8 respectively for each movie, but with 4 to 5 times more male than female voters, the abysmal overall scores were unavoidable.

OK, I probably spent WAY more time than I would have liked discussing Justin Bieber movies...

Let's look at how these discrepancies vary against different dimensions. The first one that was heavily suggested from the previous extremes was naturally genre:

We do see that Documentaries have the widest overall range, but all genres including documentaries actually have very similar distributions.

Another thought was to split movies by country. While IMDB voters are international, Metascore is very US-centric. So for foreign movies, voters from that country might be more predominant in voting and greater discrepancies observed when comparing to US critics. However, there did not seem to be a strong correlation between country and rating discrepancy.

We do observe a little more fluctuation in the distributions when splitting by country rather than genre, but all distributions still remain very similar. The US has more extremes, but this could also be due a result from the fact that the corresponding sample size is much larger.

As a final attempt, I used the movie's release date as a final dimension. Do voters and critics reach a better consensus over time and do with see a reduced discrepancy for older movies?

The first thing that jumps out is the strong increase in variance of the delta score (IMDB - Metascore). Similarly to the United States in the previous graph, this is also most likely a result from sample size. While IMDB voters can decide to vote for very old movies, Metacritic doesn't have access to reviews from top critics in 1894 to provide Carmencita with a Metascore (although I'm not sure it would be as generous as the IMDB score of 5.8 for this one minute documentary of a movie whose synopsis is: 'Performing on what looks like a small wooden stage, wearing a dress with a hoop skirt and white high-heeled pumps, Carmencita does a dance with kicks and twirls, a smile always on her face.')

But the more subtle trend is the evolution of the median delta value. Over time, it seems that the delta between IMDB score and Metascore has slowly increased, from a Metascore advantage to an IMDB advantage. From a critics perspective, it would appear as if users underrated old movies and overrated more recent ones. However the increase in trend has stabilized when we entered the 21st century.

I couldn't end this post without a mention to Rotten Tomatoes, also highly popular with movie fans. While Metacritic takes a more nuanced approach to how it scores a movie based on the reviews, Rotten Tomatoes only rates it as 0 (negative review) or 1 (positive review) and takes the average. In his blog post, Phil Roth explores the relationship between Metascores and Rotten Tomatoes and, as expected, finds a very strong relationship between the two:

So to quickly recap, there is clearly a strong correlation between the two ratings, but with enough variance that both provide some information. For some people, their tastes might better align with top critics whereas others might have similar satisfaction levels as their peers. Personally, I've rarely been disappointed by a movie with an IMDB score greater than 7.0, and I guess that I'll continue to use that rule of thumb.

And definitely won't check Fandango.

No comments:

Post a Comment