Pretend I’m the owner of a polling company that surveys political races. I prominently advertise my results: According to a Walt Hickey Polling Inc. survey of 600 likely voters, John Doe is beating Jane Doe 58 percent to 40 percent — John Doe will likely win the election. (Let’s say it’s a race for the U.S. Senate.)
But then you keep reading and you notice that the sample on which my poll is based consists of 400 men and 200 women. You can’t really tell whether I’m adjusting the numbers, and if so, how. Would you trust that number? Unless there’s some state I don’t know about where men outnumber women 2-to-1, you shouldn’t.
So why aren’t we more skeptical of movie ratings that do the same thing?
It’s a worthwhile question, and lately it’s made it pretty hard for us to take the ratings provided on IMDb, the largest and most popular movie site on the internet, at face value. The Academy Awards rightly get criticized for reflecting the preferences of a small, unrepresentative sample of the population, but online ratings have the same problem. Even the vaunted IMDb Top 250 — nominally the best-liked films ever — is worth taking with 250 grains of salt. Women accounted for 52 percent of moviegoers in the U.S. and Canada in 2016, according to the most recent annual study by the Motion Picture Association of America. But on the internet, and on ratings sites, they’re a much smaller percentage.
“If you see any number that is a rating number or a number with a percentage sign, it may be compelling or meaningful and it may not be,” said Gary Langer, the president of Langer Research Associates, the polling firm that has long conducted surveys for ABC News. “And what we need to do rather than be seduced by the number is to subject it to meaningful inquiry as to how it was obtained.”
OK, but how skeptical should we be? To figure that out, I wanted to see how strong the male skew of raters is on IMDb and how big an effect that skew has on movies’ scores.
We’ll start with every film that’s eligible for IMDb’s Top 250 list. A film needs 25,000 ratings from regular IMDb voters to qualify for the list. As of Feb. 14, that was 4,377 titles. Of those movies, only 97 had more ratings from women than men. The other 4,280 films were mostly rated by men, and it wasn’t even close for all but a few films. In 3,942 cases (90 percent of all eligible films), the men outnumbered the women by at least 2-to-1. In 2,212 cases (51 percent), men outnumbered women more than 5-to-1. And in 513 cases (12 percent), the men outnumbered the women by at least 10-to-1.
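For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the shares quoted above from the counts in this article. The helper `male_to_female_ratio` and the vote counts passed to it are hypothetical illustrations, not IMDb data or an IMDb API.

```python
# Counts reported in the article for films eligible for the Top 250 (as of Feb. 14).
ELIGIBLE_FILMS = 4377

reported_counts = {
    "more ratings from women than men": 97,
    "men outnumber women at least 2-to-1": 3942,
    "men outnumber women more than 5-to-1": 2212,
    "men outnumber women at least 10-to-1": 513,
}

# Share of all eligible films, rounded to the nearest whole percent.
shares = {
    label: round(100 * count / ELIGIBLE_FILMS)
    for label, count in reported_counts.items()
}


def male_to_female_ratio(male_votes: int, female_votes: int) -> float:
    """Hypothetical helper: how many male raters a film has per female rater."""
    if female_votes <= 0:
        raise ValueError("female_votes must be positive")
    return male_votes / female_votes


if __name__ == "__main__":
    for label, pct in shares.items():
        print(f"{label}: {pct} percent")
    # Made-up vote counts, for illustration only.
    print(male_to_female_ratio(50_000, 10_000))
```

The rounded shares line up with the 90, 51 and 12 percent figures in the text.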
Looking strictly at IMDb’s weighted average — IMDb adjusts the raw ratings it gets “in order to eliminate and reduce attempts at vote stuffing,” but it does not disclose how — the male skew of raters has a pretty significant effect. In 17 percent of cases, the weighted average of the male and female voters was equal, and in another 26 percent of cases, the votes of the men and women were within 0.1 points of one another. But when there was bigger disagreement — i.e. men and women rated a movie differently by 0.2 points or more, on average — the overall score overwhelmingly broke closer to the men’s rating than the women’s rating. The score was closer to the men’s rating more than 48 percent of the time and closer to the women’s rating less than 9 percent of the time, meaning that when there was disagreement, the male preference won out about 85 percent of the time.
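The "about 85 percent" figure follows from the two shares just quoted. Here is a small sketch of that arithmetic, using the rounded percentages from the text. The function `closer_to` and the sample ratings are illustrative assumptions, not IMDb's method.

```python
# Rounded shares from the article, among all eligible films.
CLOSER_TO_MEN = 48    # "more than 48 percent"
CLOSER_TO_WOMEN = 9   # "less than 9 percent"

# Among films where the overall score leaned one way, how often it leaned male.
male_share = CLOSER_TO_MEN / (CLOSER_TO_MEN + CLOSER_TO_WOMEN)


def closer_to(overall: float, men: float, women: float) -> str:
    """Illustrative helper: which group's rating the overall score sits nearer to."""
    gap_men = abs(overall - men)
    gap_women = abs(overall - women)
    if gap_men < gap_women:
        return "men"
    if gap_women < gap_men:
        return "women"
    return "tie"


if __name__ == "__main__":
    print(f"{male_share:.1%}")
    # Made-up ratings for a single film, for illustration only.
    print(closer_to(overall=7.8, men=7.9, women=7.2))
```

With the rounded inputs this comes to roughly 84 percent. Because the true shares are "more than 48" and "less than 9," the unrounded figure is a little higher, which is consistent with the "about 85 percent" in the text.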
That male skew of raters is also apparent in the 250 movies that make IMDb’s top list, which we pulled on Feb. 16.
So, what’s the issue here? If IMDb is content with its ratings being intended almost solely for men, then there isn’t one. (We reached out to IMDb for comment and for more information on how the site adjusts its ratings, but we received no response. So we don’t know, for example, if IMDb is already doing something to the data that accounts for the gender disparity in raters.) But if IMDb seeks to reflect the opinions of the actual movie-going population, the situation is grave.
Can we fix that? Langer is skeptical. Setting aside how simple it is for a dedicated individual or group to “manufacture” results, his main objection is that online data from a self-selected group of people is so inherently dubious that any reweighting of that data is also inherently dubious. You can’t just adjust troublesome data to make it reflect the world, he said.
“The notion that you can take bad data and weight it to be OK is … hazardous to your health,” Langer cautioned.
That said, since the scores of the most popular movie site on the internet are already being calculated based on an entirely self-selected sample, would it destroy the IMDb Top 250 to try to mimic actual movie audiences more? I don’t really think so. As a thought experiment, I adjusted the raw ratings of the 4,377 eligible films to weight men’s and women’s views equally, then applied everything we know about IMDb’s rating adjustments (which is far from the full picture) to the result.
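To make "weight men's and women's views equally" concrete, here is one simple way to do it, sketched in Python. This is an assumption about the general idea, not the exact procedure used for this article and not IMDb's formula: it averages the two groups' mean ratings at a fixed 50-50 split instead of letting the larger group of voters dominate. The function names and all numbers are made up for illustration.

```python
def raw_mean(men_avg: float, men_votes: int,
             women_avg: float, women_votes: int) -> float:
    """Average over every vote, so the group with more voters dominates."""
    total_votes = men_votes + women_votes
    if total_votes <= 0:
        raise ValueError("need at least one vote")
    return (men_avg * men_votes + women_avg * women_votes) / total_votes


def parity_mean(men_avg: float, women_avg: float,
                women_share: float = 0.5) -> float:
    """Average at a fixed target share for women (0.5 means 50-50),
    ignoring how many men and women actually voted."""
    if not 0.0 <= women_share <= 1.0:
        raise ValueError("women_share must be between 0 and 1")
    return (1.0 - women_share) * men_avg + women_share * women_avg


if __name__ == "__main__":
    # Hypothetical film: 50,000 men average 8.0, 10,000 women average 9.0.
    print(round(raw_mean(8.0, 50_000, 9.0, 10_000), 2))
    print(parity_mean(8.0, 9.0))
```

In this made-up case the raw average sits close to the men's 8.0 because men cast five of every six votes, while the 50-50 version lands halfway between the two groups.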
We can’t do an adjustment that allows us to perfectly replicate the top 250 (again, we don’t know what’s in the black box, so we can’t re-create it), but to approximate it, I excluded any film that didn’t either a) make the IMDb top 1,000 movies list or b) have a rating from the site’s top 1,000 users within 0.87 points of the rating from its users overall. This allows us to sidestep films that would have made the top 250 through vote-stuffing.
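The filter described above can be written as a simple rule. Here is a sketch, assuming each film carries three hypothetical fields (whether it is on IMDb's top 1,000 movies list, its rating from the site's top 1,000 users, and its rating from all users) and assuming "within 0.87 points" is inclusive. The field names and sample films are invented for illustration.

```python
from dataclasses import dataclass

MAX_GAP = 0.87  # threshold quoted in the article


@dataclass
class Film:
    title: str
    in_top_1000_list: bool    # a) on IMDb's top 1,000 movies list
    top_users_rating: float   # rating from the site's top 1,000 users
    all_users_rating: float   # rating from users overall


def keep_film(film: Film, max_gap: float = MAX_GAP) -> bool:
    """Keep a film if it meets condition a) or condition b); drop it otherwise."""
    gap = abs(film.top_users_rating - film.all_users_rating)
    return film.in_top_1000_list or gap <= max_gap


if __name__ == "__main__":
    # Made-up films, for illustration only.
    films = [
        Film("On the list, big gap", True, 6.0, 8.5),
        Film("Off the list, small gap", False, 7.5, 8.0),
        Film("Off the list, big gap", False, 6.5, 9.0),
    ]
    for film in films:
        print(film.title, "->", "keep" if keep_film(film) else "exclude")
```

Only the third film is excluded: it is not on the top 1,000 list, and the site's heaviest users rate it far below the overall score, the pattern one might expect from vote-stuffing.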
My main point is that overall, the naive reweighting didn’t destroy the general look of the 250, and if anything, it elevated films that may have been overlooked because one gender is vastly outnumbered.
What if IMDb adjusted ratings toward gender parity?
Estimated highest-ranking films on IMDb if the men’s and women’s ratings were weighted toward 50-50 vs. IMDb’s actual rank as of Feb. 16, 2018
IMDb makes adjustments to its raw ratings but does not disclose its methodology. Therefore, these rankings — which start with the raw ratings — may not match a gender-weighted version of a list made by IMDb itself because we can’t re-create the site’s adjustments.
The top 100 largely includes films from the original list of 250, and the additions to the list — there are a lot of best picture winners among the newbies — appear mainly in the back half of the 250.
Attempting to reflect a target population is a common practice in many fields that use surveys. It’s not clear to me why movie rating sites don’t do it or, at the least, why they don’t indicate that almost all of their scores are based mostly on the opinions of male users.