Those five-star rating systems on websites like Netflix and iTunes have turned us all into a nation of critics (which sucks for me; I used to feel special). But is a five-star rating system really the best way to determine whether something is good or not? According to a couple of researchers from MIT, maybe not.
As spotted by the tech site GigaOM, a new paper by Devavrat Shah of MIT’s Laboratory for Information and Decision Systems argues that online ratings systems “should instead ask users to compare products in pairs, not as stand-alone items.” In other words:
“…the kind of star rating systems that are the status quo on the web today are flawed because, well, humans are flawed. ‘If my mood is bad today, I might give four stars, but tomorrow I’d give five stars. But if you ask me to compare two movies, most likely I will remain true to that for a while,’ Shah says in an article published this week on MIT’s news site. ‘Your three stars might be my five stars, or vice versa. For that reason, I strongly believe that comparison is the right way to capture this.'”
Shah and his team believe that these “pairwise rankings” create a more accurate recommendation model than five-star ratings (they boast that their algorithm predicts shoppers’ preferences “with 20 percent greater accuracy than the kinds of formulas most often in use today”). In other other words, assigning a score to a single movie is a less reliable method of ranking films or television shows than selecting a preference between multiple films or television shows.
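For the curious, here’s roughly what turning pairwise picks into a ranking can look like. To be clear, this is not Shah’s algorithm; it’s a minimal sketch of a Bradley-Terry-style estimator, a standard statistical model for paired comparisons, which converts a list of “A beat B” outcomes into a strength score for each movie. The movie names and data are made up for illustration.

```python
from collections import defaultdict

def bradley_terry(comparisons, iterations=100):
    """Estimate a strength score per item from pairwise outcomes.

    comparisons: list of (winner, loser) tuples.
    Returns a dict mapping item -> score (higher = more preferred).
    """
    items = {x for pair in comparisons for x in pair}
    wins = defaultdict(int)          # total wins per item
    pair_counts = defaultdict(int)   # times each unordered pair was compared
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    # Iterative (minorization-maximization) update of the scores.
    scores = {i: 1.0 for i in items}
    for _ in range(iterations):
        new = {}
        for i in items:
            denom = sum(
                pair_counts[frozenset((i, j))] / (scores[i] + scores[j])
                for j in items
                if j != i and frozenset((i, j)) in pair_counts
            )
            new[i] = wins[i] / denom if denom > 0 else scores[i]
        total = sum(new.values())
        scores = {i: s / total for i, s in new.items()}
    return scores

# Hypothetical user picks: each tuple is (movie they chose, movie they passed on).
picks = [
    ("Almost Famous", "Bad Boys 2"),
    ("Chinatown", "Bad Boys 2"),
    ("Chinatown", "Almost Famous"),
    ("Almost Famous", "Bad Boys 2"),
]
ranking = sorted(bradley_terry(picks), key=bradley_terry(picks).get, reverse=True)
```

The appeal of a model like this is exactly Shah’s point: it never asks anyone what “four stars” means, only which of two things they preferred, and the scores fall out of the pattern of wins and losses.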
This is all tremendously nerdy stuff, so nerdy it’s giving my pocket protector sympathy pains. But it is pretty important, too. Any time you pay any attention to the star ratings on a website before you rent a movie on VOD, or purchase an album, or decide to make a reservation at a restaurant, you are validating those star ratings. And if they’re inaccurate, imagine the impact those inaccuracies have on your choices. If customers are being swayed by what those stars tell them, those stars damn well better be accurate.
Granted, I didn’t go to MIT and I can’t even spell algorithm, much less make one of my own. I’m sure their research is sound and well thought-out. But my question would be this: how do they ensure people take pairwise rankings any more seriously than they do five-star rankings? If web users, secure in their Internet anonymity, will give a movie one star just because they had a bad burrito for lunch, who’s to say they won’t react similarly when asked to choose between “Star Wars” and “Battlefield Earth?” How do we know they won’t use that same anarchic impulse to screw with those recommendations too? I’m not sure we do, but maybe there’s some sort of behavioral study that says that they won’t.
The other problem with comparative rankings is something I’ve found through my own casual use of the website Flickchart, where you repeatedly rank pairs of movies in order to create a list of your favorite movies. The site’s a lot of fun to play with, but the comparisons it proposes are often totally absurd. Comparing a thing you love to a thing you hate is easy: when Flickchart throws up “Almost Famous” and “Bad Boys 2” it’s not too hard to pick a winner. But comparing between things you love can be very, very tricky.
If Flickchart gave you the comparison of “L.A. Confidential” and “Chinatown,” you’d have something to choose between. They’re both neo-noirs set in Los Angeles, and they have similar tones and themes and ideas. It’s a hard pick, because they’re both phenomenal movies, but there are obvious commonalities. But if Flickchart asks you to compare “L.A. Confidential” and, say, “Snow White and the Seven Dwarfs,” which do you select? Personally, I’m more likely to watch “L.A. Confidential” if given the choice between the two, but can I really argue “L.A. Confidential” is a “better” movie than “Snow White and the Seven Dwarfs?” “Snow White” is a watershed film in the history of cinema. “L.A. Confidential” is just a very well-acted and well-written crime story. So which one do I pick? And not only which one do I pick, but which one do other people pick? Some might choose according to their taste, some by a quasi-objective system of importance or cultural impact. The room for variation seems almost as large as with five-star ratings.
I agree that five-star ratings can be very flawed; there’s too much room for human error and not enough nuance. And I’m curious to hear and see more of this pairwise ranking in action. But if you gave me that last comparison right now, I’m not sure what I would pick.