October 9, 2006
Netflix Prize Competition Gets Going
I met Yi Zhang at SIGIR, and I hope her team wins the grand prize. (For one thing, I was impressed by her forging ahead with an IR program at UCSC.)
Here’s the Netflix Prize in a nutshell: Netflix released 100M customer movie ratings (out of > 1B), and announced a competition where they’re offering a $1M “grand prize” (plus a $50K yearly “progress prize”). The goal? Using the 100M movie ratings as your training set, produce an algorithm that beats Netflix’s own in-house “Cinematch” recommender system by at least 10%. You have at least five years to pull it off.
I love it when private entities release large-ish data sets to the research community. Usually, all we get are limited partnerships involving NDAs signed in triplicate. (That’s still beneficial – I wish Webshots would partner with some researchers.)
Anyhow, Zhang’s team was the first to (slightly) beat Cinematch on the RMSE metric, but, as of this writing, another team has already shown greater improvement. In fact, “The Thought Gang” is the first to qualify for the yearly “progress prize” (and, hence, $50K next October).
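For anyone who hasn’t run into RMSE (root mean squared error) before, here’s a rough sketch of how a score and an “improvement over Cinematch” could be computed. The function names and the toy ratings are made up for illustration (this is not Netflix data or Netflix’s scoring code), and I’m assuming “beats by 10%” means a 10% relative reduction in RMSE.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Relative RMSE reduction versus a baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Toy example: invented ratings on a 1-5 star scale.
actual = [4, 3, 5, 1, 2]
baseline = [3.5, 3.5, 3.5, 3.5, 3.5]   # stand-in for an existing system
candidate = [4.0, 3.0, 4.0, 2.0, 2.0]  # stand-in for a contest entry

gain = improvement_pct(rmse(baseline, actual), rmse(candidate, actual))
print(f"improvement over baseline: {gain:.1f}%")
```

Lower RMSE is better, so a positive percentage means the candidate beats the baseline. Because the errors are squared before averaging, a few badly missed ratings hurt the score more than many near misses.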
I’m not surprised that researchers are already beating Netflix’s system on this metric a week into the competition.
First, and most important, Netflix places no constraints on computational performance. While their in-house recommender system needs to scale in multiple directions, the winning algorithm need not. That’s worth at least a percentage point.
Second, while 100M is larger than the typical “large” training sets used in academia for these types of problems, it’s not so large that it requires a complete rethinking of how you approach scaling your solution.
Third, movie recommendations are the kind of thing academic researchers in personalization often focus on, in the absence of larger-scale projects. I’d expect a few good academic recommender systems to be so well-tuned to this particular problem that their first iteration scores very well.
Regardless of who wins the prize, contests like this are good for researchers in this area, as they provide a nice, high-profile introduction to non-researchers. Especially once that $1M prize is handed out.
As a further digression, I wonder how many papers will be published in the coming years based on the Netflix data?