October 12, 2006
Eric Billingsley of eBay Research Interviewed by Scoble
Early on, Eric talks about the design of their search engine in 2002. They had been trying to make a legacy in-house search engine work, and brought in all the major search vendors (Verity, Google, Excite, …), but found that none of their systems did any better.
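Why is that? IR systems are usually built for query-time performance, not for real-time indexing, and there are some very good reasons for that: an index compressed and laid out for fast lookups is expensive to update in place, so most engines build or merge their indexes in batches.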
eBay was among the first to require not only near-real-time indexing but also large-scale faceted search. From what Eric says, solving the near-real-time indexing problem with message queues and whatnot was apparently easier than solving the faceted browsing problem: showing exactly how many results there are in each category, where the categories change with every query. (See also my previous post on faceted search and indexing efficiency for a recent cool attempt to solve this.)
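To make the faceting problem concrete, here is a toy Python sketch. It is not eBay's design, since the interview doesn't describe their internals. It assumes updates arrive on an in-process `queue.Queue` standing in for a real message queue, keeps a plain inverted index in memory, and counts matches per category at query time. The names `Item`, `NearRealTimeIndex`, `drain` and `search` are all hypothetical.

```python
import queue
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    item_id: int
    title: str
    category: str


class NearRealTimeIndex:
    """Toy in-memory inverted index fed by a message queue (hypothetical design)."""

    def __init__(self, updates: "queue.Queue[Item]") -> None:
        self.updates = updates
        self.postings: dict[str, set[int]] = defaultdict(set)  # term -> item ids
        self.items: dict[int, Item] = {}  # item id -> latest version of the item

    def drain(self) -> int:
        """Apply every queued update and return how many were applied."""
        applied = 0
        while True:
            try:
                item = self.updates.get_nowait()
            except queue.Empty:
                return applied
            self._remove(item.item_id)  # treat each message as an upsert
            self.items[item.item_id] = item
            for term in item.title.lower().split():
                self.postings[term].add(item.item_id)
            applied += 1

    def _remove(self, item_id: int) -> None:
        """Drop an item's old postings, if it was indexed before."""
        old = self.items.pop(item_id, None)
        if old is None:
            return
        for term in old.title.lower().split():
            self.postings[term].discard(item_id)

    def search(self, query: str) -> tuple[list[int], Counter]:
        """AND-match all query terms, then count the hits in each category."""
        terms = query.lower().split()
        if not terms:
            return [], Counter()
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        facets = Counter(self.items[i].category for i in hits)
        return sorted(hits), facets
```

Note that `search` computes the facet counts by walking every hit, so the cost grows with the size of the result set on every single query. That's fine for a toy, and it is exactly what gets painful with millions of live items and a category breakdown that changes from query to query.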
Like many things in this changing world, what was once large-scale (20M items) and high-performance in 2002 has quickly become expected behavior, not only at eBay but at most social networking sites as well. For example, tag browsing is expected to be near-real-time. Blog search is expected to be near-real-time: Technorati indexes more blog entries every week than eBay had in total in 2002.
Later, Eric demos a couple of ideas in the oven, including what appears to be dynamic ranking of results (reordering of results based on click streams).
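Eric doesn't say how the reordering works, so what follows is only a generic illustration of one common approach, not eBay's: scale each result's engine score by its click-through rate, smoothed toward a prior so that items with few impressions aren't wildly boosted or buried. The `Result` fields and the `prior_ctr` and `prior_weight` defaults are made-up, hypothetical values.

```python
from dataclasses import dataclass


@dataclass
class Result:
    item_id: int
    base_score: float     # relevance score from the underlying engine
    impressions: int = 0  # times the item was shown in results
    clicks: int = 0       # times the item was clicked


def smoothed_ctr(r: Result, prior_ctr: float, prior_weight: float) -> float:
    """Click-through rate pulled toward prior_ctr when impressions are few."""
    return (r.clicks + prior_ctr * prior_weight) / (r.impressions + prior_weight)


def rerank(results: list[Result], prior_ctr: float = 0.05,
           prior_weight: float = 20.0) -> list[Result]:
    """Boost or demote each result by how its smoothed CTR compares to the prior."""
    def score(r: Result) -> float:
        return r.base_score * smoothed_ctr(r, prior_ctr, prior_weight) / prior_ctr
    return sorted(results, key=score, reverse=True)
```

An item with no click data keeps its base score (its smoothed CTR equals the prior). A real system would also have to correct for position bias, since results near the top get clicked more simply because they are near the top; this sketch ignores that.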
One thing that stands out is that he describes eBay's full site release every two weeks as “massively high frequency.” In the Web world, I think that is an exaggeration: weekly, or even daily, releases are increasingly the norm. Java shops have a harder time keeping up than Perl, Python, PHP or Ruby shops (though the Java shops tend to be larger).
What is probably unique is his claim that they keep a highly regimented release schedule, which presumably means no slips and no code freezes. That’s hard to do at a company the size of eBay (in terms of number of developers).
The interview goes on to cover operational issues and a nice-looking (but probably not very useful) social interaction browser.
Interesting fact from near the end: eBay has 768 back-end servers just serving search queries. Documents (items) are split across 16 buckets, each served by a cluster of 16 servers, and there are three fully redundant copies of the whole system (16 × 16 × 3 = 768), each capable of taking all the traffic in a crunch.
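Here's a rough sketch of how query routing over that bucket/cluster/copy layout might look. The interview gives only the counts, so everything else is my guess and hypothetical: the CRC32 hashing of item ids into buckets, the rotation used to pick a copy and a server, and the assumption that every server in a bucket's cluster holds that bucket's full index.

```python
import zlib
from dataclasses import dataclass

NUM_BUCKETS = 16         # item partitions (from the interview)
SERVERS_PER_BUCKET = 16  # servers in each bucket's cluster (from the interview)
NUM_REPLICAS = 3         # fully redundant copies of the whole system

TOTAL_SERVERS = NUM_BUCKETS * SERVERS_PER_BUCKET * NUM_REPLICAS  # 768


@dataclass(frozen=True)
class Server:
    replica: int  # which full copy of the system
    bucket: int   # which item partition
    slot: int     # which server within that bucket's cluster


def bucket_for(item_id: int) -> int:
    """Assumed scheme: CRC32 of the item id, modulo the bucket count."""
    return zlib.crc32(str(item_id).encode()) % NUM_BUCKETS


def plan_query(query_id: int, live_replicas: list[int]) -> list[Server]:
    """Pick one live copy, then one server per bucket to fan the query out to.

    Assumes every server in a bucket's cluster holds that bucket's full index,
    so any one of them can answer; the choice simply rotates with query_id.
    """
    if not live_replicas:
        raise RuntimeError("no replica of the search system is available")
    replica = live_replicas[query_id % len(live_replicas)]
    slot = query_id % SERVERS_PER_BUCKET
    return [Server(replica, bucket, slot) for bucket in range(NUM_BUCKETS)]
```

Because each query touches one server per bucket within a single copy, losing an entire copy only shrinks `live_replicas`, and the remaining copies absorb the load, which is what being able to take all the traffic in a crunch requires.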
Worth a watch.