November 5, 2006

Google CSE: Google Mashup

Posted in Scale, Search at 8:40 pm by mj

I was busy with other things, so I’m only now getting around to checking out Google Custom Search Engine (GCSE). I’m a bit disappointed after reading Ethan Zuckerman’s explanation of where GCSE falls short:

A little poking solves the mystery pretty quickly. Google Coop Search works by searching against the main Google search catalog, retrieving 1000 results and filtering them against the sites you’ve included in your catalog. This makes sense, computationally – these searches are fast, almost as fast as normal Google searches. Rather than conducting 3000 “site:” searches and collating and reranking the results, Google is sacrificing recall, getting 1000 results and discarding those not in your set of chosen sites, which requires one call to the index and a really big regular expression match.

The result:

In other words, the little engine I’ve built is useful only if the sites I’ve chosen are relatively high ranking and authoritative sites on the topics I’m searching on.

When I first read about GCSE, I pictured tens of millions of bit vectors (and entries in BigTable), one for each “custom engine,” each updated with every refresh of Google’s index. Perhaps there would also be some clever handling so that vectors that haven’t been rebuilt yet keep using the old index until they are (BigTable seems good for managing that; see my previous entry on the BigTable paper).

I couldn’t imagine a way to scale it practically, but I figured, “Hey, it’s Google…”
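To make the bit-vector picture concrete, here is a minimal sketch of what I had in mind. It is purely illustrative: the function names, the toy index, and the use of a Python integer as the bit vector are all my own assumptions, not anything Google has described.

```python
from urllib.parse import urlparse

def build_engine_bitvector(index_docs, allowed_hosts):
    """Return an int used as a bit vector: bit i is set if document i
    is hosted on one of the custom engine's chosen sites."""
    bits = 0
    for doc_id, url in enumerate(index_docs):
        if urlparse(url).hostname in allowed_hosts:
            bits |= 1 << doc_id
    return bits

def filter_postings(postings, engine_bits):
    """Keep only the doc ids in a ranked posting list whose bit is set."""
    return [doc_id for doc_id in postings if (engine_bits >> doc_id) & 1]

# Hypothetical four-document index; doc id = position in the list.
index_docs = [
    "http://example.org/a",
    "http://example.com/b",
    "http://example.net/c",
    "http://example.org/d",
]

# One vector per custom engine, rebuilt whenever the index is refreshed.
engine_bits = build_engine_bitvector(index_docs, {"example.org"})

# A ranked posting list for some query, filtered down to the engine's sites.
matches = filter_postings([2, 3, 0, 1], engine_bits)
```

The appeal is that filtering happens against the full posting list, so no matching document is lost. The cost is one vector per engine per index refresh, which is the part I couldn’t see scaling.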

Instead it turns out that it’s pretty much a mash-up. Anybody off the street could retrieve the top N results from Google’s API, filter out sites based on include/exclude lists, and dynamically rerank the rest based on preferences.
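The mash-up version can be sketched in a few lines. This is a rough illustration under stated assumptions: the `top_n` list stands in for whatever a search API would return, and `custom_search`, the score values, and the per-host `boosts` are hypothetical names and numbers of my own, not Google’s implementation.

```python
from urllib.parse import urlparse

def custom_search(results, include, exclude=frozenset(), boosts=None):
    """Filter ranked (url, score) results by host, then rerank.

    results: the top N hits already fetched from a general search engine.
    include/exclude: sets of hostnames chosen by the engine's owner.
    boosts: optional per-host multipliers expressing preferences.
    """
    boosts = boosts or {}
    kept = []
    for url, score in results:
        host = urlparse(url).hostname
        if host in exclude or host not in include:
            continue  # discard anything outside the chosen set of sites
        kept.append((url, score * boosts.get(host, 1.0)))
    # Highest adjusted score first; sorted() is stable, so ties keep
    # their original relative order.
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Stand-in for the top N results of a general web search.
top_n = [
    ("http://bigsite.example/x", 0.9),
    ("http://blog.example/post", 0.6),
    ("http://spam.example/y", 0.5),
    ("http://niche.example/z", 0.4),
]

reranked = custom_search(
    top_n,
    include={"blog.example", "niche.example", "spam.example"},
    exclude={"spam.example"},
    boosts={"niche.example": 2.0},
)
```

Note what the sketch cannot do: a page on one of the included sites that never makes it into `top_n` can never appear in `reranked`, which is exactly the recall problem Zuckerman describes.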

I’m not knocking it. That’s the definition of dynamic reranking, and it’s usually how personalization is implemented. I’m just disappointed that they’re not doing something way beyond the norm, technically speaking.

Probably more interesting is how they’ll take the data from CSEs and feed some of the keywords and usage data back into Google Co-op.

3 Comments

  1. Vidal said,

    My Google Co-op example:
    Search for unprotected live webcam streams, found through a variety of clever search techniques using the Google Co-op custom search engine tool.
    http://www.camhacker.com

  2. parrott said,

    i don’t know if I get this one at all. Mashup or hackup…

  3. Mike said,

    how do you make the google cse result links open in the same window? By default it launches a new tab.

