September 7, 2006

Google Bigtable at OSDI ’06

Posted in Information Retrieval, Scale, Software at 11:01 pm by mj

A team of eight at Google are presenting a surprisingly detailed paper titled Bigtable: A distributed storage system for structured data at OSDI ’06 (another conference I wish I were hip enough to attend).

From the abstract:

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.


A couple of highlights/thoughts (read the paper for the technical stuff):

  • Bigtable itself contains about 100,000 lines of code, excluding tests;
  • they leveraged GFS and Chubby (a “highly-available and persistent distributed lock service”); remember, kids: internal corporate infrastructure and services really matter;
  • in some ways, the Bigtable API is more-or-less logically equivalent to maintaining multiple distributed hashes, one for each column or property (e.g., title, date, etc.); however, unification not only feels more right to the developer, but allows for a more optimized implementation;
  • row keys are sorted lexicographically, allowing developers to control locality of access (e.g., “com.example/example1.html” and “com.example/example2.html” are very likely to be in the same cluster, which means processing all pages from the same domain is more efficient);
  • scale is slightly less than linear due to varying temporal load imbalances (good to know even Google has this problem);
  • “One group of 14 busy clusters with 8069 total tablet servers saw an aggregate volume of more than 1.2 million requests per second, with incoming RPC traffic of about 741MB/s and outgoing RPC traffic of about 16GB/s.” I started drooling until I realized that’s only 148 requests per second per server, and 94KB/s in and 2MB/s out per server. Then I just laughed at those numbers!
  • various teams at Google are now using Bigtable, but it took some convincing (apparent resistance to a non-RDBMS for non-search activities); now many large services–among them, Google Earth, Google Analytics, and Personalized Search–are employing Bigtable to great effect; remember, kids: internal corporate infrastructure and services really matter (did I already say that?)

    Finally, an excerpt from their “lessons learned,” which we’d all do well to remember:

    […] [L]arge, distributed systems are vulnerable to many types of failures, not just the standard network partitions and fail-stop failures assumed in many distributed protocols. […] memory and network consumption, large clock skew, hung machines, extended and asymmetric network partitions, bugs in other systems that we are using, overflow of GFS quotas, and planned and unplanned hardware maintenance.
    [I]t is important to delay adding new features until it is clear how the new features will be used.
    the importance of system-level monitoring (i.e., monitoring Bigtable itself, as well as the client processes using Bigtable).
    The most important lesson we learned is the value of simple designs.


1 Comment »

  1. […] When I first read about GCSE, I was picturing tens of millions of bit vectors (and entries in BigTable), corresponding to each “custom engine,” and updated with every refresh of their index. Perhaps some smart stuff to make sure entries that haven’t been rebuilt yet use the old index until they are (BigTable seems good for managing that – see my previous entry on the BigTable paper). […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: