April 28, 2007

MySQL Conference: Scaling Fotolog

Posted in Conferences, fotolog, Scale, Software at 12:43 am by mj

Farhan “Frank Mash” Mashraqi, DBA at Fotolog, gave a nice little presentation titled Scaling the world’s largest photo blogging community.

Fotolog is a bit different from most sites presenting at this year’s conference. Because it’s based on the idea of photo “blogging,” free members are limited to posting (uploading) one photo per day, while paid members are limited to posting 6 photos per day.

He presented an Alexa graph showing Fotolog recently surpassing Flickr in pageviews. This was really surprising to me and made me take notice (I wasn’t the only one). However, later I looked at the same data in compete.com, which shows an utterly different picture.

Interesting stats:

  • 2.4 billion comments
  • 228 million photos (unsure whether they’re counting the “Flickr way” or the normal way)
  • 500,000 uploads per day peak (probably 200-300K unique members uploading)
  • average 24 minutes per visit (high for a web site)
  • running Solaris 10
  • converting from PHP to Java (no motivation given)
  • 40 x 3GB memcached instances
  • 32 MySQL servers, segmented into 4 tiers (user, guestbook, photo, friends+faves)
  • recently converted from MyISAM to InnoDB
  • using 3par servers for image store

When they converted to InnoDB, they found they still had table lock contentions. Why? Because they were using auto_increment to generate their IDs. To get around this, they changed their primary key to be a composite of existing fields, which, additionally, represents the way data is commonly queried.

For example, their comments use a (photo identifier, post date, comment identifier) composite for their primary key. Since they usually show comments from a given photo ordered by date, that can be done entirely through the primary key lookup, which, with InnODB, is much faster even than a secondary key lookup.

One thing not discussed is whether the photo identifier in that case is ordered, or how often “random inserts” happen. This is important because of InnoDB’s clustered primary key, which sorts row data in the same order as the primary key. I think he kind of touched on this from a different direction when he digressed a bit to explain how InnoDB’s primary keys are stored and the implications for secondary keys.

I was impressed by some of the benchmarking graphs he produced. He claimed a 30% performance improvement by disabling MySQL’s query cache, and a similar improvement (I think – wasn’t quite sure about this in my notes) by moving from 4GB to 16GB RAM.

Currently, their data is partitioned by the first letter of the username. This, of course, is quite skewed toward certain letters, and results in under-utilization for some instances. It wasn’t clear how these map to physical servers.

The latter part of his presentation focused on the driving factors behind planning their new architecture, wherein he proposed partitioning by date. There seemed to be confusion here, as the lines between “current implementation” and “proposed implementation” were blurred. That may have been cleared up in the Q&A, but I had co-workers tapping on my shoulder and had to leave. 😦