April 28, 2007

MySQL Conference: Scaling Fotolog

Posted in Conferences, fotolog, Scale, Software at 12:43 am by mj

Farhan “Frank Mash” Mashraqi, DBA at Fotolog, gave a nice little presentation titled “Scaling the world’s largest photo blogging community.”

Fotolog is a bit different from most sites presenting at this year’s conference. Because it’s based on the idea of photo “blogging,” free members are limited to posting (uploading) one photo per day, while paid members are limited to posting 6 photos per day.

He presented an Alexa graph showing Fotolog recently surpassing Flickr in pageviews. This was really surprising to me and made me take notice (I wasn’t the only one). However, I later looked at the same comparison on compete.com, which shows an utterly different picture.

Interesting stats:

  • 2.4 billion comments
  • 228 million photos (unsure whether they’re counting the “Flickr way” or the normal way)
  • 500,000 uploads per day peak (probably 200-300K unique members uploading)
  • average 24 minutes per visit (high for a web site)
  • running Solaris 10
  • converting from PHP to Java (no motivation given)
  • 40 x 3GB memcached instances
  • 32 MySQL servers, segmented into 4 tiers (user, guestbook, photo, friends+faves)
  • recently converted from MyISAM to InnoDB
  • using 3PAR storage for the image store

When they converted to InnoDB, they found they still had table lock contention. Why? Because they were using auto_increment to generate their IDs, and InnoDB takes a table-level lock while handing out auto_increment values. To get around this, they changed their primary key to a composite of existing fields, which also matches the way the data is commonly queried.

For example, their comments use a (photo identifier, post date, comment identifier) composite for their primary key. Since they usually show the comments for a given photo ordered by date, that can be done entirely through a primary key lookup, which, with InnoDB, is much faster than even a secondary key lookup.
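To make the idea concrete, here is a minimal Python sketch (not Fotolog's actual schema or code) that models a clustered primary key as a list kept sorted by (photo_id, post_date, comment_id). The row values, the column names, and the `comments_for_photo` helper are all made up for illustration. The point is that all comments for one photo sit next to each other, already in date order, so a single range scan returns them.

```python
import bisect
from datetime import date

# Hypothetical rows: (photo_id, post_date, comment_id, body).
comments = [
    (7, date(2007, 4, 2), 103, "nice shot"),
    (3, date(2007, 4, 1), 101, "first!"),
    (7, date(2007, 4, 1), 102, "love it"),
    (9, date(2007, 4, 3), 104, "wow"),
]

# A clustered index stores rows in primary key order; model that with a sort.
clustered = sorted(comments, key=lambda row: row[:3])


def comments_for_photo(rows, photo_id):
    """Return one photo's comments, in date order, via one contiguous range."""
    lo = bisect.bisect_left(rows, (photo_id,))      # first row for this photo
    hi = bisect.bisect_left(rows, (photo_id + 1,))  # first row of the next photo
    return rows[lo:hi]


for row in comments_for_photo(clustered, 7):
    print(row[1], row[3])
```

No separate sort step or second lookup is needed here, which is roughly the advantage over going through a secondary index first.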

One thing not discussed is whether the photo identifier in that case is ordered, or how often “random inserts” happen. This is important because of InnoDB’s clustered primary key, which sorts row data in the same order as the primary key. I think he kind of touched on this from a different direction when he digressed a bit to explain how InnoDB’s primary keys are stored and the implications for secondary keys.
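A rough way to see why insert order matters: the Python sketch below is a toy model, not a simulation of InnoDB pages, and its key values and the `count_mid_inserts` helper are invented for illustration. It counts how many inserts land somewhere other than the end of a sorted key sequence. Sequential IDs always append, while a composite key led by the photo identifier puts new comments on older photos in the middle.

```python
import bisect


def count_mid_inserts(keys):
    """Insert keys into a sorted list; count those that do not land at the end."""
    ordered, mid = [], 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos < len(ordered):
            mid += 1  # a "random" insert: existing rows sort after this key
        ordered.insert(pos, key)
    return mid


# auto_increment-style keys arrive in order, so every insert is an append.
sequential = [(i,) for i in range(1, 7)]

# Hypothetical (photo_id, comment_seq) keys: comments keep arriving on old photos.
composite = [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2)]

print(count_mid_inserts(sequential), count_mid_inserts(composite))
```

How costly those mid-sequence inserts are in practice depends on details the talk did not cover, such as how the photo identifiers are assigned.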

I was impressed by some of the benchmarking graphs he produced. He claimed a 30% performance improvement by disabling MySQL’s query cache, and a similar improvement (I think – wasn’t quite sure about this in my notes) by moving from 4GB to 16GB RAM.

Currently, their data is partitioned by the first letter of the username. This, of course, is quite skewed toward certain letters, and results in under-utilization for some instances. It wasn’t clear how these map to physical servers.
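The skew is easy to picture with a small Python sketch. The usernames and the `shard_for` function are hypothetical, since the talk did not show how letters map to instances; this only illustrates how first-letter partitioning piles users onto a few partitions.

```python
from collections import Counter


def shard_for(username):
    """Pick a partition from the first character of the username (assumed scheme)."""
    first = username[0].lower()
    return first if first.isalpha() else "other"


# Made-up sample; real username distributions are skewed in a similar way.
usernames = ["maria", "marcos", "mj", "miguel", "ana", "xavier", "sofia", "sam", "_cool"]

load = Counter(shard_for(name) for name in usernames)

for shard, count in load.most_common():
    print(shard, count)
```

Even in this tiny sample the “m” partition holds four times as many users as “x”, which is the under-utilization problem in miniature.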

The latter part of his presentation focused on the driving factors behind planning their new architecture, wherein he proposed partitioning by date. There seemed to be confusion here, as the lines between “current implementation” and “proposed implementation” were blurred. That may have been cleared up in the Q&A, but I had co-workers tapping on my shoulder and had to leave. 😦

2 Comments »

  1. eas said,

    Regarding the differing views of flickr vs fotolog traffic. Compete appears to only monitor US internet users, but most of Fotolog’s traffic is from Spanish and speaking countries (according to anecdotes and Alexa). Alexa is including non-US internet users.

    I’m curious about your reference to counting photos “the flickr way.” What do you mean by that? Is it that they count all the photos ever uploaded, even though some number have been deleted?

  2. […] Earlier this week, Farhan Mashraqi implemented the same MySQL replication strategy. Farhan is the DBA at Fotolog, whose MySQL conference presentation I blogged about earlier […]

