April 24, 2007

MySQL Conference: Wikipedia: Site Internals &c.

Posted in Conferences, Development, mysql, Scale, Software, wikipedia at 9:26 pm by mj

For Monday’s afternoon “tutorial” session (yes, I’m behind, so what?), I attended Wikipedia: Site Internals, Configuration and Code Examples, and Management Issues, presented by Domas Mituzas.

I have to say that my main interest at this conference is more on what’s being actively deployed and improved on in high-traffic production systems. Scalability is an area where theory interests me less than war stories.

Wikipedia’s story reminds me a lot of mailinator’s story. That is, Domas repeatedly emphasized that Wikipedia is free, is run mostly by volunteers, has no shareholders, and nobody’s going to get fired if the site goes down. Which means they can take some shortcuts and simplify their maintenance tasks with the right architectural designs, which may not scale as well as they’d like, but work anyway.

There were a lot of details here. Maybe too many. Any discussion is going to leave out at least a dozen interesting things. Here’s what I found interesting.

Data: 110 million revisions over 8 million pages. 26 million watch lists. So, not as large as Webshots, Flickr, Facebook, Photobucket, etc.

They utilize several layers of caching, from multiple Squid caches to app-level caches that reside on local disk. They also use UDP-based cache invalidation, and, in keeping with the theme, don’t care much if a few packets are dropped.

Their databases sit behind an LVS load balancer, which will take slaves out of service if replication falls behind. If all slaves are behind, the site is put into read-only mode.

Logged in users always bypass the Squid caches. Anonymous users who edit a page get a cookie set that also bypasses the Squid caches.

There was some discussion that page edits wait to ensure the slaves are caught up, but two direct questions from my colleague were sidestepped. So, my best guess is that either they’re relying on their load balancer’s slave status check, or they’re writing a sentinel value into another table within the same database then selecting that sentinel first thing after getting a connection to a slave.

They never issue ORDER BY clauses at the SQL level, even when paginating results. Instead, they rely on the natural ordering of their indexes and issue something akin to WHERE id > ? LIMIT ?. I don’t know how they handle jumping straight to the 500th page, but it seems a reasonable performance adjustment for many queries in the context of their application.

They’re still running MySQL 4.0.x, have no problems, and don’t plan to upgrade anytime soon.

I didn’t quite grasp their partitioning strategy. The 29-page book of notes he provided discusses various partitioning strategies more hypothetically, and more in terms of distributing reads or intensive tasks with indexes that reside on a subset of slaves.

Finally, revision histories are not stored in their own records, but are stored as compressed blobs, with each revision concatenated together uncompressed, then compressed. Makes a lot of sense to me.

My feeling is that, underneath, Wikipedia’s architecture strikes me as a bit overly complex for their size, as something that’s grown incrementally without the requisite resources to trim down some of the complexities. So, while their philosophy is: “simple, simple, simple, who cares if we’re down a few hours?” there still remains some cruft and relics of prior architectural decisions that they wouldn’t choose again if they were starting over. Which is great. It means they’re human after all.