April 23, 2007
MySQL Conference: Scaling and High Availability Architectures
Today was the first day of the MySQL Conference & Expo in Santa Clara. This first day consisted of two extended (3.5 hour) “tutorial” sessions, with your choice of seven topics during each session.
Interesting audience metric: most of the audience members considered their applications to be in their “late teens,” that is, the awkward time where audience size has grown and you’re seriously in need of a new architecture to scale to the next level.
I wouldn’t have expected so many to self-identify with that camp, although, as Jeremy pointed out, the whole interaction-everywhere UI of “Web 2.0” means a lot more sites are in that position than would be otherwise.
Jumping right to the “takeaway” slide: their recommendation to achieve both scalability and high availability is data partitioning (using HiveDB) across multiple tiers, coupled with a master-master replication topology using the IP takeover strategy to avoid writing to both masters in a tier simultaneously. (That sounds like a run-on sentence, doesn’t it?)
I’d like to focus on a couple of observatons/critiques.
The audience expressed concerns about how much of scalability was people/process management, and how early in the lifecycle should you bite the bullet and commit to a really scalable architecture. Neither concern was really addressed in any depth, which, given the audience mix noted above, probably wasn’t important. It would have been more important in a room full of two-month old startups.
When Jeremy talked about partitioning, he referred to distributing data with the same schema across multiple physical servers. This needed clarification several times. He also totally discounted the value (for either raw performance or scalability) of distributing across multiple tables on the same server.
It’s pretty trivial to show that, performance wise, a MyISAM-based table with a lot of writes benefits greatly from splitting into multiple tables on the same server. Since MyISAM tables only support table-level locks, the more tables you have, the less contention.
I also suspect–though I’m not enough of an expert on InnoDB to prove it at this point–that smaller tables will be faster with InnoDB, especially in applications where you have a lot of inserts that require rearranging by primary key. (InnoDB tables are index-ordered; if you’re not using auto_increment, you’re probably rearranging data on inserts.) Not to mention it’s more likely that blocks that make it into the file system cache incidentally will actually be used later.
Even barring performance gains, though, it seems a good recommendation from a scaling perspective. One of the things Jeremy spent some time on is the difficulty of repartitioning as you add more servers. But such growth is made much easier if you start off with many independent tables, and move them onto new servers as necessary.
There were two interesting audience-suggested partitioning strategies. A woman from Travelocity suggested geography-based paritioning with enough authority to make me think they’re doing that in production. Then, a guy from Technorati suggested date-based partitioning, providing an example of calculating the authority of a blog. While this is usually an archiving strategy, it feels perfectly natural in their application for on-line data.
Another possible oversight was the suggestion that critical reads should always hit the master to avoid stale data. The downside, of course, is that critical reads can swamp the master and take the master down–a far worse situation than merely serving stale data. There are strategies for dealing with replication latency outside of the database (session affinity, copying new data into a local cache, etc.), but they were not touched on in this presentation. That makes me wonder how often such strategies are deployed in the (webified) real world.
Finally, he spent a lot of time on discussing various replication topologies. More time than I would have liked, but I think it served a purpose: to drive home the fact that replication never provides gains for writes. Something that’s easy to overlook if you’re new to replication.
For a good overview of the presentation’s structure and progession, I direct you to Ronald Bradford’s excellent session notes.