April 29, 2007
MySQL Conference: The 7 Stages of Scaling Web Applications
John Engates, CTO of Rackspace, presented his experiences on The 7 Stages of Scaling Web Applications: Strategies for Architects.
This was a bit less interesting than I’d hoped, mainly because there were a lot of generalities and few specifics. One thing that the CTO of Rackspace brings, of course, is experience working with many different organizations as they grow. Obviously, he wasn’t in a position to give specifics of any given organization’s growing pains.
He did provide a good sanity check and general road map to make sure your application is evolving correctly. If you find yourself deviating significantly, you should pause to reflect on the reasons.
John gave the best definitions of “high availability” and “scalability” that I saw at the conference. Namely:
- high availability
a design and implementation that ensures a certain degree of operational continuity
- scalability
a desirable property of a system which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged as demands increase
I’m kind of blogged out at the moment. Here are the 7 stages, in brief:
- 2-tier; pair of web servers, single database, internal storage, low operational cost, low complexity
- more of same, just bigger; maybe shared storage across multiple databases
- exponential traffic increase from publicity; more web servers, simple master-slave replication topology, split reads and writes, some application retooling
- intensified pain; replication latency, too many writes from single master, segmenting data by features, shared storage for application data, big-time application rearchitecting
- panicky, severe pain; rethinking the whole application; data partitioning (by geographical, user id, etc.), user clustering with all features available on each cluster, directory-based mapping of users to clusters
- less pain; finally adding new features again, horizontally scalable with more hardware, acceptable performance
- “entering the unknown”; investigating potential bottlenecks in firewalls, load balancers, network, storage, processes, backup/recovery; thinking about moving beyond a single datacenter; still difficult to replicate and load balance geographically
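To make two of these stages a little more concrete, here is a minimal Python sketch that combines the read/write split from the third stage with the directory-based mapping of users to clusters from the fifth. This is my own hypothetical illustration, not anything John showed: the `Cluster` and `UserDirectory` classes, the host names, and the round-robin choice of replica are all assumptions.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical user cluster: one master for writes plus read replicas."""

    name: str
    master: str
    replicas: list[str]
    # Round-robin iterator over read hosts; built in __post_init__.
    _reads: itertools.cycle = field(init=False, repr=False, compare=False)

    def __post_init__(self) -> None:
        # With no replicas yet (as in the early stages), reads fall back to the master.
        self._reads = itertools.cycle(self.replicas or [self.master])

    def host_for(self, is_write: bool) -> str:
        # Writes always go to the single master; reads rotate across replicas.
        return self.master if is_write else next(self._reads)


class UserDirectory:
    """Directory-based mapping: an explicit user id -> cluster lookup table."""

    def __init__(self, clusters: dict[str, Cluster]) -> None:
        self.clusters = clusters
        self.assignments: dict[int, str] = {}

    def assign(self, user_id: int, cluster_name: str) -> None:
        # Reassigning a user only updates this entry; copying the user's
        # existing rows to the new cluster is a separate step not shown here.
        if cluster_name not in self.clusters:
            raise KeyError(f"unknown cluster: {cluster_name}")
        self.assignments[user_id] = cluster_name

    def route(self, user_id: int, is_write: bool) -> str:
        # Raises KeyError for a user that has not been assigned to a cluster.
        cluster = self.clusters[self.assignments[user_id]]
        return cluster.host_for(is_write)


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    directory = UserDirectory({
        "east": Cluster("east", "db-east-master", ["db-east-r1", "db-east-r2"]),
        "west": Cluster("west", "db-west-master", ["db-west-r1"]),
    })
    directory.assign(42, "east")
    directory.assign(7, "west")
    print(directory.route(42, is_write=True))   # db-east-master
    print(directory.route(42, is_write=False))  # db-east-r1
    print(directory.route(42, is_write=False))  # db-east-r2
    print(directory.route(7, is_write=False))   # db-west-r1
```

The appeal of a directory over a fixed rule like hashing the user id is that you can move an individual user to another cluster by copying their data and updating one entry, at the cost of an extra lookup table that itself has to be fast and highly available. Note that the sketch ignores replication latency: a read routed to a replica right after a write to the master may not see that write yet.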
Most of these should sound familiar to many readers. Some of us do it a bit backwards (for example, eliminating network bottlenecks before application or database bottlenecks), and the smart ones focus on bottlenecks 12 months before they’re the limiting factor.
Among his recommendations:
- leverage existing technologies and platforms
- favor horizontal scaling over vertical scaling
- shared nothing architectures are common for a reason
- develop sufficient useful instrumentation
- don’t overoptimize, but do load test
- RAM is more important than CPU
- consider each feature in the context of its performance and scaling implications