April 20, 2009

At the MySQL Conference 2009

Posted in Conferences tagged , at 11:26 pm by mj

I’m back at the MySQL Conference again. This year, I skipped the tutorials/workshops. And it’s a good thing, too, because I had a full day between my day jobs, tax issues and other matters.

You might be interested in my tentative schedule of sessions. Even after much editing, there are still conflicts. Not to mention MySQL Camp and Percona sessions upstairs.

This year I’ve decided my focus will be on MySQL performance and areas that could use some contributions. I need to get involved with the tools I use. I’ve also been looking to evaluate PostgreSQL more, and think a deeper understanding of many of the performance trouble spots that I’ve taken for granted will help.

In years past, I focused on my first love: hearing what other people are building, the scaling challenges they’ve faced, and their (sometimes novel) solutions. Which I just call “emerging best practices,” but it’s more than that.

This year I will not be going with co-workers, so I’m on my own mostly. I know of at least two former co-workers who will be at the conference, but many are skipping this year (read: their employers are struggling and don’t want to pay up). Only thing left is for me to show up naked.

Maybe this year I can avoid falling into the swimming pool.

Finally: this year, I have no pretense that I will be able to blog the sessions. Tried it two years in a row, didn’t work out so well. Besides, there are plenty of other sources for that.

My first challenge is going to be being up and dressed and ready to go by 7:30am. I’ve been working from home way too much…

July 17, 2008

MySQL to MSSQL Server Replication?

Posted in Software tagged , , , , at 1:51 pm by mj

The team I’m working with is in the process of moving from SQL Server to MySQL.

To ease the burden, we need all writes to the new MySQL databases to be replicated back into SQL Server, so that all of our back-end reports and what-not will continue to function. It’s short term, but not shorter than six months.

I can’t find any generic tool online to do this.

I thought about setting up a bridge MySQL instance with triggers that create a message queue, then reuse our existing Java DAOs to do the heavy lifting (they know how to read/write either SQL Server or MySQL). But, that quickly gets invasive and brittle.

My next thought was a lightweight wrapper around the mysqlbinlog program which does essentially what the MySQL replication threads do: reads a portion of the binlogs, persists its current position, then replays the statements.

Then I thought, “well, if we’re doing that, why not just reuse MySQL’s replication code and build something around that core?”

You can see the hole I’m getting myself into.

Are there better strategies? Are there (free or commercial) tools that can replay replication logs on a MSSQL Server instance?

The lightweight Perl/Python wrapper seems the best solution. Anybody have experience with something like this (for SQL Server or otherwise)?

April 14, 2008

MySQL Conference ’08, Day 0: DRBD & Real World Scaling

Posted in Conferences tagged , at 11:06 pm by mj

Today was tutorial/workshop day. It’s good to be back. This year, the Santa Clara Convention Center is undergoing a bit of construction, and so the large green yard has been torn up. Too bad. Having lunch on that lawn on a nice Bay Area afternoon is hard to beat. Instead, we’re limited to the pool area (still nice), or an indoor area.

My morning tutorial was Florian Haas and Philipp Reisner’s Hands-on Introduction to High-availability MySQL and DRBD.

It’s hard to read any MySQL blogs and not have heard about DRBD in some capacity, but somehow I’ve totally missed picking up any details about it. This really was a good introduction, with a combination overview of what DRBD is and what it’s good for, and walking through multiple examples of configuring and operating DRBD (including failure scenarios).

There were two big take-aways for me.

First, DRBD replicates block devices, and operates beneath the file system. The downside to this is only one server can mount the file system at a time. Thus, DRBD is only useful in “hot standby” scenarios. This was a little disappointing to me.

Second, failing over to the standby is not a task for DRBD, but for a cluster manager like Heartbeat. During a failover, the cluster manager mounts the file system on the (newly promoted) master, fsck’s the file system, and then starts DRBD. This is potentially slow for a large file system, and thus is not guaranteed to have near-zero downtime during the transition.

Overall, it was a good presentation, and the way they played off each other kind of reminded me of some of Peter and Heikik’s presentations. (This is a good thing.)

My afternoon tutorial was Ask Bjørn Hansen’s Real World Web: Performance & Scalability, MySQL Edition.

Overall, this was a good, energetic presentation. Ask wants people to “think scalable”–which means thinking critically about all advice, and making decisions from actual numbers instead of assumptions.

He gave an excellent overview of the myriad scaling options that have emerged in the last decade. Horizontal scaling, shared nothing architectures, page vs fragment vs data vs query caching, memcached, intelligent cache invalidation, partitioning, master-slave replication, master-master replication, summary tables, and so on.

I think by the end of the week, Ask’s tutorial will be remembered as one of the better ones. I think it would have been better if he had gone slower, kept his focus on scaling (he sort of lost focus and spent the last hour talking about version control and code reviews and so on–important, but not strictly scalability topics), and tied his advice back to actual teams implementing that advice. Who’s using master-master replication, and why, and at what scale are they operating?

There were a couple of “unconventional” bits of advice in his presentation that are worth considering.

First, he advocates against RAID 5. If you need that level of redundancy, better to use RAID 6 because it allows faster and less read-intensive recovery of a single failed disk, and because it can withstand 3 failures instead of only 2. Makes sense, but all the teams whose presentations I’ve seen or read have gone with RAID 5 or RAID 10 depending on need. Is anybody using RAID 6?

Second, he advocates against persistent database connections, or at least reconsidering them. This isn’t completely unheard of. Flickr, for example, re-establishes a DB connection on each request. It’s acceptable because MySQL is fast at establishing connections, and in a master-master sharded architecture like Flickr’s you’re only dealing with one or two database connections per request anyway. I think it’s safer to say “if you’re following other best practices, don’t put any extra effort into database connection pooling until you’ve proven it’s a bottleneck.”

Finally, and something we hear too little of, he says it’s OK to use the “scale up” strategy for a while. It’s refreshing to hear somebody say this, because too often we’re so focused on “getting it right” from the ground floor that we forget that, say, “scaling out with cheap commodity hardware” for YouTube means purchasing hundreds of servers that are each four times more expensive than most of us can afford in the first year of our site’s existence.

Anyway, this was a good start.

The MySQL conference is still the best in terms of having power strips everywhere and generally good wireless coverage. Unfortunately, there were a couple of times today when the network became unusably slow. I hope this is not a sign of problems the rest of the week.

I got to briefly speak with several people, including a guy from iPhotoStock and a guy from Adzilla. I’m not good at striking up conversations with people, so I try to use conferences as a place to practice my terrible socialization skills. So far, it seems to have gone all right. And that there are so many people from outside the Bay Area–and a ton of Canadians–makes the conference even more worthwhile.

April 13, 2008

Attending the MySQL Conference This Week

Posted in Conferences tagged , , at 10:21 pm by mj

To the (at least) eight of you whom I owe e-mails or phone calls (or more…), I apologize. The past three weeks, I’ve been commuting from SF to Seattle (Virgin America rocks!), and the week before I was in Cleveland. It’s been an exciting time, but it hasn’t left much room for other things.

Anyway, today–after finishing my taxes–I finally sat down with the MySQL conference program and drafted out my schedule.

Unfortunately, O’Reilly does not provide a way to share my personal schedule with anybody, which is disappointing. (Um, hello?)

My primary interests really run in two directions: hard-core scaling and performance metrics and strategies, and details about how other teams have solved their problems.

As long as I remind myself of that focus, I can minimize conflicts between sessions. Although, there are still several.

Last year’s conference was excellent. This year’s conference seems geared in a different direction, but still excellent. I hope to learn and think a lot, and get out of my comfort zone a bit.

I haven’t yet seen any of the BoF sessions scheduled, but I intend on attending a couple this year.

If you’re interested in meeting up with me–particularly if you’re someone I owe an email or a call to–ping me.

I shall endeavour to post at least once per day this week with my notes and observations.

October 21, 2007

Streaming MySQL Results in Java

Posted in Development, Software tagged , , , at 7:17 pm by mj

The MySQL C API provides two methods for fetching results:

preferred method, which stores all rows in a temporary buffer on the client
optional method, which gives the caller access to each row as it’s accessed through mysql_fetch_row() without first storing all rows in a temporary buffer

For example, when I need to process a huge number of results from a MySQL table–such as for offline data migration–I reach for the mysql command-line client with the --quick argument. This prevents the dreaded knock on the door from kswapd, and often is faster than doing data transformations in the database (when combined with a subsequent LOAD DATA LOCAL INFILE ....).

You can do the same thing with Perl’s DBD::mysql with something like $dbh->{’mysql_use_result’}=1. In PHP, you use the method mysql_unbuffered_query. I assume any library built on top of the native C API provides a similar step for setting this on a per-handle basis.

What I didn’t know was that Connector/J also supports streaming results. According to their documentation, you achieve this with the following:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);

But it turns out that com.mysql.jdbc.Connection#createStatement() uses TYPE_FORWARD_ONLY and CONCUR_READ_ONLY by default. All that’s left for the caller to do, then, is use the unintuitive “magic value” for fetchSize.

What you get back from executeQuery(query) is a ResultSet implementation with a backing store of type com.mysql.jdbc.RowDataDynamic, which reads the next row on each call to next().

This means it’s easy to achieve this even from within Hibernate using session.createSQLQuery. It’s unfortunate that you have to work outside the framework, but special cases like this are OK as far as I’m concerned. And if you’re processing a huge number of results, you probably don’t want all the reflection and serialization overhead of Hibernate, anyway. Update (2007-10-23): I was wrong. You can set the fetch size from a normal Hibernate Query, so no need to work outside the framework. What I said about the reflection overhead when processing a massive result set still goes, though.

Unfortunately, this doesn’t save you from the all GC overhead. Even if you’re careful to reuse your own buffers,
com.mysql.jdbc.MysqlIO#nextRow isn’t so careful. Each row allocates its own byte[][].

This is useful if, for example, you need to process tens of millions of rows from a table, but it’s incredibly expensive to generate the result set so batched fetches using LIMIT ofs,count won’t work. It’s also useful if you’re memory constrained on the client.

A couple of caveats

First, mysql_use_result() PREVENTS ANY OTHER CONCURRENT UPDATES. Even with InnoDB, this means each row in your result set will be locked until you completely process the results. You should only use this on a backend database designed for such queries, and then, you might consider writing your results into a temporary table first. Update (2007/10/23): The same is also apparently true whenever results are sent to the client; the caveat is just more important when streaming results because of the temptation/presumption of doing additional work, therefore taking longer to consume the results. I’ll have more on this later.

Second, each MySQL server has a net_write_timeout setting. Be sure you know what this value is. Default is 60 seconds, but it’s often set to 1 hour or more. You must process your results within this amount of time, or the gremlins will come in through your window.