October 19, 2008

Restlet: A REST ‘Framework’ for Java

Posted in Development, Software at 2:33 pm by mj

Building an API on the REST architectural style? Building it in Java?

This past week, on a project doing just that, I ran into Restlet. I’d never heard of a REST framework for Java before, but it’s been featured on InfoQ, TSS, ONJava, and others over the past three years. (Damn, I need to pay more attention.)

And it kicks ass.

Here’s a quick run-down:

Restlet is an API for developing REST-based services in the same way that Servlet is an API for developing Web-based services. Your application never deals with the Servlet API, HTTP-specific attributes, cookies, sessions, JSPs, or any of that baggage.

Instead, you think and code in terms of REST: Routing, Resources, Representations.

It has an intuitive URL routing system that parses out resource identifiers and other data (that is, a URL template of /user/{user_id} would give you a ‘user_id’ attribute for any URL matching that pattern, which is fed into your User resource).
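To make the routing idea concrete, here is a rough sketch in plain Java of how a template like /user/{user_id} can be compiled into a pattern and matched against an incoming path. This is not Restlet’s API: the UriTemplate class and its match method are names I made up for illustration, and the real router surely handles more cases.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Restlet: compiles "/user/{user_id}" into a regex.
class UriTemplate {
    private static final Pattern VARIABLE = Pattern.compile("\\{(\\w+)\\}");
    private final List<String> names = new ArrayList<>();
    private final Pattern pattern;

    UriTemplate(String template) {
        StringBuilder regex = new StringBuilder();
        Matcher m = VARIABLE.matcher(template);
        int last = 0;
        while (m.find()) {
            // Literal text between variables is quoted so it matches verbatim.
            regex.append(Pattern.quote(template.substring(last, m.start())));
            regex.append("([^/]+)"); // each variable matches exactly one path segment
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        pattern = Pattern.compile(regex.toString());
    }

    // Returns the extracted attributes, or null if the path does not match the template.
    Map<String, String> match(String path) {
        Matcher m = pattern.matcher(path);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> attributes = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            attributes.put(names.get(i), m.group(i + 1));
        }
        return attributes;
    }
}
```

With this, new UriTemplate("/user/{user_id}").match("/user/42") yields a map with a single user_id attribute, which is the kind of thing that gets handed to the User resource.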

Resources are easily able to define which of the verbs (GET, POST, PUT, DELETE) they respond to, with default behavior defined for verbs that are unsupported.

There are plug-ins available for SSL and OAuth, an emerging best practice for authenticating third party access to user accounts.

The documentation is a bit lacking. However, there is an excellent IBM developerWorks tutorial on Restlet (registration required) that lays out pretty much everything you need, with a (nearly-)complete example for study.

September 27, 2008

Three subversion tips: svn:ignore, svn merge, svn move

Posted in Development, Software at 7:57 am by mj

Since I complained earlier this year about the state of Subversion tools, I’ve been thinking about a follow-up that’s a bit more positive.

This doesn’t exactly count, but I thought I’d share a few productivity lessons I’ve learned recently.

Using svn:ignore
svn:ignore is a special subversion property that instructs Subversion to ignore any files (or directories) that match a given pattern.

The common use case is to ignore build artifacts to prevent accidental check-ins and eliminate clutter on svn status, etc. For example, you can ignore all *.jar files in a particular directory, or ignore your build directory, etc.

Unfortunately, this can tend to hide problems with your build artifacts. For a project I’m working on now, we have timestamped JAR files stuffed into a common directory. The JAR files themselves are svn:ignore’d, which means svn status will never display them.

And as I found recently, this could result in 8 GB of “hidden” files that only become apparent when you, say, try to copy a remote workspace into a local one for managing with Eclipse.

Shame on the developers for not deleting them as part of ant clean. But it happens, no getting around that.

Thankfully, the Subversion developers thought about this case, and introduced the --no-ignore flag to svn status. With this option, ignored files are displayed along with added, modified and deleted files, with an I in the first column.

Cleaning up your Subversion working copy is, therefore, as simple as:

svn status --no-ignore |
grep '^I' |
perl -ne 'next unless /^I\s+(.*)$/; my $f = $1; if (-d $f) { print "Deleting directory $f\n"; system("rm", "-rv", $f); } else { print "Deleting file $f\n"; system("rm", "-v", $f); }'

That will remove all files and directories that Subversion is ignoring (but not files that just have not yet been added to source control). Stick that in a script in your path, and live happily ever after.


Merging back into trunk
The most common use case when merging is to provide a range of revisions in trunk to pull into your branch. For example:

svn merge -r 100:114 http://example.com/svn/myproject/trunk/

What happens is you tell Subversion, “I don’t care what happened before revision 100, because that’s already in my branch…so just apply changes between revisions 100 and 114.”

But what’s not obvious–nor, as far as I can tell, available in standard reference books–is how to merge back into trunk. It turns out, the way to do this is to disregard everything you’ve learned about subversion.

The problem is that you’ve been merging changes from trunk into your branch. So if you simply choose the naive approach of picking up all changes since your base branch revision until your final check-in, and try to apply those to trunk, you’ll get conflicts galore, even on files you never touched in your branch (except to pull from trunk).

The solution is to use a different form of the merge command, as so:

svn merge ./@115 http://example.com/svn/myproject/branches/mybranch/@115

where revision 115 represents your last merge from trunk.

This actually just compares the two trees at the specified revision, and pulls in the differences, all the differences, and nothing but the differences. So help me Knuth.


Beware the power of svn move
One of the much-touted benefits of subversion (particularly as compared to CVS) is the support for moving files around. But, until 1.5, there has been a glaring error that is often overlooked and can get you into trouble.

Because svn move is implemented as a svn delete followed by a svn add, Subversion thinks the new file has no relation to the old file. Therefore, if you have local changes to foo, and your arch nemesis (er, co-worker) Randy moves it to bar, your changes will simply disappear!

Subversion 1.5 has partially addressed this, at least for single files. Under the new regime, your changes to foo will be merged with any changes to bar. However, you still need to be careful with moving directories.

This is more insidious than moving methods around inside the same file. While in that case Subversion will freak out and your merges will become difficult, at least you’ll see the conflict and your changes won’t disappear while you’re not looking.

The lesson, then, is to talk with your team-mates before any refactoring. (svn lock doesn’t seem to provide any help unless everybody’s in trunk.)

Rumor has it svn 1.6 will address this even more practically by introducing the svn fuck-you-and-your-dog command. But until then, you have to do it the old-fashioned way.

September 6, 2008

Creating Database Sandboxes for Unit/Integration Tests

Posted in Development, Software at 9:45 am by mj

After Baron Schwartz’s recent hint at having solved unit testing database sandboxes at a previous employer, I got to thinking about the problem again.

To be clear: this is not really unit testing, but I’ve found integration tests at various levels are just as important as unit tests. So much so that I have taken to creating both test and integration source directories, and, whenever possible, requiring both suites to pass as part of the build process.

There are two suggestions I’ve seen for solving this problem, both of which are applicable for local in-memory databases as well.

First, starting with a completely empty database, populating it, and then tearing it down. Unfortunately, this is not only difficult, it’s time consuming. If you do this before each test, your tests will take hours to run. If you do this before the whole suite, your tests will not be isolated enough.

A previous co-worker had suggested an incremental approach. Start out with an empty data set, and let each test (perhaps through annotations) define which data must be fresh. I like that. It requires a lot of infrastructure and discipline. It could encourage simpler tests, although with simpler tests come more tests, thus more discipline.

The other approach I’ve seen suggested a couple of times now (including in a comment on Baron’s blog) is the use of a global transaction. Unfortunately, this does not work with all database engines. MySQL tends to be the real killjoy, because nested transactions are not supported and DDL statements are not transactional. Yeah, even in the transactional engines.

So, here’s what I’m thinking. If I were starting over with a new team, with minimal code already existing, I think I wouldn’t solve this problem from an engineering/code perspective. Instead, I’d solve it from an operational perspective (though it still requires application/test infrastructure changes).

Picture a central test database server with one pristine copy of the data, and thousands of database instances. The application (test) asks this server for an available database instance, uses it for a single test, and then moves on. The next test resets the application state, so it asks the server for another available database instance, and so on.

Meanwhile, there is a daemon on that server that is constantly checking each database instance. If the underlying data files do not match the pristine copy, they are restored and the instance is placed back into the available pool.

An instance is considered available for testing when (a) there are no threads connected to it, and (b) its data files match the pristine copy.
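Here is a minimal sketch of that availability rule in Java, just to pin down the bookkeeping. Everything in it is hypothetical: SandboxInstance and SandboxPool are names I invented, and comparing a checksum string is my stand-in for “the data files match the pristine copy”; a real daemon would have to inspect the files themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one sandbox database instance on the test server.
class SandboxInstance {
    final String name;
    int connectedThreads;  // condition (a): must be zero
    String dataChecksum;   // condition (b): must equal the pristine checksum (assumed stand-in)

    SandboxInstance(String name, int connectedThreads, String dataChecksum) {
        this.name = name;
        this.connectedThreads = connectedThreads;
        this.dataChecksum = dataChecksum;
    }
}

class SandboxPool {
    private final String pristineChecksum;
    private final List<SandboxInstance> instances = new ArrayList<>();

    SandboxPool(String pristineChecksum) {
        this.pristineChecksum = pristineChecksum;
    }

    void add(SandboxInstance instance) {
        instances.add(instance);
    }

    // Available means: nobody is connected AND the data matches the pristine copy.
    boolean isAvailable(SandboxInstance instance) {
        return instance.connectedThreads == 0
                && pristineChecksum.equals(instance.dataChecksum);
    }

    // What the restore daemon would do after copying the pristine files back.
    void restore(SandboxInstance instance) {
        instance.dataChecksum = pristineChecksum;
    }

    // Hands out the first available instance and marks it in use, or returns null if none is free.
    SandboxInstance acquire() {
        for (SandboxInstance instance : instances) {
            if (isAvailable(instance)) {
                instance.connectedThreads++;
                return instance;
            }
        }
        return null;
    }
}
```

A test framework’s injected connection pool would call something like acquire() before each test; the daemon loop would call restore() on anything idle whose data no longer matches.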

Tests that do not alter the underlying data files do not require restoration.

What about schema changes? Answer: you have to unit/integration test them too. When you’re ready to promote your code, you deploy to the pristine copy as part of the standard production release process. An interesting side effect of this is it will, in many cases, force other developers to merge production changes back into their private branches, because many of their tests will probably fail.

Contrary to Baron’s suggestion, in a properly designed system this does not require changes to production code. As long as you can inject a database connection pool into your application–and all quality code should have this property (cough, cough)–your test framework can inject a connection pool that interrogates the test server first.

And it can scale to multiple test database servers as your team and the number of tests grows.

I haven’t tried this, and I have too many deadlines (and too much legacy code that I’m still learning in my current team) to experiment with a real-world application.

But what do you think? What holes are there in this proposal?

Aside from violating the Engineering Aesthetic that the application should control the environment it needs for testing. Which is what I think has caused me the most problems over the years.

October 21, 2007

Streaming MySQL Results in Java

Posted in Development, Software at 7:17 pm by mj

The MySQL C API provides two methods for fetching results:

  • mysql_store_result(): the preferred method, which stores all rows in a temporary buffer on the client
  • mysql_use_result(): the optional method, which gives the caller access to each row as it’s fetched through mysql_fetch_row(), without first storing all rows in a temporary buffer

For example, when I need to process a huge number of results from a MySQL table–such as for offline data migration–I reach for the mysql command-line client with the --quick argument. This prevents the dreaded knock on the door from kswapd, and often is faster than doing data transformations in the database (when combined with a subsequent LOAD DATA LOCAL INFILE ....).

You can do the same thing with Perl’s DBD::mysql with something like $dbh->{'mysql_use_result'}=1. In PHP, you use the function mysql_unbuffered_query. I assume any library built on top of the native C API provides a similar step for setting this on a per-handle basis.

What I didn’t know was that Connector/J also supports streaming results. According to their documentation, you achieve this with the following:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);

But it turns out that com.mysql.jdbc.Connection#createStatement() uses TYPE_FORWARD_ONLY and CONCUR_READ_ONLY by default. All that’s left for the caller to do, then, is use the unintuitive “magic value” for fetchSize: stmt.setFetchSize(Integer.MIN_VALUE).

What you get back from executeQuery(query) is a ResultSet implementation with a backing store of type com.mysql.jdbc.RowDataDynamic, which reads the next row on each call to next().
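Put together, a streaming read looks roughly like the sketch below. It uses only the standard java.sql interfaces plus the Integer.MIN_VALUE fetch size that Connector/J interprets as “stream”; the StreamingQuery class and countRows method are my own names, and the connection is assumed to come from Connector/J (other drivers will treat that fetch size differently or reject it).

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class StreamingQuery {
    // Connector/J treats this particular fetch size as "stream rows one at a time".
    static final int STREAMING_FETCH_SIZE = Integer.MIN_VALUE;

    // Walks the result set without buffering it on the client and returns the row count.
    // Assumes conn is a Connector/J connection; query is whatever SELECT you need.
    static long countRows(Connection conn, String query) throws SQLException {
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(STREAMING_FETCH_SIZE);
            try (ResultSet rs = stmt.executeQuery(query)) {
                long rows = 0;
                while (rs.next()) {
                    rows++; // do the per-row work here; each row is read on demand
                }
                return rows;
            }
        }
    }
}
```

The try-with-resources blocks matter more than usual here: the result set has to be fully consumed or closed before the connection can be used for anything else.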

This means it’s easy to achieve this even from within Hibernate using session.createSQLQuery. It’s unfortunate that you have to work outside the framework, but special cases like this are OK as far as I’m concerned. And if you’re processing a huge number of results, you probably don’t want all the reflection and serialization overhead of Hibernate, anyway. Update (2007-10-23): I was wrong. You can set the fetch size from a normal Hibernate Query, so no need to work outside the framework. What I said about the reflection overhead when processing a massive result set still goes, though.

Unfortunately, this doesn’t save you from all the GC overhead. Even if you’re careful to reuse your own buffers, com.mysql.jdbc.MysqlIO#nextRow isn’t so careful. Each row allocates its own byte[][].

This is useful if, for example, you need to process tens of millions of rows from a table, but it’s incredibly expensive to generate the result set so batched fetches using LIMIT ofs,count won’t work. It’s also useful if you’re memory constrained on the client.

A couple of caveats

First, mysql_use_result() PREVENTS ANY OTHER CONCURRENT UPDATES. Even with InnoDB, this means each row in your result set will be locked until you completely process the results. You should only use this on a backend database designed for such queries, and then, you might consider writing your results into a temporary table first. Update (2007/10/23): The same is also apparently true whenever results are sent to the client; the caveat is just more important when streaming results because of the temptation/presumption of doing additional work, therefore taking longer to consume the results. I’ll have more on this later.

Second, each MySQL server has a net_write_timeout setting. Be sure you know what this value is. Default is 60 seconds, but it’s often set to 1 hour or more. You must process your results within this amount of time, or the gremlins will come in through your window.

September 22, 2007

Turning a corner on unit testing

Posted in Coding, Development, Me at 3:53 pm by mj

I think I’ve finally done it.

This week, I found myself struggling within the confines of existing infrastructure code that had no unit tests. The task was relatively simple: a new feature at the infrastructure-level that would enable many other product-level features in coming months. If I’d gone down the hackish route and not worked at the infrastructure level, I could’ve isolated this code and tested it in complete isolation. But that would just lead to more bugs down the road, and I’d still have a lot of integration testing to do.

It was one of the more painful and slow development experiences in recent memory. Even worse than dealing with 10-year old, undocumented, legacy Perl code written by non-programmers. Yeah, that painful.

For a long time, I contended that unit testing’s primary benefit was long-term: ensuring that refactoring code which you did not write and do not fully understand does not break. And that you need to reach a certain critical mass in terms of number and variety of tasks before it becomes effective. And that’s still true. I think my skepticism rubbed off, too, because I hardly hear anyone saying, “But the unit tests pass, there can’t be a bug” anymore.

But then I started looking at how I approached the practice of programming. Before I write code–often as part of designing the system–I write out stubs of how I’m going to use the code. Because an aesthetically unpleasing or overly complicated API is an error-prone API. And it dawned on me that this is one of the big benefits of test-first development.

Last year, when I read Agile Software Development: Principles, Patterns, and Practices (finally after it had been on my bookshelf for years), it was a bit like a revelation. Agile development, really, is formalizing what are otherwise good development-time practices anyway. So I resolved to better formalize my development activities.

It’s taken a while, but now I am so accustomed to using unit tests to isolate problems, and the quick turnaround that entails, that there is a significant mental barrier to any other way of doing it. I’ve turned that corner and am heading down NotReallyUtopiaButGoodEnough Street.

July 8, 2007

Creating a RESTful API Detailed Case Study

Posted in api, Development, Software, Web2.0 at 9:48 pm by mj

Joe Gregorio posted a great example of designing a RESTful API, and the brief ensuing discussion is instructive, too. (Yes, it’s a few weeks old.)

He walks through the steps of extracting the nouns, mapping the verbs, ensuring correctness and reliability, and generating and validating tickets to provide a RESTful API for a moderately complex situation.

It looks like development tools are finally catching up, such that it’s much easier to rely on HTTP verbs and response codes than it was even a year ago. From my limited experience, though, it seems there are still some gaps, and thus obstacles. If you have experience in this area, drop me a line.

June 3, 2007

Linus Torvalds on Source Control

Posted in Coding, Development, Software at 12:51 pm by mj

This was on Slashdot earlier. It’s a really interesting video of Linus speaking at the Googleplex about the design philosophy of GIT, and why other version control systems are solving the wrong problems.

Forget all the comments about Linus being an arrogant git and so on, he makes a compelling point about distributed repositories and networks of trust, even inside corporate environments. Toward the end, he also talks about how GIT’s use of SHA-1 to provide consistency also aids in recovery of a lost/corrupted repository even from people you don’t trust.

Of course, he is a bit wrong about the problems with CVS/SVN merges. Developers experienced with merging in those environments know how to get around some of the most annoying problems (e.g., remember (tag) the point at which you last merged to avoid phantom conflicts). But he’s still right about the more fundamental design flaws.

May 20, 2007

Parallelizing replication in MySQL

Posted in Development, mysql, Scale at 8:15 am by mj

A month ago, Paul Tuckfield’s (of YouTube) keynote at the MySQL Conference & Expo received a lot of attention, mainly for outlining a parallelized prefetching strategy for MySQL replication.

The basic strategy is that, prior to running updates from the binlog, you instead turn the updates into selects and parallelize them. This brings the necessary data into buffer, which speeds up the (serialized) updates. Paul called this the “oracle algorithm,” so-called because it lets the replication thread see into the future and prime its cache for the upcoming data update. Jan Lehnardt gave a better run-down of the reasoning.
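The “turn the updates into selects” step can be sketched as a statement rewrite. The version below is deliberately naive and entirely my own illustration, not what Paul or anyone else actually runs: PrefetchRewriter is a made-up name, and the regex only understands a plain UPDATE ... SET ... WHERE ... statement (a string literal containing “ WHERE ” would confuse it).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefetchRewriter {
    // Naive on purpose: recognizes only "UPDATE <table> SET ... WHERE <condition>".
    private static final Pattern UPDATE = Pattern.compile(
            "^\\s*UPDATE\\s+(\\S+)\\s+SET\\s+.*?\\s+WHERE\\s+(.*)$",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns a SELECT that touches the same rows, or null if the statement is not recognized.
    static String toSelect(String statement) {
        Matcher m = UPDATE.matcher(statement);
        if (!m.matches()) {
            return null;
        }
        return "SELECT * FROM " + m.group(1) + " WHERE " + m.group(2);
    }
}
```

A prefetcher would read ahead in the relay log, run the rewritten selects on a handful of threads, and throw the results away; the only point is to pull the pages into the buffer pool before the replication thread needs them.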

Earlier this week, Farhan Mashraqi implemented the same MySQL replication strategy. Farhan is the DBA at Fotolog, whose MySQL conference presentation I blogged about earlier.

Paul claimed a 400% improvement in replication performance at YouTube via this method. I’m most interested in Farhan’s results. I’m skeptical that you’d get that much improvement if your writes lean more heavily toward inserts than updates, but the beauty is that it’s an improvement that can be done completely outside the normal MySQL replication, and does not significantly affect your slave’s ability to respond to end user requests.

I’m wondering, though, if it’s possible to really parallelize MySQL replication given the emerging trends in data partitioning. But, first, let’s back up.

When replicating queries (insert/update/delete/alter table/drop table/etc.) from the master to the slave, all your statements get serialized in the order in which they were executed by the MySQL master. Yet, when you’re performing the original queries, they’re parallelized–multiple application servers are executing queries simultaneously. Depending on your hardware (and, especially, disk) configuration, you’re achieving varying levels of real concurrency at the database level. This is why replication on the slaves often falls behind the master–it’s not just, or even primarily, a data transfer issue. (Update: obviously, another factor is that slaves are also accepting end user selects, where masters often do not.)

So why serialize the statements when they’re replicated? The reason is determinism. If inserts/deletes/updates are getting executed in a different order on the slaves than on the master, you could end up with inconsistent data.

For example, you might upload a photo and then delete it. But if the slave executes them in the opposite order, then you could end up deleting the photo before you uploaded it–effectively the same as not deleting it at all! (Or, in some cases, stopping the replication thread completely. D’oh!)

In a typical by-the-book RDBMS, the same applies to transactions. For example, you might own a site that allows logged out visitors to sign up for a new account on the same page that they upload their first photo. On the server, you’re going to create the user account first, and then create the photo and store the pixel data.

But in many modern high-volume Web applications, this already doesn’t hold. Often, your users table and your photos table are going to be on different servers, and you’re using MySQL. So what does this mean? This means that in order to eke out the last bit of performance, you’re already accepting that the slaves may be inconsistent. The photo might be created first, and the user may get replicated seconds, minutes or even hours later. Depending on how resilient your code is, a friend viewing your photo may see missing parts of the page, or a bunch of “null”s displayed, or an error page.

Increasingly, high traffic–and, these days, even moderate and low traffic–sites are converging on data partitioning as the optimal solution. Specifically, a particular kind of data partitioning that we may call sharding. That is, splitting a single table (say, your photos table) into multiple tables spread across multiple physical servers, each of which is responsible for only a (usually non-overlapping) segment of your data.

As I observed earlier, a lot of people leave it at that, and assume that there is a 1:1 correspondence between the number of shards and the number of physical servers. However, there are already both scaling and performance advantages to splitting your table into more shards than you have physical servers.

So, let’s assume you’ve done just that: maybe you have 16 masters, each serving 16 shards of your data. And each of those 16 masters is seeing a lot of writes, such that their slaves often run several minutes behind during peak traffic.

What’s really most important for data consistency in this kind of environment is that statements that affect the same pieces of data get serialized. Then in a non-hierarchical, sharded table, you can accomplish that by simply serializing the writes to each table. See where I’m going?

What I propose is configuring a small number of binlogs, and mapping each table in your database to one of those binlogs. In the above example, each master might put 4 shards into each of 4 binlogs. Each slave then runs 4 replication threads, one per binlog. Voilà, better concurrency!
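The mapping itself is trivial, which is part of the appeal. Here is a sketch of the bookkeeping in Java, assuming shards are simply numbered and assigned round-robin; BinlogAssignment is a hypothetical name, and this illustrates the proposal only, not an existing MySQL configuration option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BinlogAssignment {
    // All writes to a given shard always land in the same stream, so they stay serialized.
    static int streamFor(int shard, int binlogCount) {
        return Math.floorMod(shard, binlogCount);
    }

    // Groups shard numbers 0..shardCount-1 by the binlog stream they would be written to.
    static Map<Integer, List<Integer>> assign(int shardCount, int binlogCount) {
        Map<Integer, List<Integer>> byStream = new TreeMap<>();
        for (int shard = 0; shard < shardCount; shard++) {
            byStream.computeIfAbsent(streamFor(shard, binlogCount), k -> new ArrayList<>())
                    .add(shard);
        }
        return byStream;
    }
}
```

With 16 shards and 4 binlogs, each stream ends up carrying 4 shards, and two statements against the same shard can never be reordered relative to each other.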

Yes? No?

Of course, if you’re only running with a single disk in your servers, nothing is going to help you anyway. Or if you have a very, very high slave-to-master ratio, you have to be careful (remember, this quadruples the number of connections to your master).

My original thought on this took it a step further–that is, serializing based on a hash of the primary key (or row number), but that has issues, is more complicated, and requires too many assumptions about how the application is behaving. What I like about this proposal is that your application need behave no differently than it already does to accomplish its partitioning strategy; it’s easy to configure and reconfigure; and it is as easy to scale down as it is to scale up. Oh, and it’s more cost effective than simply adding more masters with their own slaves.

Has anybody tried this before? Can anybody see fundamental flaws?

April 25, 2007

MySQL Conference: Technology at Digg

Posted in Conferences, Development, digg, Scale, Software at 11:04 pm by mj

Technology at Digg, presented by Eli White and Tim Ellis on Tuesday.

98% reads. 30GB data. Running MySQL 5.0 on Debian across 20 databases, with another ~80 app servers. InnoDB for real-time data, MyISAM for OLAP purposes.

The big thing at Digg has been effective use of memcached, particularly in failure scenarios. The problem: you start with 10 memcache daemons running. One of them goes down, often, they say, because the daemon is doing internal memory management and simply is non-responsive for a bit. So your client starts putting those buckets of data onto another instance. Then…the original instance comes back. The client starts reading data from the original instance, which means potential for reading stale data.

They didn’t give many details about how they solved this problem. One solution given toward the end is to store the daemon server’s name with the key and store that information in a database. When the memcache daemon comes back up, the key names don’t match, so you invalidate it. This requires not only a highly available MySQL database to work, but it also requires two network accesses per data fetch in the best case.
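As I understood it, the check works something like the sketch below. This is my reading of a solution that was only described in passing, so treat every detail as an assumption: OwnerCheckedCache is an invented name, and plain HashMaps stand in for both the memcached daemons and the database table that records which daemon last stored each key.

```java
import java.util.HashMap;
import java.util.Map;

class OwnerCheckedCache {
    // Stands in for the database table: key -> name of the daemon that last stored it.
    private final Map<String, String> ownerByKey = new HashMap<>();
    // Stands in for the memcached daemons: server name -> (key -> value).
    private final Map<String, Map<String, String>> servers = new HashMap<>();

    void put(String server, String key, String value) {
        servers.computeIfAbsent(server, s -> new HashMap<>()).put(key, value);
        ownerByKey.put(key, server); // second network access: record the owner
    }

    // Returns the cached value, or null after invalidating it when this
    // server is not the recorded owner (i.e. its copy may be stale).
    String get(String server, String key) {
        Map<String, String> cache = servers.computeIfAbsent(server, s -> new HashMap<>());
        if (!server.equals(ownerByKey.get(key))) {
            cache.remove(key);
            return null;
        }
        return cache.get(key);
    }
}
```

The two lookups per read are exactly the cost complained about above: one to the ownership table and one to the cache itself.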

One interesting thing is they’re running their memcached instances on their DB slaves. It sounds like this developed simply because their MySQL servers have more RAM (4GB) than their web servers. I wasn’t the only one a little concerned by this, and I wonder if part of their problem with unresponsive memcache daemons stems from this.

They’ve had an initiative underway for a year to partition their data, which hasn’t been implemented yet. Once again, there was terminology confusion. At Digg, a “shard” refers to a physical MySQL server (node), and “partition” refers to a table on the server. Prior discussions at the conference used opposite definitions. I suspect the community will come to a consensus pretty soon (more on that in a later post).

There was a brief audience digression into the difference between horizontal partitioning (scaling across servers) and vertical partitioning (multiple tables on the same server), which is closer to what partitioning connotes in the Oracle world.

Other notes:

  • developers are pushing back hard against partitioning, partly, it sounds, because it fudges up their query joins. No mention of MySQL’s inefficient CPU/memory usage on joins.
  • struggling with optimizing bad I/O-bound queries
  • issuing a lot of select * from ... queries, which causes problems with certain kinds of schema changes that leave outdated fields in their wake
  • had issues with filesystem reporting writes were synced to disk when they hadn’t really been synced; lots of testing/fudging with parameters; wrote diskTest.pl to assist with testing
  • image filers running xfs because ext3 “doesn’t work at all” for that purpose?! not substantiated by any data; unclear whether they’re talking about storing the images or serving them, or what their image serving architecture is (Squid proxy?)
  • memcached serves as a write-through cache for submitted stories, which hides any replication delay for the user submitting the story

April 24, 2007

MySQL Conference: Wikipedia: Site Internals &c.

Posted in Conferences, Development, mysql, Scale, Software, wikipedia at 9:26 pm by mj

For Monday’s afternoon “tutorial” session (yes, I’m behind, so what?), I attended Wikipedia: Site Internals, Configuration and Code Examples, and Management Issues, presented by Domas Mituzas.

I have to say that my main interest at this conference is more on what’s being actively deployed and improved on in high-traffic production systems. Scalability is an area where theory interests me less than war stories.

Wikipedia’s story reminds me a lot of mailinator’s story. That is, Domas repeatedly emphasized that Wikipedia is free, is run mostly by volunteers, has no shareholders, and nobody’s going to get fired if the site goes down. Which means they can take some shortcuts and simplify their maintenance tasks with the right architectural designs, which may not scale as well as they’d like, but work anyway.

There were a lot of details here. Maybe too many. Any discussion is going to leave out at least a dozen interesting things. Here’s what I found interesting.

Data: 110 million revisions over 8 million pages. 26 million watch lists. So, not as large as Webshots, Flickr, Facebook, Photobucket, etc.

They utilize several layers of caching, from multiple Squid caches to app-level caches that reside on local disk. They also use UDP-based cache invalidation, and, in keeping with the theme, don’t care much if a few packets are dropped.

Their databases sit behind an LVS load balancer, which will take slaves out of service if replication falls behind. If all slaves are behind, the site is put into read-only mode.

Logged in users always bypass the Squid caches. Anonymous users who edit a page get a cookie set that also bypasses the Squid caches.

There was some discussion that page edits wait to ensure the slaves are caught up, but two direct questions from my colleague were sidestepped. So, my best guess is that either they’re relying on their load balancer’s slave status check, or they’re writing a sentinel value into another table within the same database then selecting that sentinel first thing after getting a connection to a slave.

They never issue ORDER BY clauses at the SQL level, even when paginating results. Instead, they rely on the natural ordering of their indexes and issue something akin to WHERE id > ? LIMIT ?. I don’t know how they handle jumping straight to the 500th page, but it seems a reasonable performance adjustment for many queries in the context of their application.
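The pattern is easier to see with a toy model. In the sketch below a TreeSet plays the role of an index on id, and pageAfter does what WHERE id > ? LIMIT ? would do; KeysetPager is a made-up name and none of this is Wikipedia’s code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class KeysetPager {
    // The TreeSet stands in for an index on id, which is already kept in order.
    private final NavigableSet<Integer> index = new TreeSet<>();

    void insert(int id) {
        index.add(id);
    }

    // Equivalent in spirit to: SELECT id FROM t WHERE id > ? LIMIT ?
    List<Integer> pageAfter(int lastSeenId, int limit) {
        List<Integer> page = new ArrayList<>();
        for (int id : index.tailSet(lastSeenId, false)) {
            if (page.size() == limit) {
                break;
            }
            page.add(id);
        }
        return page;
    }
}
```

Each page request starts from the last id the client saw, so the cost depends on the page size rather than on how deep into the results you are. It also shows why jumping straight to page 500 is awkward: there is no last-seen id to start from.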

They’re still running MySQL 4.0.x, have no problems, and don’t plan to upgrade anytime soon.

I didn’t quite grasp their partitioning strategy. The 29-page book of notes he provided discusses various partitioning strategies more hypothetically, and more in terms of distributing reads or intensive tasks with indexes that reside on a subset of slaves.

Finally, revision histories are not stored in their own records, but are stored as compressed blobs, with each revision concatenated together uncompressed, then compressed. Makes a lot of sense to me.
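Why it makes sense: consecutive revisions of a page are mostly identical, and a compressor can only exploit that if it sees them in the same stream. A small Java sketch of the comparison, using gzip from the standard library as a stand-in (I don’t know which compressor or separator Wikipedia actually uses; RevisionBlob and the NUL separator are my assumptions):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;

class RevisionBlob {
    // Compresses one byte array with gzip.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Concatenates the revisions uncompressed (NUL-separated, an assumed format),
    // then compresses the whole thing once.
    static byte[] compressTogether(List<String> revisions) {
        String joined = String.join("\u0000", revisions);
        return gzip(joined.getBytes(StandardCharsets.UTF_8));
    }

    // For comparison: total size if every revision were compressed on its own.
    static int separatelyCompressedSize(List<String> revisions) {
        int total = 0;
        for (String revision : revisions) {
            total += gzip(revision.getBytes(StandardCharsets.UTF_8)).length;
        }
        return total;
    }
}
```

For a handful of near-identical revisions, the combined blob comes out much smaller than the sum of the individually compressed ones, since every revision after the first is mostly back-references to text already seen.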

My feeling is that, underneath, Wikipedia’s architecture is a bit overly complex for their size: something that’s grown incrementally without the requisite resources to trim down some of the complexities. So, while their philosophy is: “simple, simple, simple, who cares if we’re down a few hours?” there still remains some cruft and relics of prior architectural decisions that they wouldn’t choose again if they were starting over. Which is great. It means they’re human after all.
