October 19, 2008

Restlet: A REST ‘Framework’ for Java

Posted in Development, Software tagged , , , at 2:33 pm by mj

Building an API on the REST architectural style? Building it in Java?

This past week, on a project doing just that, I ran into Restlet. I’d never heard of a REST framework for Java before, but it’s been featured InfoQ, TSS, ONJava, and others over the past three years. (Damn, I need to pay more attention.)

And it kicks ass.

Here’s a quick run-down:

Restlet is an API for developing REST-based services in the same way that Servlet is an API for developing Web-based services. Your application never deals with the Servlet API, HTTP-specific attributes, cookies, sessions, JSPs, or any of that baggage.

Instead, you think and code in terms of REST: Routing, Resources, Representations.

It has an intuitive URL routing system that parses out resource identifiers and other data (that is, a URL template of /user/{user_id} would give you a ‘user_id’ attribute for any URL matching that pattern, which is fed into your User resource).

Resources are easily able to define which of the verbs (GET, POST, PUT, DELETE) they respond to, with default behavior defined for verbs that are unsupported.

There are plug-ins available for SSL and OAuth, an emerging best practice for authenticating third party access to user accounts.

The documentation is a bit lacking. However, there is an excellent IBM developerWorks tutorial on Restlet (registration required) that lays out pretty much everything you need, with a (nearly-)complete example for study.

September 20, 2008

Match-making for Java Strings

Posted in Software tagged , , , at 11:10 am by mj

(Inspired by Jeff Atwood’s recent ‘outing’ as a regex sympathizer, which got me thinking about the line between “too many” and “too few” regular expressions and how some languages make it a choice between “too few” and “none”.)

Java has a Pattern, which forces you to pre-declare your regex string.

And it has a Matcher, which matches on a String.

It should be noted that a Pattern‘s pattern turns most patterns into a mess of backslashes since the pattern is wrapped in a plain-old Java String.

So a Matcher has matches(), which matches using the Pattern‘s pattern, but only if the pattern would otherwise match the Matcher‘s whole string.

A Matcher can also find(), which matches a Pattern pattern even if the pattern would only match a substring of the String, which is what most patterns match and what most languages call matching on a string.

A Matcher can lookAt(), which matches on a Pattern pattern, which, like find(), can match a pattern on a substring of the string, but only if the String‘s matching substring starts at the beginning of the string.

The String matched by the Matcher can be sliced by a call to region(start,end), which allows matches() and lookAt() to interpret a substring of the String as being the whole string.

Now, after calling find() or any of Matcher‘s String-matching cousins, a consumer of a Matcher can call group(int) to get the String substring that the Matcher‘s Pattern‘s pattern captured when matching on the Matcher‘s String‘s string.

But if you’re lazy, and you have no groups in your pattern, and a Matcher‘s matches() is sufficient, then String gives you matches(pattern) which is precisely equivalent to constructing a Pattern with your pattern and passing a new Matcher your existing String!

So with effective use of Java object syntax, you too can use regular expressions to make your matches on Java Strings almost as obscurely as other languages clearly make matches on their strings!

Is it any wonder Java programmers don’t realize that regular expressions are a beautiful… thing?

October 21, 2007

Streaming MySQL Results in Java

Posted in Development, Software tagged , , , at 7:17 pm by mj

The MySQL C API provides two methods for fetching results:

preferred method, which stores all rows in a temporary buffer on the client
optional method, which gives the caller access to each row as it’s accessed through mysql_fetch_row() without first storing all rows in a temporary buffer

For example, when I need to process a huge number of results from a MySQL table–such as for offline data migration–I reach for the mysql command-line client with the --quick argument. This prevents the dreaded knock on the door from kswapd, and often is faster than doing data transformations in the database (when combined with a subsequent LOAD DATA LOCAL INFILE ....).

You can do the same thing with Perl’s DBD::mysql with something like $dbh->{’mysql_use_result’}=1. In PHP, you use the method mysql_unbuffered_query. I assume any library built on top of the native C API provides a similar step for setting this on a per-handle basis.

What I didn’t know was that Connector/J also supports streaming results. According to their documentation, you achieve this with the following:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);

But it turns out that com.mysql.jdbc.Connection#createStatement() uses TYPE_FORWARD_ONLY and CONCUR_READ_ONLY by default. All that’s left for the caller to do, then, is use the unintuitive “magic value” for fetchSize.

What you get back from executeQuery(query) is a ResultSet implementation with a backing store of type com.mysql.jdbc.RowDataDynamic, which reads the next row on each call to next().

This means it’s easy to achieve this even from within Hibernate using session.createSQLQuery. It’s unfortunate that you have to work outside the framework, but special cases like this are OK as far as I’m concerned. And if you’re processing a huge number of results, you probably don’t want all the reflection and serialization overhead of Hibernate, anyway. Update (2007-10-23): I was wrong. You can set the fetch size from a normal Hibernate Query, so no need to work outside the framework. What I said about the reflection overhead when processing a massive result set still goes, though.

Unfortunately, this doesn’t save you from the all GC overhead. Even if you’re careful to reuse your own buffers,
com.mysql.jdbc.MysqlIO#nextRow isn’t so careful. Each row allocates its own byte[][].

This is useful if, for example, you need to process tens of millions of rows from a table, but it’s incredibly expensive to generate the result set so batched fetches using LIMIT ofs,count won’t work. It’s also useful if you’re memory constrained on the client.

A couple of caveats

First, mysql_use_result() PREVENTS ANY OTHER CONCURRENT UPDATES. Even with InnoDB, this means each row in your result set will be locked until you completely process the results. You should only use this on a backend database designed for such queries, and then, you might consider writing your results into a temporary table first. Update (2007/10/23): The same is also apparently true whenever results are sent to the client; the caveat is just more important when streaming results because of the temptation/presumption of doing additional work, therefore taking longer to consume the results. I’ll have more on this later.

Second, each MySQL server has a net_write_timeout setting. Be sure you know what this value is. Default is 60 seconds, but it’s often set to 1 hour or more. You must process your results within this amount of time, or the gremlins will come in through your window.