April 20, 2009

At the MySQL Conference 2009

Posted in Conferences at 11:26 pm by mj

I’m back at the MySQL Conference again. This year, I skipped the tutorials/workshops. And it’s a good thing, too, because I had a full day between my day jobs, tax issues and other matters.

You might be interested in my tentative schedule of sessions. Even after much editing, there are still conflicts. Not to mention MySQL Camp and Percona sessions upstairs.

This year I’ve decided my focus will be on MySQL performance and areas that could use some contributions. I need to get involved with the tools I use. I’ve also been looking to evaluate PostgreSQL more, and think a deeper understanding of many of the performance trouble spots that I’ve taken for granted will help.

In years past, I focused on my first love: hearing what other people are building, the scaling challenges they’ve faced, and their (sometimes novel) solutions. Which I just call “emerging best practices,” but it’s more than that.

This year I will not be going with co-workers, so I’m on my own mostly. I know of at least two former co-workers who will be at the conference, but many are skipping this year (read: their employers are struggling and don’t want to pay up). Only thing left is for me to show up naked.

Maybe this year I can avoid falling into the swimming pool.

Finally: this year, I have no pretense that I will be able to blog the sessions. Tried it two years in a row, didn’t work out so well. Besides, there are plenty of other sources for that.

My first challenge is going to be being up and dressed and ready to go by 7:30am. I’ve been working from home way too much…


February 14, 2009

Random thought from the “remorse is 20/2” department

Posted in Politics at 10:19 pm by mj

Seven years ago, we wagged our fingers at the crazy internet boom, and said “never again.”

Today, we’re giving the finger to the crazy real estate/finance boom, and saying “never again.”

In ten years, will we do the same for the coming government extravaganza?

(N.B.: 20/2 is the visual acuity often ascribed to hawks.)

January 18, 2009

Reasons to Avoid Mutual Funds

Posted in finance at 5:48 pm by mj

I’m finishing up David Swensen’s Unconventional Success: A Fundamental Approach to Personal Investment. This is part of my self-study aimed at becoming a better investor. (You may notice more finance/investing articles here and on my link blog. I’m finding it incredibly interesting.)

Swensen manages Yale University’s endowment, and is generally known as one of the top fund managers in the business.

Swensen advocates systematic portfolio rebalancing, which increases returns and decreases variance year-to-year. Unfortunately, the rebalancing activities required often run against our instincts, so few have the discipline for it. Even most mutual fund managers.

A large part of his book is devoted to explaining, in excruciating detail, and with more statistics than you could possibly process in a single read, why mutual funds are a bad investment.

Consider this small excerpt, comparing the offering of mutual funds to a standard, passively managed index fund (all emphasis is mine):

Fifteen-year results show a scant 5 percent probability of picking a winner. […] In a cruel twist of fate, for those skilled (or lucky) enough to identify a mutual-fund winner, the gain proves far more mediocre than the race track’s long-shot payoff, as the average winnings amount to a scant 1.5 percent per year. Fully 95 percent of active investors lose to the passive alternative, dropping 3.8 percent per annum to the Vanguard 500 Index Fund results.

And a bit later, discussing tax implications in non-tax-deferred accounts:

A minuscule 4 percent of funds produce market-beating after-tax results with a scant 0.6 percent margin of gain. The 96 percent of funds that fail to meet or beat the Vanguard 500 Index Fund lose by a wealth-destroying margin of 4.8 percent per annum. Arnott notes that “starting with an equal amount of money in 1984, fifteen years later an investor in the average losing fund would have roughly half the wealth that would have been amassed had the money been invested in the Vanguard 500 Index Fund.”
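To see how a 4.8 percent annual gap compounds into "roughly half" over fifteen years, here is a minimal Python sketch. The excerpt does not give the index fund's own return, so `index_return` is an assumed, hypothetical figure; the function and its name are mine, not Swensen's or Arnott's.

```python
def relative_wealth(annual_shortfall, years, index_return):
    """Ending wealth of the lagging fund as a fraction of the index fund's.

    annual_shortfall: how far the fund trails the index each year (0.048 = 4.8%)
    index_return:     assumed annual return of the index fund (hypothetical input)
    """
    fund_return = index_return - annual_shortfall
    # Both investors start with the same amount, so only the ratio of
    # growth factors matters, compounded once per year.
    return ((1 + fund_return) / (1 + index_return)) ** years


if __name__ == "__main__":
    # 10% is an illustrative index return, not a number from the book.
    ratio = relative_wealth(annual_shortfall=0.048, years=15, index_return=0.10)
    print("Losing fund ends with %.0f%% of the index investor's wealth" % (ratio * 100))
```

Trying a few different values for `index_return` is a quick way to check that the "roughly half" result does not hinge on the assumed return.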

He goes on in his book to outline all the reasons: high sales (load) fees, commissions, marketing fees, hidden fees, poor/average managers, conflicts of interest, trend following, using skewed statistics as marketing pitches, etc. And those are just the obvious, legal reasons. The mutual fund industry is steeped in misdirection and outright fraud.

It strikes me as unfortunate that the most prominent form of investing for the average person–401(k) accounts–is driven by the mutual fund industry. What’s worse, the typical 401(k) account does not even have access to all mutual funds: it’s usually a pay-for-play affair.

So even if you identified the best mutual fund managers (1 in 20 probability), you’re completely at the mercy of whatever financial deal your employer has struck, and likely you have no choice but to invest in (at best) average funds. And all of this is endorsed at multiple levels of the government, not least of which the IRS.

Anyhow, I highly recommend this book. There are only two shortcomings that stick with me.

First, as a numbers guy, I wanted more numbers, more charts, and to see the data behind summaries. But he provides an excellent bibliography section, so I can always go look it up. (Might cost a bit, though.)

Second, he could have included 20 more pages worth of explanations. Sometimes he assumes you know more than you probably do (or maybe I’m just more ignorant than the average reader), requiring a re-read and visit to wikipedia. An extra paragraph here and there (no more than 2-3 pages per chapter) would have gone a long way.

But those are minor. Go buy it.

I will also read his previous work (which has a new edition now), Pioneering Portfolio Management, which, I suspect, will be even meatier.

He’s given me a lot to think about, and a lot to study. I will also perform simulations with what I’ve learned to better understand how things play out and what the downsides are, which I’ll try to incorporate into future posts.

January 13, 2009

Self-Deception and the Financial Meltdown

Posted in finance at 6:18 pm by mj

This NYTimes article on Value-at-Risk (VaR) and the systematic masking of investment risk (use bugmenot for registration) provides the most coherent explanation I’ve yet read.

Essentially, it tells the story of the rise of Value-at-Risk (VaR), a metric that purports to ascertain how much money you’re likely to lose in the short-term.

Don’t miss how all the financial institutions exploited its known, even deliberate shortcomings to their advantage–by essentially structuring their investments so that any risk could be shoved off into the “1% probability” that VaR was specifically designed to ignore.
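A toy illustration of that blind spot, as a Python sketch: this is a simple "historical" VaR (a percentile of scenario losses), which is only one of several ways VaR is computed, and the two loss distributions are invented for the example. Both portfolios report the same 99% VaR even though one of them hides a catastrophic loss in the tail.

```python
import math


def historical_var(losses, confidence_pct=99):
    """Loss level that `confidence_pct` percent of scenarios do not exceed."""
    ordered = sorted(losses)
    index = math.ceil(len(ordered) * confidence_pct / 100) - 1
    return ordered[index]


# Hypothetical scenario losses (positive = money lost), 1,000 scenarios each.
plain = [-1] * 980 + [5] * 20
# Same body of outcomes, but 5 of the bad scenarios are catastrophic.
tail_heavy = [-1] * 980 + [5] * 15 + [500] * 5

if __name__ == "__main__":
    for name, losses in (("plain", plain), ("tail_heavy", tail_heavy)):
        print(name, "99% VaR:", historical_var(losses), "worst case:", max(losses))
```

The metric never looks past the cutoff, so anything pushed into the last 1% of scenarios simply does not show up in the number.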

Or how VaR didn’t gain widespread acceptance until 1997, when the SEC gave VaR its “seal of approval.” Not just by saying “it’s OK,” but by forcing financial institutions to disclose a quantitative measure (and since there were no such competing measures around, VaR was the path of least resistance). And then it told those same financial institutions that it’s perfectly acceptable to rely on their own internal calculations, and not disclose the inputs.

Score one for good ol’ Uncle Sam and his Golden Regulators.

It’s a most excellent read, both for the explanation of how financial risk has been widely (mis-)measured in the past decade, and for the stunning example of how self-deception is such an integral part of the human condition.

January 10, 2009

Porn Industry Asking for TART Money

Posted in Politics at 9:59 pm by mj

And why not?:

Joe Francis, creator of the “Girls Gone Wild” video series, and Larry Flynt, founder of Hustler, will ask Congress for a $5 billion bailout, according to TMZ.

Why does the porn industry need a bailout? Because apparently even porn is getting smacked by the recession.

XXX DVD sales have taken a hit – about a 22% hit, according to TMZ.

I’m sure Larry Flynt is making a political statement, and his cohort, Joe Francis (Girls Gone Wild, etc.) is just the sort of self-aggrandizing, profiteering nitwit to play the part and guarantee success.

I don’t know which would be sadder. If they’re actually granted the money, or if they’re not. Either way, the political statement will fly over everybody’s head.

December 19, 2008

How to Use Comments To Attract Visitors…And Make Money

Posted in Community, Software at 11:22 pm by mj

Obligatory disclaimer: I’m hardly a “typical case,” and I don’t have the resources to conduct usability studies, experiments, or surveys for these kinds of things (wouldn’t that be an interesting job).

Sometime in 2005, Steven Levitt and Stephen Dubner–authors of Freakonomics–started an excellent Freakonomics blog. It was more interesting than the book, honestly, and far less self-aggrandizing than the first edition.

By late 2007, it was bought by the New York Times, and soon after, they switched to partial feeds (after breaking their feeds entirely for a time, as I recall). I, of course, summarily removed it from my feed reader. Who has time for partial feeds, especially if you spend most of your time reading on the train, plane, or automobile (OK, rarely do I read blogs in an automobile, but trains and planes have easily constituted 75% of my blog reading time for the last several years)? And how can 30 characters or 30 words or whatever arbitrary cutoff point provide you enough information to decide whether you want to star it to read later?

This evening, I just happened to follow a link to a Freakonomics blog article (it was this story on standardized test answers followed from God plays dice, a math-oriented blog) and spent about 90 minutes on the site. It really does have interesting content. Most of that time was perusing some of the interesting comments, and that’s also what drove me to click through additional stories–hoping to find more interesting comments.

Here’s an example: A Career Option for Bernie Madoff?. I would never, ever guess that this would be worth reading, especially if it meant first starring the item in my reader, and then finding it and clicking it, and waiting for it to load and then render. Never. I doubt it was even intended to be interesting; it was more of a throw-away story submitted for the amusement of regular readers.

But I’ve found the discussion interesting, mostly because the (convicted felon) former CFO of Crazy Eddie was commenting.

Robert Scoble recently lamented a related shortcoming with blog presentation. Scoble wants a service that highlights individual commenters’ “social capital” so you know who’s talking out of their ass (such as me), and whose opinions really matter or might even hint at things to come.

The projects discussed in his comments all seem to be headed in the same direction: some kind of Javascript pop-up or other icon that tells you something about a comment author as you’re reading the comments (I’m convinced that in the future, we will only hear our friends and idols and important decision-makers; history really is cyclic). But what if you’re deciding whether to read the comments at all?

Slashdot is another example. I’ve been “reading” (skimming, really) slashdot for 10 years, since its first year of operation. I have never bothered creating an account. Aside from a handful of anonymous comments over the years, I’ve never really cared to participate in that community or discussions. It’s rarely my first source for news (unless I’ve been under a metaphorical rock for a few weeks); the summaries are usually wrong; and my ophthalmologist has attributed 28.3% of my retinal decay to the site design.

Yet I keep coming back–often weeks after a story was submitted–because of the interesting comments. But even if you go to the front page–sans feed reader–it’s a crap-shoot. Even with the community moderation system, you simply don’t know if there are great comments embedded within a story. So I sometimes click hoping for a definitive comment, and sort by score (highest first). That often yields good results, at least in a statistical sense: the more comments on a story, the more of them will be truly excellent (maybe because authors are trying to increase their karma by commenting on high-traffic stories?).

So here’s what I think can make us all happy, and make partial feeds useful too: find some way to incorporate the interesting-ness of comments on a story into the feed.

It’s not going to be enough to say “ten of your friends have commented on this story”–although that would be exciting, and doable with existing feed readers.

It’s not even going to be enough to say “two CEOs and three software engineers of companies whose stocks you own have commented.” That would be awesome, too.

It’ll have to combine “people who are interesting” with “comments that are interesting, regardless of the author.”

And dammit, if it wouldn’t revolutionize the way I, at least, read blogs.

If I had that, I’d have all the information I need to once again start skimming partial feeds. It’s even better than ratings on the story level, since it tells you so much more about how it’s engaging its audience.

Is there a market opportunity in there somewhere?

October 23, 2008

Cost of servers 20 years ago — request for help!

Posted in Software at 9:30 pm by mj

I’m putting together a presentation, and I need your help!

I’m looking for the cost and specs of typical “commodity” hardware 20 years ago, versus the cost and specs for typical “big guns” at the same time.

I’m also looking for database options other than the usual suspects (Oracle, DB2, Sybase) that may have been available at the time. Cost comparisons are ideal.

In other words… if you were transported back to 1988, and you had to support a large (for the time) data set, and still knew what you know now about scaling … would you have had any alternatives to (something akin to) Oracle on big iron?

Many thanks, and I’ll post a follow-up later.

October 19, 2008

Restlet: A REST ‘Framework’ for Java

Posted in Development, Software at 2:33 pm by mj

Building an API on the REST architectural style? Building it in Java?

This past week, on a project doing just that, I ran into Restlet. I’d never heard of a REST framework for Java before, but it’s been featured on InfoQ, TSS, ONJava, and others over the past three years. (Damn, I need to pay more attention.)

And it kicks ass.

Here’s a quick run-down:

Restlet is an API for developing REST-based services in the same way that Servlet is an API for developing Web-based services. Your application never deals with the Servlet API, HTTP-specific attributes, cookies, sessions, JSPs, or any of that baggage.

Instead, you think and code in terms of REST: Routing, Resources, Representations.

It has an intuitive URL routing system that parses out resource identifiers and other data (that is, a URL template of /user/{user_id} would give you a ‘user_id’ attribute for any URL matching that pattern, which is fed into your User resource).

Resources are easily able to define which of the verbs (GET, POST, PUT, DELETE) they respond to, with default behavior defined for verbs that are unsupported.

There are plug-ins available for SSL and OAuth, an emerging best practice for authenticating third party access to user accounts.
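To make the routing and verb-handling ideas above concrete, here is a small language-neutral sketch in Python. It is not Restlet's API (Restlet is Java, and none of these names come from it); `compile_template`, `UserResource`, `ROUTES` and `handle` are all hypothetical, and the status codes are just the conventional HTTP ones.

```python
import re

VERBS = {"GET", "POST", "PUT", "DELETE"}


def compile_template(template):
    """Turn '/user/{user_id}' into a regex with one named group per variable."""
    # re.split with a capture group alternates: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)            # literal URL text
        else:
            pattern += "(?P<%s>[^/]+)" % part     # one path segment
    return re.compile(pattern)


class UserResource:
    """Hypothetical resource that only answers GET."""

    def __init__(self, user_id):
        self.user_id = user_id

    def get(self):
        return 200, "user %s" % self.user_id


ROUTES = [(compile_template("/user/{user_id}"), UserResource)]


def handle(method, path):
    """Route a request to a resource; unsupported verbs get a default answer."""
    for regex, resource_cls in ROUTES:
        match = regex.fullmatch(path)
        if match is None:
            continue
        # Template variables are fed to the resource as attributes.
        resource = resource_cls(**match.groupdict())
        handler = getattr(resource, method.lower(), None) if method in VERBS else None
        if handler is None:
            return 405, "Method Not Allowed"
        return handler()
    return 404, "Not Found"
```

The point of the sketch is the division of labor: the router owns URL parsing, and the resource only declares which verbs it answers.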

The documentation is a bit lacking. However, there is an excellent IBM developerWorks tutorial on Restlet (registration required) that lays out pretty much everything you need, with a (nearly-)complete example for study.

September 27, 2008

Three subversion tips: svn:ignore, svn merge, svn move

Posted in Development, Software at 7:57 am by mj

Since I complained earlier this year about the state of Subversion tools, I’ve been thinking about a follow-up that’s a bit more positive.

This doesn’t exactly count, but I thought I’d share a few productivity lessons I’ve learned recently.

Using svn:ignore
svn:ignore is a special subversion property that instructs Subversion to ignore any files (or directories) that match a given pattern.

The common use case is to ignore build artifacts to prevent accidental check-ins and eliminate clutter on svn status, etc. For example, you can ignore all *.jar files in a particular directory, or ignore your build directory, etc.

Unfortunately, this can tend to hide problems with your build artifacts. For a project I’m working on now, we have timestamped JAR files stuffed into a common directory. The JAR files themselves are svn:ignore‘d, which means svn status will never display them.

And as I found recently, this could result in 8 GB of “hidden” files that only become apparent when you, say, try to copy a remote workspace into a local one for managing with Eclipse.

Shame on the developers for not deleting them as part of ant clean. But it happens, no getting around that.

Thankfully, the Subversion developers thought about this case, and introduced the --no-ignore flag to svn status. With this option, ignored files are displayed along with added, modified and deleted files, with an I in the first column.

Cleaning up your working copy is, therefore, as simple as:

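# Delete everything svn reports as ignored (an "I" in the first status column)
svn status --no-ignore |
grep '^I' |
perl -ne 'next unless /^I\s+(.+)$/; my $f = $1; if (-d $f) { print "Deleting directory $f\n"; system("rm", "-rv", "--", $f); } else { print "Deleting file $f\n"; system("rm", "-v", "--", $f); }'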

That will remove all files and directories that Subversion is ignoring (but not files that just have not yet been added to source control). Stick that in a script in your path, and live happily ever after.
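If you would rather have a dry run before anything is deleted, the same idea can be sketched in Python. This is an illustration under assumptions, not a tested tool: it relies only on `svn status --no-ignore` printing an `I` in the first column followed by whitespace and the path, and the function names are made up for the example.

```python
import re
import shutil
import subprocess
from pathlib import Path

# An ignored item looks like "I       build/foo.jar" in `svn status --no-ignore`.
IGNORED = re.compile(r"^I\s+(.+)$")


def ignored_paths(status_output):
    """Return the paths that svn status marked as ignored."""
    paths = []
    for line in status_output.splitlines():
        match = IGNORED.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def clean(workdir=".", dry_run=True):
    """List (or, with dry_run=False, delete) ignored files under workdir."""
    result = subprocess.run(
        ["svn", "status", "--no-ignore"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    for rel in ignored_paths(result.stdout):
        target = Path(workdir) / rel
        print(("Would delete " if dry_run else "Deleting ") + str(target))
        if dry_run:
            continue
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)
        else:
            target.unlink()
```

Calling `clean()` only prints what would go; `clean(dry_run=False)` actually removes it, so it is worth reading the dry-run output first.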


Merging back into trunk
The most common use case when merging is to provide a range of revisions in trunk to pull into your branch. For example:

svn merge -r 100:114 http://example.com/svn/myproject/trunk/

What happens is you tell Subversion, “I don’t care what happened before revision 100, because that’s already in my branch…so just apply changes between revision 100 and 114.”

But what’s not obvious–nor, as far as I can tell, available in standard reference books–is how to merge back into trunk. It turns out, the way to do this is to disregard everything you’ve learned about subversion.

The problem is that you’ve been merging changes from trunk into your branch. So if you simply choose the naive approach of picking up all changes since your base branch revision until your final check-in, and try to apply those to trunk, you’ll get conflicts galore, even on files you never touched in your branch (except to pull from trunk).

The solution is to use a different form of the merge command, as so:

svn merge ./@115 http://example.com/svn/myproject/branches/mybranch/@115

where revision 115 represents your last merge from trunk.

This actually just compares the two trees (trunk and your branch) at the specified revision, and pulls in the differences, all the differences, and nothing but the differences. So help me Knuth.


Beware the power of svn move
One of the much-touted benefits of subversion (particularly as compared to CVS) is the support for moving files around. But, until 1.5, there has been a glaring error that is often overlooked and can get you into trouble.

Because svn move is implemented as a svn delete followed by a svn add, Subversion thinks the new file has no relation to the old file. Therefore, if you have local changes to foo, and your arch nemesis (er, co-worker) Randy moves it to bar, your changes will simply disappear!

Subversion 1.5 has partially addressed this, at least for single files. Under the new regime, your changes to foo will be merged with any changes to bar. However, you still need to be careful with moving directories.

This is more insidious than moving methods around inside the same file. While in that case Subversion will freak out and your merges will become difficult, at least you’ll see the conflict and your changes won’t disappear while you’re not looking.

The lesson, then, is to talk with your team-mates before any refactoring. (svn lock doesn’t seem to provide any help unless everybody’s in trunk.)

Rumor has it svn 1.6 will address this even more practically by introducing the svn fuck-you-and-your-dog command. But until then, you have to do it the old-fashioned way.

September 24, 2008

FDIC Insurance Myths & Sound Personal Banking Practices

Posted in finance at 3:08 pm by mj

The economy is in the crapper. Banks are failing. The “full faith and credit of the United States government” is all people believe in. Which is scary, if you think about it.


Everybody’s concerned about FDIC insurance coverage, and graphs like this (via the WSJ via Paul Kedrosky) are sending people into fits:

From Paul:

The U.S. has a $6.881-trillion on deposit with banks, but only $4.241-trillion is insured. In the case of IndyMac something like $1-billion deposits was uninsured.

It seems this is one of those cases where subtleties are nearly impossible to communicate, because summaries of FDIC regulations are incomplete.

Hence why I’m writing this, hoping to do my part to help spread accurate information and reduce fear in my tiny part of the world.

First, let’s get this out of the way:

NEVER put all your money in a single bank.

The examples below are extreme cases. In addition to the possibility of bank failures or robberies, you also have to deal with compromised account numbers, being held at gun point, and so on.

A rule we like is a minimum of three banks, and a minimum of two accounts that require going to the physical bank to access (e.g., CDs).

OK. So the rule everybody hears is “The FDIC insures you up to $100,000.” What they leave out is the multiple “ownership categories.”

The best source of information is the FDIC’s own introduction to FDIC insurance.

To quote:

Deposits maintained in different categories of legal ownership at the same bank can be separately insured. Therefore, it is possible to have deposits of more than $100,000 at one insured bank and still be fully insured.

Read that carefully. Then read the following pages that describe the eight ownership categories.

For example, use the FDIC Deposit Insurance Estimator to calculate your coverage under the following scenario at a single banking institution:

  • Bob & Alice have a joint savings account with $200,000 balance;
  • Bob has a single savings account with $100,000 balance; and
  • Alice has a single savings account with $100,000 balance

The result? Bob and Alice have $400,000 covered under the FDIC program.

How does that work?

Under FDIC rules, a combined savings account is split equally among all owners of the account, each of whom can be covered up to $100,000 in the “joint ownership” category.

And the “joint ownership” category is independent from any coverage in the “single ownership” category.
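The Bob and Alice arithmetic can be sketched in a few lines of Python. This is a deliberately simplified model that covers only the two categories discussed so far (single and joint) with the $100,000 limit quoted above; it is no substitute for the FDIC's own estimator, and the function and variable names are mine.

```python
LIMIT = 100_000  # per owner, per ownership category, per bank (the limit quoted above)


def insured_total(single_accounts, joint_accounts):
    """Estimate insured dollars at ONE bank for single and joint accounts only.

    single_accounts: list of (owner, balance)
    joint_accounts:  list of (tuple_of_owners, balance)
    """
    single = {}
    for owner, balance in single_accounts:
        single[owner] = single.get(owner, 0) + balance

    joint = {}
    for owners, balance in joint_accounts:
        share = balance / len(owners)  # a joint account is split equally among owners
        for owner in owners:
            joint[owner] = joint.get(owner, 0) + share

    # Each owner is covered up to LIMIT in each category, independently.
    covered_single = sum(min(total, LIMIT) for total in single.values())
    covered_joint = sum(min(total, LIMIT) for total in joint.values())
    return covered_single + covered_joint


if __name__ == "__main__":
    total = insured_total(
        single_accounts=[("Bob", 100_000), ("Alice", 100_000)],
        joint_accounts=[(("Bob", "Alice"), 200_000)],
    )
    print("Insured: $%d" % total)
```

For contrast, putting the same $400,000 into one single-ownership account in this model leaves only $100,000 covered, which is the whole argument for spreading money across categories.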

This has other advantages, as well. If Alice were to get held at gun point, she could not, alone, wipe out their savings, because she does not have access to her husband’s money.

Similarly, if Bob were to get hit by a bus, Alice would immediately have access to 1/4 of their savings–even if there were some hold placed on the joint account (say, if Alice were being investigated because her best friend was driving the bus).

Also, if Bob and Alice were to get a divorce, they’d both be able to get by for a while–and amicably–even if there were a dispute about their shared property. And if they love each other now, it only makes sense they’d want to protect their partner in the event that things turn sour.

This is actually less than what’s possible. Add in a couple of IRAs ($250,000 each) and requited “payable-on-death” accounts, and it balloons to $1,100,000. Beyond that, I believe you’re past typical personal banking needs. (Unfortunately, the EDIE tool doesn’t allow deep linking to the final reports.)

Given this, how is it that so much of the nation’s deposits are not insured? Too many single people? Too many rich idiots? Or are those graphs wrong and simply based on assuming any amount over $100,000 is uninsured?

My take-away is that the FDIC’s rules–which may seem a little troubling (why only $100,000?)–reinforce sound personal banking practices.

But more troubling for me is the possibility that the FDIC may not actually be financially prepared for what’s coming. From this article:

The total amount of losses to be covered is estimated to be as high as $8 billion. According to the FDIC 2007 Annual Report, the FDIC has only $53 billion to cover losses of this nature. If all the banks on the FDIC watch list were to fail, how much would it cost the FDIC? Does the FDIC have estimates calculated for this?

Of course the FDIC has calculated the estimates.

And, as with almost all institutions, they don’t have enough money.

So of course if all the banks on the list failed, they’d be in trouble, and so would we all.

But Bob and Alice have each other.

And they have their gold bullion investments.

And that secret stash of diamonds buried in their basement.

And that’s all that matters when all the world economies fail. Love. And diamonds.

What’s that you say? They should have buried gasoline instead? Dang. I guess they’ll just suffer, then. Poor Bob and Alice.
