November 4, 2007
Indirection Has a Price, Too
Robert C. Martin asks How big should a function be?.
His conclusion? Functions should be tiny:
Functions should be small. Very small. A well written function might be 4, 5, perhaps 8 lines long; but no longer.
Typically, the body of an if or while statement should be no more than a single function call.
The body of a try block should also be a function call; and the word try should be the first word in the function.
I’ve seen this principle applied too vigorously. To figure out in exactly what way a value of
2 for the
baz parameter affects the outcome of a function, you end up stepping into twelve other functions. Not fun.
As programmers, we need to recognize that any indirection has two costs (excluding performance considerations). First, there is the present cost to producing the code. Second, there is the future cost of understanding the code.
We’re accustomed to indirection. Object hierarchies, composition, configuration files, resource bundles, enumerations, … all introduce indirection, often with lofty goals like reducing coupling and isolating future changes.
What are the lofty goals behind keeping functions to 4-8 lines?
One goal is better unit testing. The more you decompose your function, the more individual pieces of it you can test in isolation, and the easier it is to figure out precisely what’s broken when a test fails.
Another goal is code reuse. If you have four high-level functions that do almost the same thing, but some criticial piece changes depending on the value of a parameter, they can each delegate most of their work to other functions.
But does that necessarily outweight the cost?
Consider this analogy. Some bloggers like to write entries that look like this:
Foo wonders if the Martian invasion may be good for humans. Then Bar responded eloquently with this look at Martian imperial history. Very Important Baz then took it a step further with Earth: the Final Frontier. I concur! What do you think?
Pop quiz: what is the thesis of that post?
You simply don’t know because of all the indirection!
Sure, it’s great if you have an hour to click on every link, and read the other entries, and, if you really care about the poster–because, say, you have a secret crush on this person–you’ll even tie it all together in the same way and leave a comment like “You’re absolutely right!”
Now, imagine a Wikipedia entry written in that way. Wikipedia has a policy against original research, and forces you into using secondary sources. So, why not do away entirely with summaries and excerpts?
Is it because humans just aren’t equipped to deal with indirection in that way?
So it is with code, because code is as much about communicating with other programmers as it is about driving the processor.
In my experience, functions around 20-25 lines usually hit the sweet spot. Such functions usually call at least two or three other functions, so they reasonably decompose distinct subtasks. Yet, they avoid unnecessary indirection, so a human reading the function can easily understand its corner cases, and perhaps has a better shot at simplifying it by spotting duplicated code, or introducing earlier returns/exception throwing.
My habit leans toward writing a single function, and decomposing into sub functions when (a) unit testing is significantly simplified or clarified, or (b) I need to reuse pieces in other contexts, or (c) I find that, in the end, it just looks messy and wouldn’t be something I’d want to come back to.
If we must talk about perfect function sizes as a function of lines of code, it’s best to distinguish types of functions. Data access functions should tend to be small. Input validation functions will call more sub functions. String parsing functions will tend to use a loop and only call sub functions if the processing is complex. Basic algorithms like searching and sorting will do all their processing inline. And so on.