Tuesday, December 6, 2011

What's in a name? A lot.

One-letter identifier names have been getting bad press recently. The other day a post entitled Name Your Type Variables! got some discussion on the Haskell reddit. I agree with the author about the value of good names, but I disagree with the specific examples he uses, as well as with his sweeping condemnation of one-letter names. Since I am the primary author of one of his examples and have contributed code to the other, I believe I am qualified to comment.

Here's the executive summary for the tl;dr crowd:

  1. Different naming scopes have different needs for names. Call the two extremes of this continuum rare and prevalent.

  2. Sometimes (especially with prevalent names that have large scope) there's too much context to communicate, so no name of reasonable length can be as descriptive as critics of one-letter names seem to want.

  3. A general rule of thumb is that prevalent abstractions should be given shorter names and rare abstractions should be given longer names.

  4. Single-letter names communicate patterns; long names communicate meaning. Both have their place.

Names are crucially important to writing readable, maintainable code. As I've said before, I consider this one of the most significant things I've learned over my programming career. I came to Haskell as a Java developer who had completely embraced the philosophy of using long, descriptive, unabbreviated identifiers. But writing code that is easy to read and maintain is about communication, and as any good writer will tell you, longer is not always better. A good name is one that clearly and concisely communicates the meaning of the abstraction it represents.

Like the language constructs they refer to, names have different scopes, and so they vary in importance. A rare name used for an intermediate value deep in the internals of a library has much less impact on developers than a prevalent name used pervasively throughout a public API. I would like to suggest that longer, more descriptive names are more appropriate for rare concepts, while shorter, abbreviated names are often appropriate for prevalent concepts.

First, this argument is supported from a data compression standpoint: an efficient encoding gives the shortest codes to the most frequent symbols, and giving the shortest names to the most frequently used concepts likewise minimizes the total amount of text a reader has to process. (The idea that this perspective is a useful axis in the space of code assessment metrics is argued nicely by Paul Graham in his essay Holding a Program in One's Head, so I will not go into it here.) Second, prevalent concepts will be closer to the top of the reader's mind, so a well-chosen short name may be enough to jog the reader's memory and identify the concept. Third, prevalent concepts generally require more involved explanations, so a standalone name is much less likely to communicate everything the reader needs to know about them. Why use a long name when it still won't be enough to get the job done?
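A crude way to see the compression argument at work is to count the characters spent on names. The sketch below is a toy: the names and occurrence counts are hypothetical (one concept used 200 times, another used 3 times), and total name length is only a stand-in for what a reader must scan, not a real compression measure.

```haskell
-- A toy, character-count stand-in for the compression argument.
-- The names and occurrence counts are hypothetical, chosen for illustration.

-- | Each entry is (name chosen for a concept, how many times it appears).
type Naming = [(String, Int)]

-- | Characters a reader must scan: each name's length times its use count.
totalCost :: Naming -> Int
totalCost naming = sum [length name * uses | (name, uses) <- naming]

-- | Short name for the prevalent concept, long name for the rare one.
shortForPrevalent :: Naming
shortForPrevalent = [("v", 200), ("retryDelayMillis", 3)]

-- | Long, descriptive names for everything.
longEverywhere :: Naming
longEverywhere = [("viewSnaplet", 200), ("retryDelayMillis", 3)]

-- | Single letters for everything.
shortEverywhere :: Naming
shortEverywhere = [("v", 200), ("r", 3)]
```

Loaded in GHCi, the three schemes cost 248, 2248 and 203 characters respectively. Giving the rare concept a long name costs only 45 extra characters, while giving the prevalent one a long name costs 2,000 more, which is the asymmetry behind the rule of thumb in the summary.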

With this in mind, let's look at specific examples. Patrick mentions that the use of single-letter type variables in Handler b v a contributed to slow progress in understanding the snaplet API we recently released in version 0.6 of the Snap Framework. If I were changing the code to address his criticism, I would probably go with Handler base view a as the best verbose alternative. But would this really be helpful?

First of all, scan the paragraph above with your eyes. Which is more recognizable? I intentionally didn't use quotation marks or font distinctions. Handler b v a jumps right off the page at me, while Handler base view a blends in. This isn't a perfect test, since the surrounding context is prose rather than code, but I think it does show that shorter names are easier to scan. And of course, that makes perfect sense: there's less noise for the eye to sort through.

Second, if you saw the type Handler base view a, would that really aid a newcomer's understanding more than the single-letter variant? I contend that in this case it would not. base could refer to a numerical radix, and view could have any number of meanings. Because these words are so familiar, they already carry many meanings, which makes them more ambiguous, not less. To really help, we'd probably have to go to something like Handler baseSnaplet viewSnaplet a. But that's starting to get noisy. It would give us the following type signature:

nestSnaplet :: ByteString
            -> (Lens viewSnapletA (Snaplet viewSnapletB))
            -> SnapletInit baseSnaplet viewSnapletB
            -> Initializer baseSnaplet viewSnapletA (Snaplet viewSnapletB)

which I find much less readable than the current one:

nestSnaplet :: ByteString
            -> (Lens v (Snaplet v1))
            -> SnapletInit b v1
            -> Initializer b v (Snaplet v1)

In the second type signature, it's immediately obvious to the naked eye where the type variables are. In the first, it's difficult to tell the long type variables apart from the type constructors, because both are full words that differ only in whether the first letter is capitalized. It might not be obvious, but I did put significant thought into the choice of b v a [1], and as a result, any time you see this general pattern of type variables, you instantly have a pretty good idea that you're dealing with a MonadSnaplet.

On top of all this, a new reader still probably wouldn't understand what viewSnaplet and baseSnaplet mean. A lot of context has to be communicated before someone can really understand them. We spent a lot of effort trying to make the API as simple as possible, but I think the OP is confusing the essential complexity of the problem domain with "ambiguity" allegedly caused by short names [2]. No name can carry that much context, which is why the convention is clearly described in the API documentation, where a newcomer will find it.
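One way to tell the reader what the letters mean is to state the convention once, in the documentation, rather than in every signature. The following is a deliberately simplified toy model, not Snap's actual implementation: the Handler representation, the Lens record, getView, withView, App and Counter are all hypothetical names introduced only for illustration.

```haskell
-- A toy model of the "b v a" convention. These are NOT Snap's definitions;
-- every type and function below is hypothetical and exists only to show
-- how a convention can be documented once and then relied on everywhere.
--
-- Convention:
--   b  the base (top-level) application state
--   v  the state of the component currently in focus (the "view")
--   a  the result the computation returns

-- | A getter/setter pair focusing on a part @p@ of a whole @w@.
data Lens w p = Lens { getL :: w -> p, setL :: p -> w -> w }

-- | Compose an outer lens (whole to part) with an inner one (part to subpart).
compose :: Lens w p -> Lens p q -> Lens w q
compose outer inner = Lens
  { getL = getL inner . getL outer
  , setL = \q w -> setL outer (setL inner q (getL outer w)) w
  }

-- | A computation that sees the base state @b@ through a lens focused
-- on some @v@ inside it, and returns an @a@.
newtype Handler b v a = Handler { runHandler :: b -> Lens b v -> a }

-- | Read the component currently in focus.
getView :: Handler b v v
getView = Handler (\base focus -> getL focus base)

-- | Narrow the focus from @v@ to a sub-component @v1@. The base @b@ never
-- changes; only the view does.
withView :: Lens v v1 -> Handler b v1 a -> Handler b v a
withView inner (Handler h) = Handler (\base outer -> h base (compose outer inner))

-- Hypothetical application state with one nested component.
newtype Counter = Counter { count :: Int }
data App = App { appTitle :: String, counter :: Counter }

counterL :: Lens App Counter
counterL = Lens counter (\c app -> app { counter = c })

countL :: Lens Counter Int
countL = Lens count (\n _ -> Counter n)

-- | Written against the Counter component alone, so @b@ stays polymorphic.
counterHandler :: Handler b Counter Int
counterHandler = withView countL getView

-- | At the top level, base and view are both App.
topLevel :: Handler App App Int
topLevel = withView counterL counterHandler
```

For example, runHandler topLevel (App "demo" (Counter 41)) (Lens id const) evaluates to 41. Note that withView moves from b v to b v1 through a lens from v, much like nestSnaplet's signature, and once the three letters are learned from the comment at the top, every signature in the module can be read at a glance.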

The same argument applies to the other example Patrick mentioned, Form m i e v a in the digestive-functors library. It's used pervasively, and that sequence of single-letter type variables is very distinctive, which makes it easier to recognize patterns in the types.

In Haskell, type variables have an especially prevalent scope: as we have seen, they can embody patterns that extend across entire APIs. This makes them especially good candidates for single-letter names. In a StackExchange discussion about why mathematicians use single-letter variables, one person commented, "Because it's long, it makes it hard to see patterns, and it makes you think about interpretation when you should be thinking about form." This idea is especially important in type signatures.
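As a small illustration of seeing form rather than interpretation, here are two hypothetical definitions of function composition that differ only in their type variable names; both have the same shape as the Prelude's (.). With single letters, the chain from a to b to c is visible at a glance; with words, the reader has to read and interpret each name before the shape emerges.

```haskell
-- Two hypothetical, behaviorally identical versions of function
-- composition (the same shape as the Prelude's (.)). Only the names
-- of the type variables differ.

-- | Single letters: the b -> c, a -> b, a -> c chain is visible at a glance.
composeShort :: (b -> c) -> (a -> b) -> a -> c
composeShort g f x = g (f x)

-- | Descriptive names: the same form, but you read words before you see it.
composeLong :: (intermediate -> result) -> (input -> intermediate) -> input -> result
composeLong g f x = g (f x)
```

For instance, composeShort (+ 1) (* 2) 5 is 11, and composeLong show length "abc" is "3". The descriptive names do not make composeLong any easier to use; they only make its form harder to see.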

On the other end of the spectrum, however, I think it's extremely helpful to have more descriptive names for rare abstractions. For examples of longer names in my own code, see functions like getSnapletUserConfig or runChildrenWithText.

Good names make software more readable and maintainable. Long, descriptive names do serve a valuable purpose in making code more understandable. But we shouldn't assume that they are always better. Sometimes patterns matter more than meaning, and maintainability is better served by explicit documentation than by the implicit documentation a long name provides. If you go this route, just make sure you tell the reader the meaning behind the letters.


  1. As evidenced by the existence of this commit.

  2. As the saying commonly attributed to Albert Einstein goes, "Make things as simple as possible, but not simpler."