Identifiers Are Comments

Consider this example from the MethodCommenting page.

Before:

  ++i;	/* record another match of this expression */

After:

  ++number_of_expression_matches;

As the example above demonstrates, a typical comment can be encoded in an identifier. Most programmers attempt to encode some information in an identifier that is useful in understanding what role the identifier serves in the code. However, this information could be completely misleading or even complete gibberish. If the programmer isn't vigilant when modifying code, identifiers with initially well chosen names could end up playing very different roles in the code while retaining their old names. This is exactly the same problem as when a comment isn't updated to reflect changes in the code it is documenting.


One of the tenets of ExtremeProgramming is that large methods should be broken up into small methods. This kind of code will have a lot more semantic annotation in the form of all these extra method names that code with longer methods won't have. If XP also stresses long method names, then even more annotation will be present. If identifiers really are comments, then practitioners of XP are in fact systematically documenting their code even though they officially eschew comments. Is this correct?


Hmmn - I agree that meaningful identifiers can obviate the need for various comments. But at what point does it become impractical to use a really long variable name? Something like the example above is pretty darn unwieldy to type if its a commonly used variable. Yes - we all know that programs are supposedly "write once; read many." But at some point, doesn't the length of the variable name and the inconvenience of typing it and the increased likelihood of mistyping it begin to outweigh its usefulness?

number_of_expression matches is very clear, but also very long and unwieldy. num_expr_matches is a tad shorter. How much less readable is it? What about num_matches? Where is the midpoint? (somewhere between 16 & 32 characters perhaps???) Or is it just a matter of having nice tools (like Emacs for example) that have auto-completion for your identifier.

I'm not trying to be facetious, this is a genuine question I have -- BradAppleton


To me, it is not just the name which communicates, but the name and its immediate context. At some point, the information in a long name becomes redundant given its context. It is like the difference between a bunch of rather short names which work together to convey a meaning and a bunch of long names which hit readers over the head like baseball bats. At some point, the readers of the latter will say, "Gee, the authors must think that we are stupid." -- MichaelFeathers

Names should be as meaningful as possible, but not necessarily long. I'd never abbreviate, if I weren't so lazy. "maxPay" is probably ok, but "maximumPayment" probably better. Readers who think that code writers are thinking about them when they write code are, in fact, stupid. We write this way because we are stupid. --RonJeffries

I don't think it's that we are stupid, more that we are forgetful. Just as we were entrenched and immersed in the muck and mire when we coded a particular section, we are often more distant from it when we revisit (and entrenched in something else ;-) So if we are thinking only of ourselves, we use names and comments that will trigger the requisite mental association to "refresh our mental state" and push the retrieved context onto our mental stack.

I think this is different (or at least *can* be different) from trying to teach or explain to first timers with our names and comments. The former presupposes a non-trivial degree of "clue" about the overall design/organization and tries to "recreate" a mental flow-state. The latter makes no such presupposition (or at least significantly less). Which is better? Who can say, it probably varies from shop-to-shop. -- BradAppleton


Ya know, every now and then someone around here starts a conversation to which I can actually contribute something. Makes me feel all glow-y.

CodeComplete, Steve McConnell's killer book on coding, offers the following interesting data point:

Gorla, Benander, and Benander found that the effort required to debug a COBOL program was minimized when variables had names that averaged 10 to 16 characters (1990).

Also, simply from my own experience, I would add that a very important convention for naming identifiers is the use of specialized sub-words with absolutely rigid meanings:

* XxxxxMin -- The absolute first element. * XxxxxFirst -- The relative (to this operation) first element. * XxxxxLast -- Opposite of First. * XxxxxMax -- Opposite of Min. * XxxxxLimit -- Like Last, except not a valid element, e.g. Last+1

I also use:

* XxxxxNdx -- as an iterator name. * XxxxxStart -- Synonym for First. * XxxxxCount -- Like Limit, with the assumption that First = Start+0

I do not use HungarianNotation in any way, though I do add a trailing underscore to member variables. -- MichaelHill


number_of_expression_matches - it is almost never good to call something "number" or "value". All integers are numbers; all variables are values. Use a more descriptive word, like "count".

I agree with Brad's comment about context. As an instance variable (of a regular expression class) I might call it "matchCount". In a function, I might call it just "count", because the function name would make it obvious what it was counting. One consequence is that variables (and methods) usually get renamed as they leave their context. -- DaveHarris

I think IndexOrCount deserves more discussion --JayBazuzi


This is a subject near to my heart. See http://www.oma.com/ottinger/Naming.html. As far as having long names, there are lots of ways to work with that. One is to use a good editor. If you use even something as primitive as vi (I use gvim a lot) you have an abbreviation command. You tell it "abb nexpr number_of_expressions", and when you type "nexpr" it expands the word. It is smart enough to wait to make sure the character after 'r' is really a punctuation or whitespace.

Surely the modern IDEs aren't so stupid that they can't do this? If you have an editor that insists on acting like a word processor (mouse-heavy, leave the home row constantly) you can type nexpr, then highlight the function and do a replace so it changes the short name to the long. Or, I suppose, there's the cut-n-paste.

So having more modern (and yet more primitive) tools shouldn't prevent us from having good identifier names. If worse comes to worse, you can always learn VI (gvim is great) or EMACS and you can easily do abbreviations and the like.

--TimOttinger

(PatternsOfSoftware discusses tools that do this: RichardGabriel seems to like the idea, but in this case might like the editor to also be able to show you the abbreviated form. -- MartinPool)


It may seem counter intuitive, but if you think about it a while you realize that code with many small methods has a greater amount of context with in each method. Each method has greater focus. With increased context comes shorter names. --DonWells

Given IdentifiersAreComments and what we know about comments, then: (1) use good identifiers named by intention, and (2) see if you can refactor rather than commenting. If you must use globals, give them good names, but better yet is to factor them out entirely. --MartinPool


Sometimes I go the other direction, like
  double Q = this->maximumPrimaryEnergyUsageForTheMonth();
  double P = this->currentPrice();
  return (P*Q)/pow(1-P*P, P/Q);

or
  Select C.name, A.balance
  From dbo.t_currentcustomer C, dbo.t_acctreceivables A
  Where
    C.id = A.customer_id

I'm leaning toward the adage: if "local variable names matter, your method is too long".

--JohnClonts


What about, instead:

  double energy = this->maximumPrimaryEnergyUsageForTheMonth();
  double price = this->currentPrice();
  return (price*energy)/pow(1-price*price, price/energy);

Hmm, interesting ... I thought I'd prefer the longer names ... but in this case I'm not sure I do. Hmmm. --

There is at least one undiscovered method in the last line. --

The name energy is poorly chosen. Consider using quantity or usage instead. --

Umm...you're advocating maximumPrimaryUsageUsageForTheMonth? :-)

No, I'm suggesting:

  double quantity = this->maximumPrimaryEnergyUsageForTheMonth();
  double price = this->currentPrice();
  return (price*quantity)/pow(1-price*price, price/quantity);

or:
  double usage = this->maximumPrimaryEnergyUsageForTheMonth();
  double price = this->currentPrice();
  return (price*usage)/pow(1-price*price, price/usage);


Over-long names are more of a problem for reading than for writing, so in my experience editor abbreviation-expansion tricks don't really help. I type pretty fast. -- DaveHarris
I'd like to see a language where we can have WhitespaceInIdentifiers?. I think this has potential for great improvements in readability and even writing speed.

For example, variable identifiers could be composed of one or more capitalized words; class identifiers of one or more all-uppercase words; and method identifiers of one or more all-lowercase words. If a generalized form of the Smalltalk message syntax was adopted -- one where parameters may be arbitrarily placed in the method identifier, e.g., "kill Evil Bear brutally with Stick" -- I think it could be pretty cool.

--DanielBrockman

In lisp, you can have those.


EditText of this page (last edited April 7, 2004)
FindPage by browsing or searching

This page mirrored in WikiPagesAboutRefactoring as of July 17, 2004