神刀安全网

A Guide to Naming Variables – Be Teutonic

Names Considered Useful

Software is written for people to understand; variable names should be chosen accordingly. People need to comb through your code and understand its intent in order to extend or fix it. Too often, variable names waste space and hinder comprehension. Even well-intentioned engineers often choose names that are, at best, only superficially useful. This document is meant to help engineers choose good variable names. It artificially focuses on code reviews because they expose most of the issues with bad variable names. There are, of course, other reasons to choose good variable names (such as improving code maintenance). This document is also a work in progress; please send me any constructive feedback you might have on how to improve it.

Why Name Variables?

The primary reason to give variables meaningful names is so that a human can understand them. Code written strictly for a computer could just as well have meaningless, auto-generated names:

    int f1(int a1, Collection<Integer> a2) {
      int a5 = 0;
      for (int a3 = 0; a3 < a2.size() && a3 < a1; a3++) {
        int a6 = a2.get(a3);
        if (a6 >= 0) {
          System.out.println(a6 + " ");
        } else {
          a5++;
        }
      }
      System.out.println("\n");
      return a5;
    }

All engineers would recognize that the above code is needlessly difficult to understand, as it violates two common guidelines: 1) don’t abbreviate, and 2) give meaningful names. Perhaps surprisingly, these guidelines can be counter-productive. Abbreviation isn’t always bad, as will be discussed later. And "meaningful" is vague and subject to interpretation. Some engineers think it means that names should always be verbose (such as MultiDictionaryLanguageProcessorOutput). Others find the prospect of coming up with truly meaningful names daunting, and give up before putting in much effort. Thus, even when trying to follow the above two rules, a coder might write:

    int processElements(int numResults, Collection<Integer> collection) {
      int result = 0;
      for (int count = 0; count < collection.size() && count < numResults; count++) {
        int num = collection.get(count);
        if (num >= 0) {
          System.out.println(num + " ");
        } else {
          result++;
        }
      }
      System.out.println("\n");
      return result;
    }

Reviewers could, with effort, understand the above code more easily than the first example. The variable names are accurate and readable. But they’re unhelpful and waste space, because:

processElements
most code "processes" things (after all, code runs on a "processor"), so process is seven wasted characters that mean nothing more than "compute". Elements isn’t much better. While it suggests that the function is going to operate on the collection, that much was already obvious. There’s even a bug in the code that this name doesn’t help the reader spot.
numResults
most code produces "results" (eventually); so, as with process, Results is seven wasted characters. The full variable name, numResults, suggests that it might be intended to limit the amount of output, but is vague enough to impose a mental tax on the reader.
collection
wastes space; it’s obvious that it’s a collection because the previous tokens were Collection<Integer>.
num
simply recapitulates the type of the object (int).
result, count
are coding cliches; as with numResults, they waste space and are so generic that they don’t help the reader understand the code.

However, keep in mind the true purpose of variable names: the reader is trying to understand the code, which requires answering both of the following questions:

  1. What was the coder’s intent?
  2. What does the code actually do?

To see how the longer variable names that this example used are actually a mental tax on the reader, here’s a re-write of the function showing what meaning a reader would actually glean from those names:

    int doSomethingWithCollectionElements(int numberOfResults,
                                          Collection<Integer> integerCollection) {
      int resultToReturn = 0;
      for (int variableThatCountsUp = 0;
           variableThatCountsUp < integerCollection.size()
             && variableThatCountsUp < numberOfResults;
           variableThatCountsUp++) {
        int integerFromCollection = integerCollection.get(variableThatCountsUp);
        if (integerFromCollection >= 0) {
          System.out.println(integerFromCollection + " ");
        } else {
          resultToReturn++;
        }
      }
      System.out.println("\n");
      return resultToReturn;
    }

The naming changes have almost made the code worse than the auto-generated names, which, at least, were short. This rewrite shows that the coder’s intent is still mysterious, and there are now more characters for the reader to scan. Code reviewers review a lot of code; poor names make a hard job even harder. How do we make code reviewing less taxing?

On Code Reviews

There are two taxes on code reviewers’ mental endurance: distance and boilerplate. Distance, in the case of variables, refers to how far away a reviewer has to scan, visually, in order to remind themselves what a variable does. Reviewers lack the context that coders had in mind when they wrote the code; reviewers must reconstruct that context on the fly. Reviewers need to do this quickly; it isn’t worth spending as much time reviewing code as it took to write it. Good variable names eliminate the problem of distance because they remind the reviewer of their purpose. That way they don’t have to scan back to an earlier part of the code.
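As a rough illustration of distance, here is a small sketch. Everything in it (the class, the method names, the tax calculation) is hypothetical and chosen only for illustration. Both methods compute the same value; the only difference is whether the name of the second parameter still tells the reviewer what it holds at the point where it is finally used.

```java
final class DistanceSketch {
    // Vague name: by the time `t` is used, the reviewer must scan back up to
    // the signature (and probably to the caller) to recall what it holds.
    static int totalVague(int[] prices, int t) {
        int sum = 0;
        for (int p : prices) { // `p` is used on the very next line, so a short name is fine
            sum += p;
        }
        // ... imagine a screenful of unrelated code here ...
        return sum + sum * t / 100;
    }

    // Purposeful name: `taxPercent` reminds the reviewer what it is right where it is used.
    static int totalWithTax(int[] prices, int taxPercent) {
        int sum = 0;
        for (int p : prices) {
            sum += p;
        }
        // ... imagine a screenful of unrelated code here ...
        return sum + sum * taxPercent / 100;
    }

    public static void main(String[] args) {
        int[] prices = {100, 200};
        System.out.println(totalVague(prices, 10));   // prints 330
        System.out.println(totalWithTax(prices, 10)); // prints 330
    }
}
```

Note that the loop variable p stays short in both versions: it is declared and consumed within two lines, so there is no distance for a longer name to bridge.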

The other tax is boilerplate. Code is often doing something complicated; it was written by someone else; reviewers are often context-switching from their own code; they review a lot of code, every day, and may have been reviewing code for many years. Given all this, reviewers struggle to maintain focus during code reviews. Thus, every useless character drains the effectiveness of code reviewing. In any one small example, it’s not a big deal for code to be unclear. Code reviewers can figure out what almost any code does, given enough time and energy (perhaps with some follow-up questions to the coder). But they can’t afford to do that over and over again, year in and year out. It’s death by 1,000 cuts.

A Good Example

So, to communicate intent to the code reviewer, with a minimum of characters, the coder could rewrite the code as follows:

    int printFirstNPositive(int maxToPrint, Collection<Integer> c) {
      int skipped = 0;
      for (int i = 0; i < c.size() && i < maxToPrint; i++) {
        int n = c.get(i);
        if (n >= 0) {
          System.out.println(n + " ");
        } else {
          skipped++;
        }
      }
      System.out.println("\n");
      return skipped;
    }

Let’s analyze each name change to see why it makes the code easier to read and understand:

printFirstNPositive
unlike processElements, it’s now clear what the coder intended this function to do (and there’s a fighting chance of noticing a bug).
maxToPrint
unlike numResults, which was rather vague, this communicates intent and makes it easier to spot another bug.
c
collection wasn’t worth the mental tax it imposed, so at least trim it by 9 characters to spare the reader from scanning boilerplate; since the function is short, and there’s only one collection involved, it’s easy to remember that c is a collection of integers.
skipped
unlike result, this now self-documents (without a comment) what the return value is supposed to be. Since this is a short function, and the declaration of skipped as an int is plain to see, calling it numSkipped would have just wasted 3 characters.
i
iterating through a for loop using i is a well-established idiom that everyone instantly understands. Given that count was useless anyway, i is preferable since it saves 4 characters.
n
num just meant the same thing int did, so since the variable is only ever used on two lines right next to each other, we might as well make it as short as possible; there’s no chance the reader will fail to remember that n is the number that just came out of the collection.

It’s also easier, now, to see that there are two bugs in the code. In the original version of the code, it wasn’t clear whether the coder intended to print only positive integers. Now the reader can notice that there’s a bug, because zero isn’t positive (so n should be greater than 0, not greater than or equal to 0). (There should also be unit tests.) Furthermore, because the first argument is now called maxToPrint (as opposed to, say, maxToConsider), it’s clear the function won’t always print enough elements if there are any non-positive integers in the collection. Rewriting the function correctly is left as an exercise for the reader.
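For readers who want to compare notes, here is one possible corrected version. It is a sketch under stated assumptions, not the author’s own solution. It assumes that skipped should count only the non-positive values seen before printing stops, and that the output belongs on a single line. It also iterates with a for-each loop, because java.util.Collection has no positional get(int) method (that belongs to List). The class name and the main method are hypothetical scaffolding added so that the example runs on its own.

```java
import java.util.Collection;
import java.util.List;

final class PrintFirstNPositiveSketch {
    // Prints up to maxToPrint strictly positive values from c, in iteration order,
    // and returns how many non-positive values were skipped before printing stopped.
    static int printFirstNPositive(int maxToPrint, Collection<Integer> c) {
        int skipped = 0;
        int printed = 0;
        for (int n : c) {
            if (printed >= maxToPrint) {
                break; // the limit applies to values printed, not values considered
            }
            if (n > 0) { // zero is not positive
                System.out.print(n + " ");
                printed++;
            } else {
                skipped++;
            }
        }
        System.out.println();
        return skipped;
    }

    public static void main(String[] args) {
        // Prints "3 5 " on one line, then "skipped = 2" on the next.
        int skipped = printFirstNPositive(2, List.of(3, 0, -1, 5, 7));
        System.out.println("skipped = " + skipped);
    }
}
```

The extra counter is named printed, in keeping with skipped: the function is short, the type is plain to see, and the name says exactly what is being counted.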

Naming Tenets (Unless You Know Better Ones)

  • As coders, our job is to communicate to human readers, not computers.
  • Don’t make me think. Names should communicate the coder’s intent so the reader doesn’t have to try to figure it out.
  • Code reviews are essential but mentally taxing. Boilerplate must be minimized, because it drains reviewers’ ability to concentrate on the code.
  • We prefer good names over comments, but names can’t replace all comments.
