On Commenting Code

Today I received the following question:

I'm reading Uncle Bob's Clean Code at the moment and he has a section in there about very rarely using commenting.

His thoughts are that if classes/variables/functions are named appropriately there should be no need for commenting and that they are just noise.

I've been quite bad in the past at over-commenting and I've just mentioned it to the team lead in work and he is of the opinion it is good practice to be commenting functions etc.

Uncle Bob says if you get to the point where you are going to add a comment, you should refactor the code.

What are your thoughts and what have you found has been the attitude in your experience?

I've had many discussions about the use of comments when putting together coding standards and an architecture. What follows is my response.

Guidelines for Commenting Code

This is a great quote from the book Code Complete 2:

Comments are easier to write poorly than well, and commenting can be more damaging than helpful.

Self documenting code is definitely a good thing. Comments are a good thing. Communicating the reasons for the code is a good thing. But what is the right balance?

In my experience, debates about usage of comments often fall into two categories:

  1. Should we use Class, Property and Method Header Comments?
  2. Should we allow inline comments?

Let's examine each category in a little more detail.

Should I use Class, Property and Method Header Comments?

Those who have worked for long enough with statically typed languages such as C# or Java would have no doubt encountered code that is commented in the following style:

    /// <summary>
    /// A user with the admin system.
    /// </summary>
    public sealed class User
    {
        /// <summary>
        /// The user repository.
        /// </summary>
        private UserRepository _repository;

        /// <summary>
        /// Gets or sets the ID.
        /// </summary>
        /// <value>
        /// The ID.
        /// </value>
        public Guid ID { get; set; }

        /// <summary>
        /// Gets or sets the username.
        /// </summary>
        /// <value>
        /// The username.
        /// </value>
        public string Username { get; set; }

        /// <summary>
        /// The find by username.
        /// </summary>
        /// <param name="username">
        /// The username.
        /// </param>
        /// <returns>
        /// The <see cref="User"/>.
        /// </returns>
        public Use GetUserById(int id)
        {
           return _repository.FindUserById(id);
        }
    }

In many cases, such rules are enforced by build tools like StyleCop and Checkstyle. Let's take a look at this code without the comments:

    public sealed class User
    {
        private UserRepository _repository;

        public Guid ID { get; set; }

        public string Username { get; set; }

        public Use GetUserById(int id)
        {
           return _repository.FindUserById(id);
        }
    }

As illustrated in this example, comments add little value in conveying the meaning of well named classes, properties and methods. Careful consideration needs to be taken when enforcing commenting rules within a code base. On more than one occasion I've seen an entire development team making use of auto commenting tools to generate generic comments based on the class/property/method names, which is a waste of everyones time.

Creating 3rd party libraries or a requirement to generate documentation are valid reasons to enforce such rules, otherwise approach with caution.

Should we allow inline comments?

Inline comments are acceptable, but how do we ensure we are writing them well and are not polluting our codebase?

Talk is cheap. Show me the code.

Here's an un-refactored version of a TDD Kata in Node JS. The type of comments included here are typical of the type of comments I've seen and written in my career to date:

exports.add = function (numbers) {

    if (numbers) {

        // if the string starts with //, it means we are using a custom delimiter
        if (numbers.indexOf("//") === 0) {
            // string starts with // and delimiter can only be 1 char long, so take 3rd character.
            var delim = numbers.substring(2, 3);

            // get the remaining number list with the custom prefix removed
            numbers = numbers.substring(4);

            // replace the custom delimiters with the stanard delimiter of a ,
            while (numbers.indexOf(delim) != -1) {
                numbers = numbers.replace(delim, ",");
            }
        };

        var sum = 0,
            tokens = [],
            negatives = [];

        // split the tokens based on the delimiter
        tokens = numbers.split(/[,\n]/);

        for (var i = 0; i < tokens.length; i++) {
            var number = parseInt(tokens[i]);

            if (number >= 0) {
                // if the number is positve, add it
                sum += number;
            } else {
                // if the number is negative, add it to list
                negatives.push(number);
            }
        } 

        if (negatives.length > 0) {
            // negative numbers should result in an exception
            throw "negatives not allowed " + negatives.join(",");
        }

        return sum;
    }

    // return zero for an empty string
    return 0;
};

So, how would we refactor this if we were to follow the principle of refactoring code where comments have been added? Well I guess first we should add some tests, right? ;-)

Here are some basic tests, for those who may want to play along:

var should = require("should"),
 calculator = require("../calculator");

describe("string calculator", function () {
 it("should return zero for an empty string", function () {
  calculator.add("").should.equal(0);
 });

 it("should return the number for a single number", function () {
  calculator.add("1").should.equal(1);
  calculator.add("2").should.equal(2);
 });

 it("should return the sum for multiple numbers", function () {
  calculator.add("1,2").should.equal(3);
  calculator.add("1,2,3").should.equal(6);
  calculator.add("1,2,3,4").should.equal(10);
 });

 it("should return the sum with new line as delimiter", function () {
  calculator.add("1\n2").should.equal(3);
  calculator.add("1\n2\n3").should.equal(6);
 });

 it("should return the sum with custom delimiter", function () {
  calculator.add("//;\n1;2").should.equal(3);
  calculator.add("//*\n1*2*3").should.equal(6);
 });

 it("should throw exception for negatives", function () {
  (function() {
   calculator.add("-1");
  }).should.throw();
 });
});

If we follow this advice to the letter, we might end up with something as follows:

function startsWithCustomDelimiter (numbers) {
    return numbers.indexOf("//") === 0;
  }

function extractCustomDelimiter (numbers) {
    // string starts with // and delimiter can only be 1 char long
    return numbers.substring(2, 3);
}

function removeCustomDelimiterPrefix (numbers) {
    return numbers.substring(4);
}

function standardiseDelimiters (numbers, customDelim, standardDelim) {
    while (numbers.indexOf(customDelim) !== -1) {
        numbers = numbers.replace(customDelim, standardDelim);
    }

    return numbers;
}

function replaceCustomDelimiters (numbers) {
    if (startsWithCustomDelimiter(numbers)) {
      numbers = standardiseDelimiters(removeCustomDelimiterPrefix(numbers), extractCustomDelimiter(numbers), ",");
    }

    return numbers;
 }

function extractIndividualNumbers (numbers) {
    numbers = numbers.split(/[,\n]/);
    for (var i = 0; i < numbers.length; i++) {
      numbers[i] = parseInt(numbers[i]);
    }

    return numbers;
  }

function throwExceptionForNegatives (numbers) {

    var negatives = [];

    for (var i = 0; i < numbers.length; i++) {

      if (numbers[i] < 0) {
        negatives.push(numbers[i]);
      }
    }

    if (negatives.length > 0) {
      throw "negatives not allowed " + negatives.join(",");
    }
  }

function sumNumbers (numbers) {
    var sum = 0;
   
    for (var i = 0; i < numbers.length; i++) {
      sum += numbers[i];
    }

    return sum;
  }

exports.add = function (numberString) {

    // default for null is zero
    if (!numberString) return 0;

    numberString = replaceCustomDelimiters(numberString);

    var numbers = extractIndividualNumbers(numberString);

    throwExceptionForNegatives(numbers);

    return sumNumbers(numbers);
  };

The comments that remain couldn't be removed by way of adding a method because the comments didn't describe "what" the code was doing, they described "why". This is an indication that the remaining comments were more suitable than the ones removed.

If I do write comments I write them as if they could begin with the word "because". The developer reading a method in this style would see the pattern: What [because] Why

What: should come from method name.
Why: from comment if not obvious.

If the "what" (method name) is descriptive enough, the "why" should be obvious thus no need for a comment.

Taking a method from our example:

function extractCustomDelimiter (numbers) {
    // string starts with // and delimiter can only be 1 char long
    return numbers.substring(2, 3);
}

The aim of this method name and comment is to convey: extract custom delimiter [because] string starts with // and delimiter can only be 1 char long.

A sign of a comment related code smell is a method that has more than one "what" style comment within. Another code smell that normally accommodates comment pollution is too many lines of code within a method. Anything over 10 lines of code (excluding basic property setters) in length should be scrutinised. I don't follow this rule blindly but it was inspired by this extract from Code Complete 2:

The number "7±2" has been found to be a number of discrete items a person can remember while performing other tasks.

In other words, how many pieces of code can a developer remember whilst dissecting a class or method?

The side effect of the initial refactoring is too many methods. 9 methods is quite a lot for such a simple example and we are near the upper threshold of the "7±2" number of discrete items. There is little value in breaking out a one liner such as return numbers.substring(2, 3); into its own method.

Refactoring a little further attempts to balance this. Here is the end result:

var utils = require("./utils").utils();

function standardiseDelimiters (numbers) {
    numbers = standardiseCustomDelimiters(numbers);
    return numbers.replaceAll("\n", ",");
}

function standardiseCustomDelimiters (numbers) {

    if (numbers.startsWith("//")) {

      var segments = numbers.split("\n"),
          customPrefix = segments[0],
          numbers = segments[1];

      var customDelim = customPrefix.replace("//", "");

      numbers = numbers.replaceAll(customDelim, ",");
    }

    return numbers;
 }

function extractIntegers (numbers) {
    
    numbers = numbers.split(",");

    for (var i = 0; i < numbers.length; i++) {
      numbers[i] = parseInt(numbers[i]);
    }

    return numbers;
  }

function checkForNegatives (numbers) {

    var negatives = [];

    for (var i = 0; i < numbers.length; i++) {
      if (numbers[i] < 0) negatives.push(numbers[i]);
    }

    if (negatives.any()) {
      throw "negatives not allowed " + negatives.join(",");
    }
  }

exports.add = function (numbers) {

    var DEFAULT_VALUE = 0;

    if (numbers) {
      numbers = standardiseDelimiters(numbers);
      numbers = extractIntegers(numbers);
      checkForNegatives(numbers);

      return numbers.sum();
    }

    return DEFAULT_VALUE;
  };

Much better, don't you think?

Key changes in this final refactoring:

  • Due to JavaScript being a very verbose language I made use of JavaScript prototypes to make code more succinct in places by adding generic functions for common scenarios e.g. startsWith(), any(), replaceAll()
  • I eliminated a comment by using a constant (for the default zero value)
  • I balanced the extra methods by scrutinising method names and variables e.g. the standardiseCustomDelimiters method

How do you comment?

Share your thoughts on your approach to code comments.

You can see the code with the object extensions here. NB this is just test code for the blog post.

2 comments:

  1. Hi Brad,

    Great post! I can relate to a lot of this.

    The initial code example you gave is exactly the type of code I have found myself writing. I like to be able to "read" the code as English as much as possible hence almost given a running commentary of what's going on.

    I'm currently on a mission to improve the code I write so am doing a lot of reading and this is a subject that particularly touched a nerve as I find myself writing a lot of comments within my code, much of which as per your property comments above, add absolutely NO VALUE!

    When bringing this up in my current contract I've been told that it's "good practice" to write comments which I think is an argument that can be countered using your points above in addition to the things I've been reading, in particular Clean Code by Uncle Bob Martin.

    A few other things to consider with comments is that they often become outdated. The whys or business rules tend to change over time, this isn't always reflected in the comments which end up giving the developer a false sense of security.

    Overall I think I'm going to drastically trim down the comments I write and instead concentrate on a good level of code refactoring. I'm generally going to work on the rule that if I am about to write a comment, I need to review my variable, function and class names as it is likely these could do the job for me!

    Great post, keep up the good work!!

    ReplyDelete
    Replies
    1. You raise a great point I overlooked in this post which is the possibility of comments becoming out of date! That's probably one of the most important reasons not to comment too much!!

      Delete