GotW #86


This is the original GotW problem and solution substantially as posted to Usenet. See the book Exceptional C++ Style (Addison-Wesley, 2004) for the most current solution to this GotW issue. The solutions in the book have been revised and expanded since their initial appearance in GotW. The book versions also incorporate corrections, new material, and conformance to the final ANSI/ISO C++ standard (1998) and its Technical Corrigendum (2003).

Slight Typos? Graphic Language and Other Curiosities
Difficulty: 5 / 10

Sometimes even small, hard-to-see typos can significantly change what code does. To illustrate how hard real typos can be to spot, and how easy it is to "see" phantom typos that aren't actually there, consider these examples.

Problem

Guru Questions

Answer the following questions without using a compiler.

1. What is the output of the following program on a standards-conforming C++ compiler?

#include <iostream>
#include <iomanip>

int main()
{
  int x = 1;
  for( int i = 0; i < 100; ++i );
    // What will the next line do? Increment???????????/
    ++x;
  std::cout << x << std::endl;
}

2. How many distinct errors should be reported when compiling the following code on a conforming C++ compiler?

struct X {
  static bool f( int* p )
  {
    return p && 0[p] and not p[1:>>p[2];
  };
};

Solution

1. What is the output of the following program on a standards-conforming C++ compiler?

#include <iostream>
#include <iomanip>

int main()
{
  int x = 1;
  for( int i = 0; i < 100; ++i );
    // What will the next line do? Increment???????????/
    ++x;
  std::cout << x << std::endl;
}

Assuming that there is no invisible whitespace at the end of the comment line, the output is "1".

There are two tricks here, one obvious and one less so.

First, consider the for loop line:

  for( int i = 0; i < 100; ++i );
                                ^

There's a semicolon at the end, a "curiously recurring typo pattern" that (usually accidentally) makes the body of the for loop just the empty statement. Even though the following lines may be indented, and may even have braces around them, they are not part of the body of the for loop. This was a deliberate red herring: in this case it doesn't matter that the for loop repeats nothing, because, as the next point shows, there is no increment statement left to repeat at all (even though there appears to be one).

Second, consider the comment line. Did you notice that it ends oddly, with a "/"?

    // What will the next line do? Increment???????????/
                                                       ^

Nikolai Smirnov writes:

"Probably, what's happened in the program is obvious for you but I lost a couple of days debugging a big program where I made a similar error. I put a comment line ending with a lot of question marks accidentally releasing the 'Shift' key at the end. The result is unexpected trigraph sequence '??/' which was converted to '\' (phase 1) which was annihilated with the following '\n' (phase 2)." [1]

The "??/" sequence is converted to '\' which, at the end of a line, is a line-splicing directive (surprise!). In this case, it splices the following line "++x;" to the end of the comment line and thus makes the increment part of the comment. The increment is never executed.

Interestingly, if you look at the GNU g++ documentation for the -Wtrigraphs command-line switch, you will encounter the following statement:

"Warnings are not given for trigraphs within comments, as they do not affect the meaning of the program." [2]

That may be true most of the time, but here we have a case in point -- from real-world code, no less -- where this expectation does not hold.

 

2. How many distinct errors should be reported when compiling the following code on a conforming C++ compiler?

struct X {
  static bool f( int* p )
  {
    return p && 0[p] and not p[1:>>p[2];
  };
};

The short answer is: Zero. This code is perfectly legal and standards-conforming (whether the author might have wanted it to be or not).

Let's consider in turn each of the expressions that might be questionable, and see why they're really okay:

- 0[p] is legal and is defined to have the same meaning as p[0]. In C (and C++), an expression of the form x[y], where one of x and y has pointer type and the other has integer type, always means *(x+y). In this case, 0[p] and p[0] have the same meaning because they mean *(0+p) and *(p+0), respectively, which comes out to the same thing. For more details, see clause 6.5.2.1 in the C99 standard [3].

- "and" and "not" are valid keywords that are alternative spellings of && and !, respectively.

- :> is legal. It is a digraph for the ']' character, not a smiley (smileys are unsupported in the C++ language outside comment blocks, which is rather a shame). This turns the final part of the expression into p[1]>p[2].

- The "extra" semicolon is allowed at the end of a function definition that appears inside a class definition.

Of course, it could well be that the colon ":" was a typo and the author really meant "p[1]>>p[2]", but even if it was a typo it's still (unfortunately, in that case) perfectly legal code.

 

Acknowledgements

Thanks to Nikolai Smirnov for contributing part of the Example 1 code; I added the for loop line.

 

References

[1] N. Smirnov, private communication.

[2] A Google search for "trigraphs within comments" yields this and several other interesting and/or amusing hits.

[3] ISO/IEC 9899:1999 (E), International Standard, Programming Languages -- C.

Copyright © 2009 Herb Sutter