C++ State of the Union
This article appeared in C++ Report, 10(1), January 1998.
As I write this, in the air on my way home from the July WG21/J16 meeting, I'm looking out the plane window at the distinctive patchwork quilt of the English countryside below. Small fields and farms fit together snugly, if sometimes a little haphazardly, as far as the eye can see. Even after so much work this past week, it still seems that the time has gone by quickly, and in a few hours my trip will be over.
As I write this, in the air on my way home from the July WG21/J16 meeting, I'm also scrolling through my browser window viewing the distinctive patchwork quilt of (Draft) Standard C++. The language and library fit together snugly, if sometimes a little haphazardly, as far as the eye can see. After so much work these past eight years, it still seems to some that the time has gone by quickly, and soon the standardization process will be over.
As with the result of any major effort, not everyone is satisfied with the standard. Interestingly, some people feel that the C++ standards committee has done too much and gone too far (for example, "why namespaces and STL?"), while a smaller group feels that it has not gone far enough in some things (for example, "why no module system or garbage collection?"). To better understand both viewpoints, we need to consider a bit of history and examine what standardization really is all about. After that, I'll summarize where C++ is today and where it is heading in the future.
The Essence of Standardization: "Codify" and "Specify"
A standards body has two important missions: to codify existing practice, and to specify new features. For example, a body that's standardizing bolts doesn't start by inventing the concept of bolts out of whole cloth, only to foist off its newborn creation on an unasked and unsuspecting world. (If it did, the standard would simply be ignored. This can and does happen, even to ISO standards.) Instead, the standards body is primarily made up of real-world manufacturers and users of bolts, who set standards for size and threading based on what has been found to work best in current applications and what new uses are anticipated in the future.
The C standardization effort is a good example of balancing these two missions. Starting with K&R C as its base document, the committee standardized what was known to work in existing practice. It also specified important new features that experience showed were needed but missing, particularly features that helped make C programs safer (including function prototypes and const, which were adopted from C++) and more portable (including wide characters and preprocessor details). The C standardization took seven years to finish, largely because of the addition of those new features. Was it worth the wait? Definitely yes, because the newly standardized features helped to make C code both more maintainable and more reliable.
Even so, a good standard is not always universally accepted. Today, ANSI C still struggles with K&R C for popularity in the UNIX community.
From the start, the C++ committee's mandate included specifying features like templates and exception handling that were "experimental" and incomplete in The Annotated C++ Reference Manual (ARM), the committee's base document. Today, the language's advanced template support and robust exception handling let us write strong libraries and better manage interdependencies within our software. I'll return to this important theme later.
C++ also badly needed a more complete standard library. For years, people had been writing incompatible classes to provide the same basic services -- notably strings, and containers like lists and vectors. This was bad because libraries from different sources often couldn't be used together without writing conversions between these basic services, and sometimes couldn't be used together at all. To provide a standard string class, the committee at first created separate string and wstring classes, then generalized them into the basic_string template which provides common string manipulation and searching operations for both normal and wide characters. To provide standard containers, it first worked for some time to develop a set of vector-like classes (some of which still survive, including bitset and valarray). When a proposal emerged that was far superior to anything that had come before -- the Standard Template Library (STL) by Alexander Stepanov and Meng Lee -- the STL's generic containers, iterators, and algorithms were adopted and refined to their modern form.
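To make the point concrete, here is a minimal sketch of what the basic_string generalization buys us: because string and wstring are both instantiations of the same template, one algorithm written against the common interface serves both. (The helper count_occurrences is my own illustrative name, not part of the standard library.)

```cpp
#include <cstddef>
#include <string>

// string and wstring are both instantiations of one template:
//   typedef basic_string<char>    string;
//   typedef basic_string<wchar_t> wstring;
// so the same find() member works identically for both.
std::size_t count_occurrences(const std::string& text, const std::string& word) {
    std::size_t count = 0;
    for (std::string::size_type pos = text.find(word);
         pos != std::string::npos;
         pos = text.find(word, pos + word.size())) {
        ++count;
    }
    return count;
}
```

The same function body, with std::wstring substituted for std::string, works unchanged for wide characters; before basic_string, each vendor's incompatible string class would have needed its own version.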
Interestingly, the process of specifying and using these two pieces together -- the core language features, and the standard library -- helped to refine both far more quickly than either had progressed alone. For example, the adoption of the STL spurred heavier use of templates among library implementers and users who were trying to program in an "STL-like" way. This practical experience led to important insights about how templates could be better used and supported, which in turn led directly to the solid and flexible templates in the standard today. Because many features of the core language were first used in the specification of the standard library, many subtle problems were discovered even before implementation.
For more information about the history of C++ and the standardization process, I strongly recommend reading Stroustrup's The Design and Evolution of C++ (D&E), particularly Chapter 6.
"Too Far," or "Not Far Enough"?
This background should help to explain the existence of the two camps: those who feel the committee has done too much and gone too far, and those who feel it has done too little and not gone far enough. The former feel that standardization should be restricted mainly to the first mission, namely "codifying existing practice." They are (often legitimately) alarmed at new features, even when those features fill a compelling need. The main arguments from this camp are:
1. Adding features makes the standardization process take longer.
2. Adding features can make an already complex language more difficult.
3. Adding features increases the risk of making mistakes.
These arguments are all true. New features do take time: fortunately, if all remains on schedule then C++'s standardization will have taken only a year longer than C's (eight years vs. seven years), even though C++ is far more complex. Not all current features are necessary: The draft contains a few relics from pre-STL times that are now redundant but were never removed, such as valarray and bitset. Mistakes can be made: During the early years, the committee accepted many changes essentially because they were good ideas, not necessarily because they belonged in a standard. For the last several years, however, the committee has been much more frugal in accepting changes because it understands the arguments above.
What about those who feel that the committee didn't go far enough? Throughout the standardization process, the committee has been inundated with hundreds, if not thousands, of proposals for extensions or new features. Many of these proposals were well thought out, clearly presented, clearly workable, and ultimately rejected. Why? Because since 1994/1995 the committee's policy has been to reject any proposal unless it met two criteria: the proposal fixed something that was broken (meaning something fairly important that otherwise "didn't work at all" or "was very difficult to use"); and the problem couldn't be worked around within the existing language.
For example, one proposal that was accepted was to add the builtin bool type. Before bool was added, the committee examined many typedef/#define/enum/class workarounds, but none could produce the same effect as a builtin type. Further, bool's absence actually broke something: every programmer and library writer was already emulating it in his or her own incompatible way anyway, and that state of affairs was wreaking havoc on poor users who were attempting to use the incompatible libraries. Because it met both criteria, the proposal for the new type was approved.
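A short sketch shows what only a builtin type could deliver: overloading on bool works, and a conditional expression converts to bool directly. (The function names here are my own illustrations, not anything from the standard.)

```cpp
#include <string>

// With the builtin type, overloading on bool is possible and a conditional
// expression converts to bool automatically -- the combination that the
// typedef/#define/enum/class workarounds could never all deliver at once.
std::string describe(int)  { return "int"; }
std::string describe(bool) { return "bool"; }

bool equal_ints(int i, int j) {
    bool b = (i == j);   // direct assignment from a conditional expression
    return b;
}
```

An enum-based emulation would reject the assignment in equal_ints, and a typedef to int would make the two describe overloads collide; only the builtin type satisfies both uses.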
As a counterexample, there have been several requests over the years to add regular-expression searching to strings. Was this really something that was broken and couldn't be worked around? No, because regular-expression searching is equally efficient and easy to code whether the operation is a member function standardized in the class or a free function provided by an outside library. Like many proposals for basic_string extensions, such a proposal should be rejected (and was, several times), even though it was frequently well thought out, clearly presented, and clearly workable. The current basic_string is already mildly cluttered because it contains more members than it should. Had all the requested extensions been allowed, basic_string would have turned into an incoherent monster class with literally hundreds of member functions, and hence one unlikely to be widely used.
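To illustrate why such searches don't need to be members, here is a hedged sketch of a substring test written as a free function over the existing standard algorithms; contains() is a hypothetical helper of my own, not a standard or proposed function.

```cpp
#include <algorithm>
#include <string>

// A search written as a free function over iterators is just as efficient
// and easy to use as a member function would be, and it keeps basic_string's
// interface from growing without bound.
bool contains(const std::string& text, const std::string& pattern) {
    return std::search(text.begin(), text.end(),
                       pattern.begin(), pattern.end()) != text.end();
}
```

Anything in this style -- including full regular-expression matching -- can live in an outside library without touching basic_string at all.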
Is there still some cleanup work the committee would like to do, given infinite time? Yes. For example, the standard algorithm find_end should really be named search_end for consistency, since it works like search and search_n rather than like find. Some minor misfeatures, such as the redundant valarray, could use revision or removal. I even have my personal pet peeves (for example, I think the vector<bool> specialization should not exist). But are these critical issues? No. We can write robust libraries and programs anyway. Even my own pet peeve will admittedly affect few programmers, if any. Issues like the ones I just mentioned are all important to someone, and many of them are good ideas, but the committee's approach in recent years has been properly frugal and conservative.
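The find_end naming quibble is easy to see in code: find_end is really search run from the other end, locating the last occurrence of a subsequence rather than the first. (The two wrapper functions below are my own illustrative names.)

```cpp
#include <algorithm>

// search finds the FIRST occurrence of a subsequence; find_end finds the
// LAST -- which is why search_end would have been the more consistent name.
int first_match(const int* first, const int* last,
                const int* pat_first, const int* pat_last) {
    return static_cast<int>(std::search(first, last, pat_first, pat_last) - first);
}

int last_match(const int* first, const int* last,
               const int* pat_first, const int* pat_last) {
    return static_cast<int>(std::find_end(first, last, pat_first, pat_last) - first);
}
```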
The Issue of Complexity
An oft-repeated criticism of C++ is that it is very complex. This is true. To get a balanced view, however, it's important to understand four things:
1. Complexity is bad when you must pay for it whether or not you use it. One fundamental principle of C++ is that you don't pay for what you don't use, both in runtime performance and in code complexity. That goal has been largely realized: if you don't need some features today, such as multiple inheritance, internationalization, locales, or streams, you can usually ignore them. Still, some C++ complexities must be learned, including the basics of templates and exceptions (since they're widely used in the standard library, although a typical programmer doesn't need to learn details like partial specialization) and the style of pointer and memory management inherited from C.
2. Complexity that adds power is good. For example, C++ would be much less useful without advanced templates or exception handling. Ultimately, the greatest single reason why C++ is complex is because of its strong C compatibility, yet C++ has become one of the most popular programming languages in the world today largely because of its strong C compatibility.
3. Reusability reduces complexity. The standard library accounts for two thirds of the standard's page count because it contains many reusable features, such as standard containers and algorithms. Consider: How often have you written a binary search routine? How often have you hand-crafted a list container? Generic algorithms like lower_bound and generic containers like list are not only fully reusable, but they are fully portable and reduce the complexity of our own code.
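As a small illustration of that reuse, here is the hand-crafted binary search we no longer need to write, reduced to a call to the standard lower_bound algorithm. (The wrapper name position_of is my own.)

```cpp
#include <algorithm>

// std::lower_bound performs binary search on any sorted sequence, returning
// the first position whose element is not less than the value -- no more
// hand-rolled (and often subtly buggy) binary search loops.
int position_of(const int* first, const int* last, int value) {
    return static_cast<int>(std::lower_bound(first, last, value) - first);
}
```

The same one-line call works unchanged on a sorted vector, a deque, or a plain array, which is exactly the kind of portability and reuse the standard library was meant to deliver.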
4. Every language becomes more complex when standardized. Today, <insert your favorite alternative language here, including ARM C++> is a fine language and likely simpler than (Draft) Standard C++. To be a useful international standard, however, a language must support real-world platforms, internationalization, and other portability requirements, and that always adds complexity. For example, consider facets like moneypunct and see how they compare to similar facilities in other international computing standards.
C++ is indeed a complex language. This would be a serious flaw were it not for C++'s greatest complexity-reducer, the principle that you don't pay for what you don't use, and for the fact that the complexity often actually makes our lives easier.
What are the important results of C++ standardization? I could certainly cite a laundry list of individual benefits: it has solidified language features that were at first experimental (templates, exceptions), added useful features that were not already in existing practice (namespaces, run-time type identification (RTTI)), created important idioms (traits), and specified a standard library (the STL).
The standardization's most important contribution to C++ is stronger support for powerful abstraction to reduce software complexity. C++ is not solely an object-oriented language. It supports several programming styles, including both object-oriented programming and generic programming. These styles are fundamentally important because each provides flexible ways to organize code through abstraction. Object-oriented programming lets us bundle an object's state together with the functions that manipulate it, and encapsulation and inheritance let us manage interdependencies and make reuse cleaner and easier. Generic programming is a more recent style that lets us write functions and classes which operate on other functions and objects of unspecified, unrelated and unknown types, providing a unique way to reduce coupling and interdependencies within a program. A few other languages currently provide support for genericity, but none yet support it as strongly as C++, and indeed modern generic programming was made possible by the unique C++ formulation of templates.
Today C++ provides many powerful ways to express abstraction, and the resulting flexibility is the most important result of C++ standardization.
Where C++ Is Today
C++ is no longer a fluid and changing language. It has been generally fixed and stable since the summer of 1996, and as I write this C++ is on schedule to become an official ISO standard in the summer of 1998. As a result of this stability and the diminishing impact of changes, some of the most popular C++ compilers are nearly caught up with the current draft standard -- and this is probably the first time we've been able to say that since 1990!
The Final Committee Draft (FCD, also called CD2 because it was the second CD) was approved for release in November 1996, and since then the committee has made no substantive changes to the language except in response to comments from national standards bodies (such as ANSI, SCC, BSI and AFNOR). Some of the national bodies, such as ANSI, have had their own public comment periods during which anyone could write in and have his or her concerns considered by the committee, so no doubt many of you have had a chance to participate in the process since 1996 (or longer than that, if you've been following along since CD1). When voting on whether to approve a CD during the CD ballot, national bodies are allowed to vote "yes," "yes with comments," "no," or "no with comments." If the vote is "no with comments," the comments detail the changes that, if made, would change that vote to "yes."
There were a lot of national body comments for this CD, and a total of five "no with comments" votes out of the 22 voting national bodies. The high comment volume caused some people to speculate that maybe there was just too much work to be done this summer for things to stay on schedule. Unfortunately, some of that speculation made it into print. Fortunately, the speculation was wrong, and everything is exactly on schedule. The London meeting this past July was a resounding success: all of the U.S. public comments and all but three or four of the ISO member nation comments were resolved there.
The Road Ahead
Until this round of standardization is complete, the committee will continue to meet three times a year in March, July, and November. Here is the expected schedule for the near future, with the meeting locations shown in brackets:
November 1997 (Morristown, New Jersey, USA): This meeting will address the handful of remaining national body comments, and is expected to be mainly a week-long proofreading session. Barring catastrophes, the committee expects to complete the proposed Draft International Standard (DIS) draft and vote to submit it to ISO for approval as an official DIS.
January to March 1998: After the DIS draft is submitted by the committee, the ISO DIS ballot will take about two months. National standards bodies will vote to decide whether the DIS draft should be approved as an official ISO DIS. Unlike a CD ballot, during a DIS ballot the national bodies may only vote "yes" or "no": they may not make comments or propose changes, however minor. Assuming no unforeseen hurdles appear, C++ should reach official ISO DIS status in time for the March meeting in France.
March 1998 (Sophia Antipolis, France): The anticipated theme for this meeting is "Celebrate DIS success." DIS is the important hurdle; once it has been reached, the rest should follow. This meeting is expected to focus on any remaining editorial cleanup.
July 1998 (Rochester, New York, USA): This meeting will clear any more paperwork required to get the official name "ISO Standard C++" later in the summer.
August 1998: The final International Standard (IS) for ISO C++ is published.
After the IS is published, ISO rules require a five-year "cooling-off" period before another round of standardization can take place. This means that there will be much less work for the C++ committee to do until at least 2003, and so the committee will start meeting less often than three times a year (details have not yet been decided). It will still meet, however, since there will still be some work to do. In particular, if any problems with the standard are found after the IS has been produced, they will still be handled by the committee as Defect Reports to be appended to the standard.
Having a standard is vital. Standard C reduced incompatibilities between C implementations and provided tools to write more reusable and more portable code. Today, many popular C++ compilers have already nearly caught up with the current draft. The long journey does appear at last to be nearly over, and we already benefit from some of the best support available in any programming language for writing stronger and more maintainable software.
Thanks to several longtime committee members for sharing their recollections and providing comments. Particular thanks to Nathan Myers for his thorough and insightful reviews of several drafts of this article.
5. In summary: a typedef ... bool wouldn't allow overloading on bool; a #define bool wouldn't allow overloading either and would wreak the usual havoc of #defines; an enum bool would allow overloading but couldn't be automatically converted from a conditional expression (as in "b = (i == j);"); and a class bool would allow overloading but wouldn't let a bool object be tested in conditions (as in "if( b )") unless it provided an automatic conversion to something like int or void*, which would wreak the usual havoc of automatic conversions.
6. See also R.C. Martin, Designing Object-Oriented C++ Applications Using the Booch Method (Prentice Hall, 1995). It contains an excellent discussion of why one of object-oriented programming's most important benefits is that it lets us reduce software complexity by managing code interdependencies.