Toward a Standard C++0x Library, Part 1

This article appeared in C/C++ Users Journal, 20(1), January 2002.

I’m writing this just after the Redmond standards meeting in October 2001. The ISO/ANSI C++ standards committee has now begun actively soliciting and considering new features that will eventually go into the next version of the C++ standard, which is colloquially being called C++0x (in contrast to the existing standard, C++98). For details on how you can participate, see Matt Austern’s column elsewhere in this issue. At the moment we’re concentrating primarily on new library features, with little or no change to the existing C++98 core language. The more-distant core changes are also on the radar screen, but still out at the edges.

Beyond the selection of individual features, however, lurks a more fundamental issue: What about migration? Exactly how should the C++0x standard library be distinguished from the C++98 library? Should it be in a different namespace? What kinds or categories of additions, deletions, and changes might we want to be able to make, and what’s the best way to go about making them? Note that third-party C++ libraries face the same challenges, and the choices made in a new “release” of the standard library ought to be exemplary, demonstrating a method that other libraries should be encouraged to follow in their new releases. No pressure!

You should care about this, because at the end of the day how we end up versioning or migrating the standard library affects you directly. In case you think it doesn’t, consider just one example: Does the compiler you use today, three years after the standard was passed and four since it was cast in stone, still support both classic and modern templated iostreams, in the name of backward compatibility? Did you ever encounter problems because these facilities had the same name but different specifications in different drafts of the standard? Food for thought, the more so since there are also still prestandard string and auto_ptr implementations out there. This is one example of the kind of potential problems we would like to avoid creating again.

Here’s a sampling intended to give you a taste for the kinds of common cases that need to be considered, and why they’re not always quite as simple as they seem. None of these are necessarily things the committee is, or is not, going to do, although I picked them because they’re likely examples.

Adding New Stuff

Here’s the simplest case: Adding something completely new to the standard library that the C++ community has never seen before.

// Example 1: Adding something new.
// How hard can it be?
//
namespace std
{
  template<class In, class Out, class Pred>
  Out copy_if(In first, In last, Out result, Pred p);
}

Example 1’s copy_if is a good example of something simple and obvious that we almost certainly want to add. That it’s not already in the standard is pretty much just an oversight. (For more details on copy_if, including how to write your own correctly if you want to, see Item 36 in [1].)
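For concreteness, here is one way such a function could be written. This is only a minimal sketch, not the wording of any proposal: it lives in a made-up namespace sketch so that it cannot collide with anything a library already ships, and the is_even predicate and evens_of helper are likewise invented just for the demonstration.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

namespace sketch {  // hypothetical namespace; not part of any standard library

// Copies every element in [first, last) for which p returns true,
// preserving order, and returns the end of the output range.
template<class In, class Out, class Pred>
Out copy_if(In first, In last, Out result, Pred p)
{
    for (; first != last; ++first) {
        if (p(*first)) {
            *result = *first;
            ++result;
        }
    }
    return result;
}

// Example predicate (an assumption for this demo only).
inline bool is_even(int n) { return n % 2 == 0; }

// Usage helper: collects the even values of its input.
inline std::vector<int> evens_of(const std::vector<int>& in)
{
    std::vector<int> out;
    sketch::copy_if(in.begin(), in.end(), std::back_inserter(out), is_even);
    return out;
}

}  // namespace sketch
```

Note that the call inside evens_of is qualified with sketch::, so argument-dependent lookup never gets the chance to find a same-named function that a library may already provide in std.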

“Oh, that’s easy,” some might think, and wave dismissively while sipping their lattes. After all, adding something new that wasn’t in C++98 does look like the easiest possible case, doesn’t it? And indeed for copy_if it’s pretty easy. But if you’re inclined to think that it’s always so easy, consider this:

// Example 2: Adding something new.
// How hard it can be!
//
namespace std
{
  template< /* ... */ >
  class slist { /* ... */ };

  template< /* ... */ >
  class hash_set { /* ... */ };

  template< /* ... */ >
  class hash_multiset { /* ... */ };

  template< /* ... */ >
  class hash_map { /* ... */ };

  template< /* ... */ >
  class hash_multimap { /* ... */ };
}

“But that’s still easy,” some may protest. “Those are commonly provided extensions, and commonly used names, but there’s no migration issue here — those names can’t collide with anything in C++98 because they aren’t already in namespace std!”

And that, as they say, is where the detritus meets the rotating ventilation device.

Although C++98 doesn’t put such names into namespace std, more than one standard library implementation does precisely that as an extension. Whether the standard actually does or should permit library implementations to do that is at the moment the subject of a bit of a vigorous debate, and there’s a fairly strong feeling that allowing it is not a good idea, but it’s also a debate that’s mostly beside the point: The point is that real-world implementations exist that already provide those names in std, that the implementations already vary (which is part of the reason for wanting to standardize such things in the first place), and therefore whatever the committee standardizes will not necessarily be exactly what those implementations already provide (even if we were to pick one of the alternative existing implementations, we’d be specifying something that’s different from the other ones). We have to figure out the best way to live with that to best protect the programmers out there in the trenches who are already using such implementations.

To see how thorny this can get, consider a few major alternatives:

Option 1: Standardize existing practice: Use the common names in std. This is an obvious alternative, and it has the advantage of standardizing what are indeed the well-known names already being used. But it also means that users of standard library implementations that already use those names, expecting different semantics, will have their programs that use those names either stop working or silently change meaning if nothing is done. So, for practical purposes, the best that those standard library implementations can probably do is provide some sort of switch for choosing between the old and the new versions, bending over backwards to avoid ODR (one definition rule) violations, and over time migrate their user base to the new versions. (Which will take years. Classic vs. templated iostreams, anyone?)

So Option 1, while obvious and aesthetically pleasing to many, still screeches a bit when the rubber hits the real-world road: Affected users will for years have two versions of the facility with exactly the same name but different semantics (and have to switch with compiler flags or something). Ugh.
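The kind of switch just described could look roughly like the following. Everything in it is hypothetical: lib stands in for a vendor's library namespace, v1 and v2 are invented version namespaces, hash_map_tag is a toy stand-in for a real container, and LIB_USE_PRESTANDARD_HASH_MAP is an invented macro, not a flag of any real compiler or library.

```cpp
#include <cassert>
#include <string>

namespace lib {  // hypothetical stand-in for a vendor's library namespace

// Old, pre-standard semantics live in their own versioned namespace.
namespace v1 {
struct hash_map_tag {
    static std::string flavor() { return "pre-standard"; }
};
}  // namespace v1

// New, standardized semantics live in a second versioned namespace.
namespace v2 {
struct hash_map_tag {
    static std::string flavor() { return "standard"; }
};
}  // namespace v2

// The switch: exactly one version is re-exported under the common name.
#if defined(LIB_USE_PRESTANDARD_HASH_MAP)
using v1::hash_map_tag;
#else
using v2::hash_map_tag;
#endif

}  // namespace lib
```

Because the two versions are distinct entities (lib::v1::hash_map_tag and lib::v2::hash_map_tag), translation units built with different settings refer to different types rather than to two conflicting definitions of one type, which is one way to stay clear of ODR trouble. The user-visible cost is exactly the one described above: the same spelling, lib::hash_map_tag, means different things depending on a build setting.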

Option 2: Tweak existing practice: Use variant names in std. The next-most-obvious alternative, and indeed the one that initially got a good reaction in the Library Working Group on the face of it, was to use a slightly different name for the problematic extensions. After all, if the problem with Option 1 is the unfortunate naming situation, then using different names might help. For example, instead of standardizing hash-based maps using the name hash_map, we could call it hashed_map (which is arguably more accurate anyway), hashmap, hmap, map_hashed, or something else.

Did you spot the parallel issue? Alas, Option 2, although an obvious workaround that can be a tempting compromise on the face of it, also makes rude noises when the rubber hits the real-world road: Affected users will for years have two versions of the facility with similar names but different semantics. I’d rather not be the one to repeatedly answer the new frequently-asked-question “What’s the difference between hash_map and hashmap?” although I’m sure that people who like to run obfuscated C++ programming contests, as well as those who enjoy ridiculing C++ for its complexity, would have a field day with it. It’s arguable whether this is better or worse than Option 1.

Option 3: Specialize behavior of the existing map to provide hashing through the existing names. For example, we could add “hashing” predicates and partially specialize map on such predicates. Here’s a sketch of the idea (details could vary):

namespace std
{
  // Base template:
  // Tree-based semantics.
  //
  template
  <
    class Key,
    class T,
    class Compare = less<Key>,
    class Alloc = allocator<pair<const Key, T> >
  >
  class map { /* ... */ };

  // Partially specialized template:
  // Hash-based semantics.
  // (A partial specialization may not declare default
  // template arguments; Alloc's default comes from the
  // base template.)
  //
  template
  <
    class Key,
    class T,
    class Alloc
  >
  class map<Key,T,hash<Key>,Alloc> { /* ... */ };
}

The idea is that, this way, all existing uses are supported — when users write any of today’s legal code that uses maps, they get the C++98 tree-based behavior:

// use C++98 tree-based
map<string, string> dictionary;

To use the hash-based internal representations, they use the version that takes (a possibly-specialized version of) a new std::hash<> template that encapsulates knowledge of the hash function to be used, the desired bucket size or load factor, and other controlling parameters:

// use fanciful hash-based
map<string, string, hash<string> > dictionary;

Note that in this model the programmer can customize his own hashing and comparison functions, by specializing or deriving from the hash template.
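To make the mechanism concrete, here is a small compilable sketch of the same idea. It is an illustration under several assumptions, not a proposed specification: it uses a made-up namespace sketch instead of std, its hash policy is an invented class that simply forwards to the library's own hasher, and both variants delegate to existing containers (std::map and, from later standards, std::unordered_map) rather than implementing anything themselves. The is_hashed and first_key members exist only so the difference can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

namespace sketch {  // hypothetical; stands in for namespace std

// Hypothetical hashing policy; it only forwards to the library's hasher.
template<class Key>
struct hash {
    std::size_t operator()(const Key& k) const { return std::hash<Key>()(k); }
};

// Base template: tree-based semantics, delegated to std::map.
template<class Key, class T, class Compare = std::less<Key>,
         class Alloc = std::allocator<std::pair<const Key, T> > >
class map {
public:
    T& operator[](const Key& k) { return impl_[k]; }
    std::size_t size() const { return impl_.size(); }
    static bool is_hashed() { return false; }
    // Smallest key; meaningful only because the tree keeps keys ordered.
    const Key& first_key() const { return impl_.begin()->first; }
private:
    std::map<Key, T, Compare, Alloc> impl_;
};

// Partial specialization: selected when the third argument is hash<Key>.
// Hash-based semantics, delegated to std::unordered_map.
template<class Key, class T, class Alloc>
class map<Key, T, hash<Key>, Alloc> {
public:
    T& operator[](const Key& k) { return impl_[k]; }
    std::size_t size() const { return impl_.size(); }
    static bool is_hashed() { return true; }
private:
    std::unordered_map<Key, T, hash<Key>, std::equal_to<Key>, Alloc> impl_;
};

}  // namespace sketch
```

With this in place, sketch::map<std::string, int> picks the tree-based base template, while sketch::map<std::string, int, sketch::hash<std::string> > picks the hash-based specialization. The sketch also shows the cost: the two variants are really two different classes sharing one name, and nothing forces their interfaces to match (here, only the tree-based one has first_key).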

What do you think of Option 3? It doesn’t change the meaning of existing valid programs, and it avoids the naming problems as much as they can be avoided: Affected users will for years have two versions of the facility with different names and different semantics. But this comes at a cost, for the semantics of the normal existing set and map and so forth would be extended in a way that adds complexity and will almost certainly be less intuitive than just using a separate hash_map name would have been.

There are other choices, including just giving up and not standardizing those names, which are probably worse than any of the above. The choices sketched here show the main serious alternatives under discussion at the time of this writing. Each will undoubtedly be analyzed further, and there may be important tradeoffs not mentioned here. In the end, the committee’s choice will be primarily influenced by what is best for all users, including those affected by real-world migration issues.

Note that, regardless of which of the above (or other) choices is eventually adopted to resolve this slist and hash_* problem, there is another parallel effort that library implementations in this situation can (and many feel should) pursue: They can begin now to migrate their hash-based containers out of namespace std (before workarounds like std::hash are even specified, much less become final), reducing the problem to the same as the copy_if example and removing the barrier to adopting the obvious default Option 1. There are many who strongly believe that this is the highest-quality option, and it’s worth noting that some library implementations have already taken this path. They originally put slist and the hash_* containers into std some years ago, noticed the problem and felt that they’d made a mistake, and have spent years migrating their users away through a three-step process. In the next available release they moved the containers to their own vendor-specific vendor:: namespace and documented that they should be used from there, but provided compatibility headers so that their customers’ existing code that relied on those containers being in std:: would still work. In the subsequent release they emitted warnings for use of those facilities via namespace std. In the third release the containers were gone from std:: and available only in vendor::.
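The first step of such a migration might be sketched like this. The details are assumptions made for illustration: the slist here is a toy wrapper rather than a real singly linked list, and because user code is not allowed to add names to std, a made-up namespace oldstd plays the role that std would play inside a vendor's own compatibility header.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Step 1: the container's real home becomes the vendor's own namespace.
namespace vendor {

template<class T>
class slist {  // toy stand-in for a singly linked list
public:
    void push_front(const T& v) { items_.push_front(v); }
    const T& front() const { return items_.front(); }
    std::size_t size() const { return items_.size(); }
private:
    std::list<T> items_;
};

}  // namespace vendor

// Compatibility header: re-export the name where old code expects it.
// oldstd is a hypothetical stand-in for std.
namespace oldstd {

using vendor::slist;  // step 1: the old spelling still compiles
// Step 2 would make uses of this name produce a warning;
// step 3 deletes the using-declaration altogether.

}  // namespace oldstd
```

Because the using-declaration introduces the same template under a second name rather than a copy of it, oldstd::slist<int> and vendor::slist<int> are one and the same type, so old and new code can pass objects back and forth freely during the transition.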

There are other cases of adding new facilities that I haven’t touched on here. For example:

	What about adding an overload of an existing nonmember function that is not currently overloaded? That would break currently-working code that tries to take that function’s address without explicitly stating the function’s type via a cast or assignment to a named object, for example.
	What about providing new concepts, such as move-construction and move-assignment semantics (very different from copy construction and copy assignment) of the kind that auto_ptr already provides?
	What about extending the interface of existing classes, such as giving allocator a realloc-like facility? Should the C++0x containers be extended to use this feature? If so, does that mean C++0x containers shouldn’t work with today’s user-written allocators which of course don’t know about any realloc-like extension, or should the implementers of the C++0x containers be required to be smart enough to use the new semantics only when they are available?
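The first of these cases is easy to demonstrate. In the sketch below, the namespaces lib_v1 and lib_v2 and the function twice are invented stand-ins for "the library before" and "the library after" an overload is added; no real library function is being described.

```cpp
#include <cassert>

namespace lib_v1 {  // hypothetical: the library before the change
inline int twice(int x) { return 2 * x; }
}  // namespace lib_v1

namespace lib_v2 {  // hypothetical: the same library after adding an overload
inline int twice(int x) { return 2 * x; }
inline long twice(long x) { return 2 * x; }  // the newly added overload
}  // namespace lib_v2

// Generic helper whose parameter type is deduced from the function passed in.
template<class F>
int apply_to_21(F f) { return static_cast<int>(f(21)); }

// Fine: twice names exactly one function, so F is deduced as int (*)(int).
inline int with_v1() { return apply_to_21(&lib_v1::twice); }

// apply_to_21(&lib_v2::twice) would not compile: twice now names an
// overload set, and F cannot be deduced from it. The caller has to say
// which function is meant, for example with a cast:
inline int with_v2()
{
    return apply_to_21(static_cast<int (*)(int)>(&lib_v2::twice));
}
```

Code written in the style of with_v1 is the code that stops compiling when the overload appears, even though nothing about the original function changed.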

Some of these cases can be categorized, but others will probably have to be considered on a case-by-case basis.

Removing Existing Stuff

The next-simplest case is: Removing something already in the C++98 standard library because it’s broken, dangerous, or embarrassing.

I won’t name any of the usual suspects here simply because some of them are contentious, even ones like vector<bool> that are probably not widely relied upon in the field and are pretty much unanimously viewed as broken in that at best it should have been its own container rather than an incompatible specialization of vector… oops, I guess I did name one; so sue me. But since we’re talking about vector<bool>, let’s use it as an example: People could have already written code that relies on its vector-incompatible behavior, such as the extra flip function. I suppose the committee could just decide that probably few enough people rely on that behavior that it would be safe to remove it (or, put another way, the committee could decide that they didn’t give a flip), but that’s still a potentially contentious issue even for this well-known example.
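For readers who haven’t run into it, the incompatibility is easy to observe. The following sketch uses only the standard vector interface, plus static_assert and type traits from later standards to make the point at compile time; the flip_demo function is invented for the demonstration.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// For an ordinary vector, operator[] yields a real reference to the element.
static_assert(std::is_same<std::vector<int>::reference, int&>::value,
              "vector<int> hands out int&");

// vector<bool> hands out a proxy object instead of a bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool> hands out a proxy, not bool&");

inline bool flip_demo()
{
    std::vector<bool> bits(3, false);
    bits.flip();     // container-wide flip: all three become true
    bits[0].flip();  // per-element flip through the proxy: back to false
    // bool* p = &bits[0];  // would not compile: there is no bool to point at
    return !bits[0] && bits[1] && bits[2];
}
```

Generic code that assumes &v[0] is a T* works for every other vector and fails for this one, while code that calls flip works only for this one; removing or changing the specialization breaks somebody either way.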

The issue with the items on this short list of wouldn’t-it-be-nice-if-we-could-jettison-them facilities, to a greater or lesser degree, is that they’re already there and people are using them. Even if the standard were to officially remove them today, they would linger in implementations for years because backward compatibility can’t be ignored; vendors can’t just abandon their existing users who may still be using those facilities.

So, like it or not, most of the existing facilities are probably here to stay and the focus will be on adding new facilities without repeating the mistakes of the past.

Mechanics

Besides looking at the merits of particular facilities and proposals, there are some more basic questions that need to be answered about the C++0x facilities:

	What header(s) are they going to go in?
	What namespace(s) are they going to go in?
	What migration impact will these choices have on users?

More on these issues when we return, next time.

Acknowledgments

Thanks to the members of the C++ standards committee Library Working Group, and particularly to Matt Austern, Pete Becker, Howard Hinnant, and P.J. Plauger for their observations and for their comments on drafts of this article.

Notes

[1] Scott Meyers. Effective STL (Addison-Wesley, 2001).

Copyright © 2009 Herb Sutter