GotW #85


This is the original GotW problem and solution substantially as posted to Usenet. See the book Exceptional C++ Style (Addison-Wesley, 2004) for the most current solution to this GotW issue. The solutions in the book have been revised and expanded since their initial appearance in GotW. The book versions also incorporate corrections, new material, and conformance to the final ANSI/ISO C++ standard (1998) and its Technical Corrigendum (2003).

Style Case Study #3: Construction Unions
Difficulty: 4 / 10

No, this issue isn't about organizing carpenters and bricklayers. Rather, it's about deciding between what's cool and what's uncool, good motivations gone astray, and the consequences of subversive activities carried on under the covers. It's about getting around the C++ rule against using constructed objects as members of unions.

Problem

JG Questions

1. What are unions, and what purpose do they serve?

2. What kinds of types cannot be used as members of unions? Why do these limitations exist? Explain.

 

Guru Questions

3. The article in [1] cites the motivating case of writing a scripting language: Say that you want your language to support a single type for variables that at various times can hold an integer, a string, or a list. Creating a union { int i; list<int> l; string s; } doesn't work for the reasons given above. The following code presents a workaround that attempts to support allowing any type to participate in a union. For a more detailed explanation, see the original article.

Critique this code and identify:

a) Mechanical errors, such as invalid syntax or nonportable conventions.

b) Stylistic improvements that would improve code clarity, reusability, and maintainability.

#include <list>
#include <string>
#include <iostream>
using namespace std;

#define max(a,b) (a)>(b)?(a):(b)

typedef list<int> LIST;
typedef string STRING;

struct MYUNION {
  MYUNION() : currtype( NONE ) {}
  ~MYUNION() {cleanup();}

  enum uniontype {NONE,_INT,_LIST,_STRING};
  uniontype currtype;

  inline int& getint();
  inline LIST& getlist();
  inline STRING& getstring();

protected:
  union {
    int i;
    unsigned char buff[max(sizeof(LIST),sizeof(STRING))];
  } U;

  void cleanup();
};

inline int& MYUNION::getint()
{
  if( currtype==_INT ) {
    return U.i;
  } else {
    cleanup();
    currtype=_INT;
    return U.i;
  } // else
}

inline LIST& MYUNION::getlist()
{
  if( currtype==_LIST ) {
    return *(reinterpret_cast<LIST*>(U.buff));
  } else {
    cleanup();
    LIST* ptype = new(U.buff) LIST();
    currtype=_LIST;
    return *ptype;
  } // else
}

inline STRING& MYUNION::getstring()
{
  if( currtype==_STRING) {
    return *(reinterpret_cast<STRING*>(U.buff));
  } else {
    cleanup();
    STRING* ptype = new(U.buff) STRING();
    currtype=_STRING;
    return *ptype;
  } // else
}

void MYUNION::cleanup()
{
  switch( currtype ) {
    case _LIST: {
      LIST& ptype = getlist();
      ptype.~LIST();
      break;
    } // case
    case _STRING: {
      STRING& ptype = getstring();
      ptype.~STRING();
      break;
    } // case
    default: break;
  } // switch
  currtype=NONE;
}

(For an idea of the kinds of things I'm looking for, see also Style Case Study #1 and Style Case Study #2.)

4. Show a better way to achieve a generalized variant type, and comment on any tradeoffs you encounter.

 

Solution

Unions Redux

1. What are unions, and what purpose do they serve?

Unions allow more than one object, of either class or builtin type, to occupy the same space in memory. For example:

// Example 1
//
#include <iostream>

union U
{
  int i;
  float f;
};

int main()
{
  U u;

  u.i = 42;    // ok, now i is active
  std::cout << u.i << std::endl;

  u.f = 3.14f; // ok, now f is active
  std::cout << 2 * u.f << std::endl;
}

But only one of the types can be "active" at a time -- after all, the storage can only hold one value at a time. Also, unions only support some kinds of types, which leads us into the next question:
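As a rough illustration of what "occupy the same space" means, the sketch below checks two properties of a union: its members start at the same address, and the union is at least as large as its largest member. The names Overlay, membersOverlap, and bigEnough are made up for this illustration and do not come from the article.

```cpp
// Hypothetical union for illustration; not part of the article's code.
union Overlay
{
  int   i;
  float f;
};

// True if both members begin at the same address, i.e. they share storage.
// Only the addresses are compared; the inactive member is never read.
inline bool membersOverlap(Overlay& o)
{
  return static_cast<void*>(&o.i) == static_cast<void*>(&o.f);
}

// The union must be big enough to hold its largest member.
inline bool bigEnough()
{
  return sizeof(Overlay) >= sizeof(int) && sizeof(Overlay) >= sizeof(float);
}
```

Because the members overlap, writing one member overwrites the bytes of the other, which is why only one can be treated as active at any moment.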

 

2. What kinds of types cannot be used as members of unions? Why do these limitations exist? Explain.

From the C++ standard:

An object of a class with a non-trivial constructor, a non-trivial copy constructor, a non-trivial destructor, or a non-trivial copy assignment operator cannot be a member of a union, nor can an array of such objects.

In brief, for a class type to be usable in a union, it must meet all of the following criteria:

- The only constructors, destructors, and copy assignment operators are the compiler-generated ones.

- There are no virtual functions or virtual base classes.

- Ditto for all of its base classes and nonstatic members (or arrays thereof).

That's all, but that sure eliminates a lot of types.
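To make the criteria concrete, here is a small sketch of a class type that does qualify, next to one (in a comment) that does not under the rule quoted above. Point, Shape, and sumViaPoint are hypothetical names chosen for the illustration.

```cpp
// Hypothetical types for illustration.
struct Point          // only compiler-generated special members, no virtuals
{
  int x;
  int y;
};

union Shape           // acceptable under the rule quoted above
{
  Point p;
  int   raw[2];
};

// Writes and reads the same member, so only the active member is accessed.
inline int sumViaPoint(int a, int b)
{
  Shape s;
  s.p.x = a;
  s.p.y = b;
  return s.p.x + s.p.y;
}

// By contrast, the rule quoted above rejects the following, because
// std::string has non-trivial constructors and a non-trivial destructor:
//
//   union Bad { std::string s; int i; };
```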

Unions were inherited from C. The C language has a strong tradition of efficiency and support for low-level close-to-the-metal programming, which has been compatibly preserved in C++; that's why C++ also has unions. On the other hand, the C language does not have any tradition of language support for an object model supporting class types with constructors and destructors and user-defined copying, which C++ definitely does; that's why C++ also has to define what, if any, uses of such newfangled types make sense with the "oldfangled" unions, and do not violate the C++ object model including its object lifetime guarantees.

If C++'s restrictions on unions did not exist, Bad Things could happen. For example, consider what could happen if the following code were allowed:

// Example 2: Not Standard C++ code, but what if it were allowed?
//
void f()
{
  union IllegalImmoralAndFattening
  {
    std::string s;
    std::auto_ptr<int> p;
  };

  IllegalImmoralAndFattening iiaf;

  iiaf.s = "Hello, world"; // has s's constructor run?
  iiaf.p = new int(4); // has p's constructor run?
}
// will s get destroyed? should it be?
// will p get destroyed? should it be?

As the comments indicate, serious problems would exist if this were allowed. To avoid further complicating the language by trying to craft rules that at best only might partly patch up a few of the problems, the problematic operations were simply banished.

But don't think that unions are only a holdover from earlier times. Unions are perhaps most useful for saving space by allowing data to overlap, and this is still desirable in C++ and in today's modern world. For example, some of the most advanced C++ standard library implementations in the world now use just this technique for implementing the "small string optimization," a great optimization alternative that reuses the storage inside a string object itself: for large strings, space inside the string object stores the usual pointer to the dynamically allocated buffer and housekeeping information like the size of the buffer; for small strings, the same space is instead reused to store the string contents directly and completely avoid any dynamic memory allocation. For more about the small string optimization (and other string optimizations and pessimizations in considerable depth), see Items 13-16 in my book More Exceptional C++ [2], or Scott Meyers' discussion of current commercial std::string implementations in Effective STL [3].
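The overlap described above can be sketched roughly as follows. This is only a layout illustration under simplifying assumptions: SsoString and storeSmall are invented names, and real std::string implementations differ in many details.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical layout sketch; not taken from any real library.
struct SsoString
{
  std::size_t size;   // number of characters stored
  union
  {
    struct { char* ptr; std::size_t capacity; } heap;   // used for large strings
    char local[sizeof(char*) + sizeof(std::size_t)];    // same bytes reused for small strings
  } rep;

  // Small strings (including the terminating '\0') fit inside the object itself.
  bool isSmall() const { return size < sizeof(rep.local); }
};

// Stores a short string in place; returns false if it would need the heap.
inline bool storeSmall(SsoString& s, const char* text)
{
  std::size_t n = std::strlen(text);
  if (n >= sizeof(s.rep.local)) return false;
  std::memcpy(s.rep.local, text, n + 1);   // copy the characters and the '\0'
  s.size = n;
  return true;
}
```

The union here holds only built-in types and a plain struct, so it stays within the restrictions discussed in Question 2.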

 

Toward Dissection and Correction

3. The article in [1] cites the motivating case of writing a scripting language: Say that you want your language to support a single type for variables that at various times can hold an integer, a string, or a list. Creating a union { int i; list<int> l; string s; } doesn't work for the reasons given above. The following code presents a workaround that attempts to support allowing any type to participate in a union. For a more detailed explanation, see the original article.

On the plus side, the cited article addresses a real problem, and clearly much effort has been put into coming up with a good solution. Unfortunately, from well-intentioned beginnings more than one programmer has gone badly astray.

The problems with the design and the code fall into three major categories: legality, safety, and morality.

 

Critique this code and identify:

a) Mechanical errors, such as invalid syntax or nonportable conventions.

b) Stylistic improvements that would improve code clarity, reusability, and maintainability.

The first overall comment that needs to be made is that the fundamental idea behind this code is not legal in Standard C++. The original article summarizes the key idea:

"The idea is that instead of declaring object members, you instead declare a raw buffer [non-dynamically, as a char array member inside the object pretending to act like a union] and instantiate the needed objects on the fly [by in-place construction]." [1]

The idea is common, but unfortunately isn't sound. This technique is nonconforming and nonportable because buffers that are not dynamically allocated (e.g., via malloc() or new()) are not guaranteed to be correctly aligned for any other type. Even if this technique happens to accidentally work for some types on someone's current compiler, there's no guarantee it will continue to work for other types, or for the same types in the next version of the same compiler. For more details and some directly related discussion, see for example Item 30 in Exceptional C++, notably the sidebar titled "Reckless Fixes and Optimizations, and Why They're Evil." [4] See also the alignment discussion in [9].

For C++0x, the standards committee is considering adding alignment aids to the language specifically to enable techniques that rely on alignment like this, but that's all still in the future. For now, to make this work reasonably reliably even some of the time, you'd have to do one of the following:

- Rely on the max_align hack (see the above citation which footnotes the max_align hack, or do a Google search for max_align); or

- Rely on nonstandard extensions like Gnu's __alignof__ to make this work reliably on a particular compiler that supports such an extension. (Even though Gnu provides an ALIGNOF macro intended to work more reliably on other compilers, it too is admitted "hackery" that relies on the compiler's laying out objects in certain ways and making guesses based on offsetof() inquiries, which may often be a good guess but is not guaranteed by the standard. See for example [5].)
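For readers who have not seen it, the max_align idea mentioned in the first option can be sketched roughly like this. The names MaxAlign and AlignedBuffer are invented for the illustration, and the list of member types is a guess at "the most strictly aligned built-in types"; as the text says, this is a hack that tends to work in practice, not something the standard guarantees for arbitrary class types.

```cpp
#include <cstddef>

// Hypothetical names; a sketch of the "max_align" idea, not a guaranteed fix.
// The union is aligned at least as strictly as its most demanding member.
union MaxAlign
{
  char        c;
  short       s;
  int         i;
  long        l;
  float       f;
  double      d;
  long double ld;
  void*       p;
  void      (*fp)();
};

// Raw storage of N bytes that borrows MaxAlign's alignment.
template <std::size_t N>
union AlignedBuffer
{
  MaxAlign      aligner;   // never used; only forces the union's alignment
  unsigned char bytes[N];  // the raw storage for in-place construction
};
```

A class like MYUNION would then place its objects into AlignedBuffer<N>::bytes instead of a bare unsigned char array, on the assumption that no contained type needs stricter alignment than the types listed in MaxAlign.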

You could work around this by dynamically allocating the array using malloc() or new(), which would guarantee that the char buffer is suitably aligned for an object of any type, but that would still be a bad idea (it's still not type-safe) and it wouldn't achieve the potential efficiency gains that the original article was aiming for. An alternative and correct solution would be to use boost::any (see below) which incurs a similar allocation/indirection overhead and is also both safe and correct; more about that later on.

Attempts to work against the language, or to make the language work the way we want it to work instead of the way it actually does work, are often questionable and should be a big red flag. In the Exceptional C++ sidebar cited above, while in an ornery mood I also accused a similar technique of "just plain wrongheadedness" followed by some pretty strong language. There can still be cases where it could be reasonable to use constructs that are known to be nonportable but okay in a particular environment (in this case, perhaps using the max_align hack), but even then I would argue that that fact should be noted explicitly and further that it still has no place in a general piece of code recommended for wide use.

 

#include <list>
#include <string>
#include <iostream>
using namespace std;

Since new is going to be used below, also #include <new>. (The <iostream> header was used later in the original code, not shown here, which had a test harness that emitted output.)

#define max(a,b) (a)>(b)?(a):(b)

typedef list<int> LIST;
typedef string STRING;

struct MYUNION {
  MYUNION() : currtype( NONE ) {}
  ~MYUNION() {cleanup();}

The first classic mechanical error above is that MYUNION is unsafe to copy because the programmer forgot to provide a suitable copy constructor and copy assignment operator.

MYUNION is choosing to play games that require special work be done in the constructor and destructor, so these are provided as above; that's fine as far as it goes. But it doesn't go far enough, because the same games require special work in the copy constructor and copy assignment operator, which are not provided. The default compiler-generated copying operations do the wrong thing, namely copy the contents bitwise as an array of chars, which is likely to have most unsatisfactory results, in most cases leading straight to memory corruption. Consider the following code:

// Example 3-1: MYUNION is unsafe for copying
//
{
  MYUNION u1, u2;
  u1.getstring() = "Hello, world";
  u2 = u1; // copies the bits of u1 to u2
} // oops, double delete of the string (assuming the bitwise copy even made sense)

Guideline: Observe the Law of the Big Three: If a class needs a custom copy constructor, copy assignment operator, or destructor, it probably needs all three.
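A minimal sketch of a class that follows this guideline is shown below. IntBox is a hypothetical class invented for the illustration; it owns one heap-allocated int, so it needs all three operations to copy and clean up correctly.

```cpp
#include <algorithm>  // std::swap (C++98)
#include <utility>    // std::swap (C++11 and later)

// Hypothetical class for illustration: owns one heap-allocated int.
class IntBox
{
public:
  explicit IntBox(int v) : p_(new int(v)) {}

  IntBox(const IntBox& other) : p_(new int(*other.p_)) {}   // deep copy

  IntBox& operator=(const IntBox& other)
  {
    IntBox tmp(other);        // copy first; if this throws, *this is unchanged
    std::swap(p_, tmp.p_);    // then take over the new state
    return *this;             // tmp's destructor frees the old int
  }

  ~IntBox() { delete p_; }

  int& value() { return *p_; }

private:
  int* p_;
};
```

With the compiler-generated copy operations instead, two IntBox objects would end up sharing one pointer and both would delete it, which is the same kind of double destruction that Example 3-1 runs into.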

Passing on from the classic mechanical error, we next encounter a duo of classic stylistic errors:

  enum uniontype {NONE,_INT,_LIST,_STRING};
  uniontype currtype;

  inline int& getint();
  inline LIST& getlist();
  inline STRING& getstring();

There are two stylistic errors here. First, this struct is not reusable because it is hardcoded for specific types. Indeed, the original article recommended handcoding such a struct every time it was needed. Second, even given its limited intended usefulness, it is not very extensible or maintainable. We'll return to this frailty again later, once we've covered more of the context.

There are also two mechanical problems. The first is that currtype is public for no good reason; this violates good encapsulation and means any user can freely mess with the type, even by accident. The second mechanical problem concerns the names used in the union; I'll cover that in its own section, "Underhanded Names," later on.

protected:

Next, we encounter another mechanical error: The internals ought to be private, not protected. The only reason to use protected would be to make the internals available to derived classes, but there had better not be any derived classes because MYUNION is unsafe to derive from for several reasons -- not least because of the murky and abstruse games it plays with its internals, and because it lacks a virtual destructor.

 

  union {
    int i;
    unsigned char buff[max(sizeof(LIST),sizeof(STRING))];
  } U;

  void cleanup();
};

That's it for the main class definition. Moving on, consider the three parallel accessor functions:

inline int& MYUNION::getint()
{
  if( currtype==_INT ) {
    return U.i;
  } else {
    cleanup();
    currtype=_INT;
    return U.i;
  } // else
}

inline LIST& MYUNION::getlist()
{
  if( currtype==_LIST ) {
    return *(reinterpret_cast<LIST*>(U.buff));
  } else {
    cleanup();
    LIST* ptype = new(U.buff) LIST();
    currtype=_LIST;
    return *ptype;
  } // else
}

inline STRING& MYUNION::getstring()
{
  if( currtype==_STRING) {
    return *(reinterpret_cast<STRING*>(U.buff));
  } else {
    cleanup();
    STRING* ptype = new(U.buff) STRING();
    currtype=_STRING;
    return *ptype;
  } // else
}

A minor nit: The "// else" comment adds nothing. It's unfortunate that the only comments in the code are useless ones.

More seriously, there are three major problems here. The first is that the functions are not written symmetrically, and whereas the first use of a list or a string yields a default-constructed object, the first use of int yields an uninitialized object. If that is intended, in order to mirror the ordinary semantics of uninitialized int variables, that should be documented; since it is not, the int ought to be initialized. For example, if the caller accesses getint() and tries to make a copy of the (uninitialized) value, the result is undefined behavior -- not all platforms support copying arbitrary invalid int values, and some will reject the instruction at runtime.

The second major problem is that this code hinders const-correct use. If the code is really going to be written the above way, then at least it would be useful to also provide const overloads for each of these functions; each would naturally return the same thing as its non-const counterpart, but by a reference to const.
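The const overloads suggested here follow a common pattern, sketched below on a deliberately tiny class. Counter and readOnly are hypothetical names used only for this illustration.

```cpp
// Hypothetical class for illustration.
class Counter
{
public:
  Counter() : n_(0) {}

  int&       get()       { return n_; }   // non-const object: caller may modify
  const int& get() const { return n_; }   // const object: caller may only read

private:
  int n_;
};

// Takes its argument by reference to const, so the const overload is selected.
inline int readOnly(const Counter& c) { return c.get(); }
```

Note that MYUNION's accessors also change the active type as a side effect, so const versions of them could not simply forward to the non-const ones; they would have to decide what to do when the requested type is not the active one.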

The third major problem is that the approach above is fragile and brittle in the face of change. It relies on type switching (see any of Steve Dewhurst's many commentaries against this notion in other contexts in previous issues of CUJ), and it's easy to accidentally fail to keep all the functions in sync when you add or remove types.

Stop reading here and consider: What do you have to do in the above code if you want to add a new type? Make as complete a list as you can.

* * * * *

Are you back? All right, here's the list I came up with. To add a new type, you have to remember to: (a) add a new enum value; (b) add a new accessor member; (c) update the cleanup() function to safely destroy the new type; and (d) add that type to the max() calculation to ensure buff is sufficiently large to hold the new type too.

If you missed one or more of those, well, that just illustrates how difficult this code really is to maintain and extend.

Pressing onward, we come to the final function:

void MYUNION::cleanup()
{
  switch( currtype ) {
    case _LIST: {
      LIST& ptype = getlist();
      ptype.~LIST();
      break;
    } // case
    case _STRING: {
      STRING& ptype = getstring();
      ptype.~STRING();
      break;
    } // case
    default: break;
  } // switch
  currtype=NONE;
}

Let's reprise that small commenting nit again: The "// case" and "// switch" comments add nothing; it's unfortunate that the only comments in the code are useless ones. It is better to have no comments at all than to have comments that are just distractions.

But there's a larger issue here: Rather than having simply "default: break;", it would be good to make an exhaustive list (including the "int" type) and signal a logic error if the type is unknown -- perhaps via "throw std::logic_error(...);".

Again, type switching is purely evil. A Google search for "switch C++ Dewhurst" will yield all sorts of interesting references on this topic, including [6]; see those for more details, if you need more ammo to convince colleagues to avoid the type-switching beast.

Guideline: Avoid type switching; prefer type safety.

 

Underhanded Names

There's one mechanical problem I haven't yet covered. This problem first rears its ugly, unshaven, and unshampooed head in the following line:

  enum uniontype {NONE,_INT,_LIST,_STRING};

Never, ever, ever create names that begin with an underscore or contain a double underscore; they're reserved for your compiler and standard library vendor's exclusive use, so that they have names that they can use without tromping on your code. Tromp on their names, and their names might just tromp back on you! (The more specific rule is that any name with a double underscore anywhere in it __like__this or that starts with an underscore and a capital letter _LikeThis is reserved. You can remember that rule if you like, but it's a bit easier to just avoid both leading underscores and double underscores entirely.)

Don't stop! Keep reading! You might have read this advice before. You might even have read it from me. You might even be tired of it, and yawning, and ready to ignore the rest of this section. If so, this one's for you, because this advice is not at all theoretical, and it bites and bites hard in this code.

The above line happens to compile on most of the compilers I tried (Borland 5.5, Comeau 4.3.0.1, Intel 7.0, gcc 2.95.3 / 3.1.1 / 3.2, and Microsoft Visual C++ 6.0, 7.0, and 7.1 RC1). But under two of them -- Metrowerks CodeWarrior 8.2, and the EDG 3.0.1 demo front-end used with the Dinkumware 4.0 standard library -- the code breaks horribly.

Under Metrowerks CodeWarrior 8, this line breaks noisily with the first of 52 errors. The 225 lines of error messages begin with the following diagnostics:

### mwcc Compiler:
#    File: 1.cpp
# --------------
#      17:      enum uniontype {NONE,_INT,_LIST,_STRING};
#   Error:                                     ^
#   identifier expected
### mwcc Compiler:
#      18:      uniontype currtype;
#   Error:      ^^^^^^^^^
#   declaration syntax error

followed by 50 further error messages, and 215 more lines. What's pretty obvious from the second and later errors is that we should ignore them for now because they're just cascades from the first error -- since uniontype was never successfully defined, the rest of the code which uses uniontype extensively will of course break too.

But what's up with the definition of uniontype? The indicated comma sure looks like it's in a reasonable place, doesn't it? There's an identifier happily sitting in front of it, isn't there? All becomes clear when we ask the Metrowerks compiler to spit out the preprocessed output... omitting many many lines, here's what the compiler finally sees:

enum uniontype {NONE,_INT, , };

Aha! That's not valid C++, and the compiler rightly complains about the third comma because there's no identifier in front of it.

But what happened to _LIST and _STRING? You guessed it -- tromped on and eaten by the ravenously hungry Preprocessor Beast. It just so happens that Metrowerks' implementation has macros that happily strip away the names _LIST and _STRING, which is perfectly legal and legitimate because it (the implementation) is allowed to own those _Names (as well as _Other__names).
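The mechanism is easy to reproduce in miniature. The sketch below defines two empty macros as stand-ins for whatever a vendor header might define, then uses stringizing to capture what the enumerator list looks like after preprocessing. VENDOR_LIST, VENDOR_STRING, INT_TAG, and the helper names are all invented for this illustration; non-reserved spellings are used on purpose so that the sketch itself stays legal. It assumes a compiler with variadic macros.

```cpp
#include <cstring>

// Stand-ins for macros a vendor header might define for names it owns.
#define VENDOR_LIST
#define VENDOR_STRING

// Two-level stringizing, so the argument is macro-expanded before being quoted.
#define STRINGIZE(...) #__VA_ARGS__
#define EXPAND_THEN_STRINGIZE(...) STRINGIZE(__VA_ARGS__)

// The enumerator list as the compiler proper would see it after preprocessing.
static const char* const seenByCompiler =
    EXPAND_THEN_STRINGIZE(NONE, INT_TAG, VENDOR_LIST, VENDOR_STRING);

// True if the given name survived preprocessing.
inline bool survived(const char* name)
{
  return std::strstr(seenByCompiler, name) != 0;
}
```

The two macro names vanish from the captured text while the commas around them remain, which is the same effect that produced the "identifier expected" diagnostic above.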

So Metrowerks' implementation happens to eat both _LIST and _STRING. What about EDG's/Dinkumware's? Judge for yourself:

"1.cpp", line 17: error: trailing comma is nonstandard
      enum uniontype {NONE,_INT,_LIST,_STRING};
                                     ^

"1.cpp", line 58: error: expected an expression
      if( currtype==_STRING) {
                           ^

"1.cpp", line 63: error: expected an expression
          currtype=_STRING;
                          ^

"1.cpp", line 76: error: expected an expression
          case _STRING: {
                      ^

4 errors detected in the compilation of "1.cpp".

This time, even without generating and inspecting a preprocessed version of the file, we can see what's going on: The compiler is behaving as though the word "_STRING" wasn't there. That's because it was -- you guessed it -- tromped on, not to mention thoroughly chewed up and spat out, by the still-peckish Preprocessor Beast.

I hope that this will convince you that when some writers natter on about not using _Names like__these, the problem is far from theoretical. It's practical indeed, because the naming restriction directly affects your relationship with your compiler and standard library writer. Trespass on their turf, and you might get lucky and remain unscathed; on the other hand, you might not.

The C++ landscape is wide-open and clear and lets you write all sorts of wonderful and flexible code and wander in pretty much whatever direction your development heart desires, including that it lets you choose pretty much whatever names you like outside of namespace std. But when it comes to names, C++ also has one big fenced-off grove, surrounded by gleaming barbed wire and signs that say things like "Employees__Only -- Must Have Valid _Badge To Enter Here" and "Violators May Be Tromped and Eaten." The above is a stellar example of the tromping one gets for disregarding the _Warnings.

Guideline: Never use "underhanded names" -- ones that begin with an underscore, or that contain a double underscore.

 

Toward a Better Way: boost::any

4. Show a better way to achieve a generalized variant type, and comment on any tradeoffs you encounter.

The original article says:

"[Y]ou might want to implement a scripting language with a single variable type that can either be an integer, a string, or a list." [1]

This is true, and there's no disagreement so far. But the article then continues:

"A union is the perfect candidate for implementing such a composite type." [1]

Rather, the article has served to show in some considerable detail just why a union is not suitable at all.

But if not a union, then what? One very good candidate for implementing such a variant type is Boost's "any" facility, along with its "many" and "any_cast".[7] Jim Hyslop and I discussed it in our article "I'd Hold Anything For You."[8] Interestingly, the complete implementation for the fully general "any" (covering any number/combination of types and even some platform-specific #ifdefs) is about the same amount of code as the sample MYUNION solution for the special case of the three types int, list<int>, and string -- and it's fully general, extensible, type-safe, and part of a healthy low-cholesterol diet.

There is still a tradeoff, however, and it is this: Dynamic allocation. The boost::any facility does not attempt to achieve the potential efficiency gain of avoiding a dynamic memory allocation, which was part of the motivation in the original article. Note too that the boost::any dynamic allocation overhead is more than if the original article's code was just modified to use (and reuse) a single dynamically allocated buffer that's acquired once for the lifetime of MYUNION, because boost::any performs a dynamic allocation every time the contained type is changed, too.
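To see where the dynamic allocation comes from, here is a much-simplified sketch of the type-erasure technique that an any-like class can be built on. TinyAny and its members are hypothetical names written for this illustration; this is not Boost's implementation, only the general idea of holding the value behind a pointer to an abstract base.

```cpp
#include <algorithm>  // std::swap (C++98)
#include <string>
#include <typeinfo>
#include <utility>    // std::swap (C++11 and later)

// Hypothetical, simplified any-like holder (not Boost's code).
class TinyAny
{
  struct Placeholder
  {
    virtual ~Placeholder() {}
    virtual const std::type_info& type() const = 0;
    virtual Placeholder* clone() const = 0;
  };

  template <typename T>
  struct Holder : Placeholder
  {
    explicit Holder(const T& v) : value(v) {}
    const std::type_info& type() const { return typeid(T); }
    Placeholder* clone() const { return new Holder(value); }
    T value;
  };

  Placeholder* content_;   // owns the currently held value, or null

public:
  TinyAny() : content_(0) {}

  template <typename T>
  TinyAny(const T& v) : content_(new Holder<T>(v)) {}   // one allocation per value

  TinyAny(const TinyAny& other)
    : content_(other.content_ ? other.content_->clone() : 0) {}

  ~TinyAny() { delete content_; }

  // Copy-and-swap: the by-value parameter is built (and allocated) first.
  TinyAny& operator=(TinyAny other)
  {
    std::swap(content_, other.content_);
    return *this;
  }

  // Returns a pointer to the held value if it has exactly type T, else null.
  template <typename T>
  T* get()
  {
    if (content_ && content_->type() == typeid(T))
      return &static_cast<Holder<T>*>(content_)->value;
    return 0;
  }
};
```

In this sketch every assignment of a new value constructs a fresh Holder on the heap and deletes the old one, which mirrors the per-change allocation cost described above.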

Here's how the article's demo harness would look if it instead used boost::any. The old code that uses the original article's version of MYUNION is shown in comments for comparison:

// MYUNION u;
any u;

Instead of a handwritten struct, which has to be written again for each use, just use any directly. Note that any is a plain class, not a template.

// access union as integer
// u.getint() = 12345;
u = 12345;

The assignment shows any's more natural syntax.

// cout << "int=" << u.getint() << endl;
cout << "int=" << any_cast<int>(u) << endl;
               // or just "int(u)"

I like any's cast form better because it's more general (including that it is a nonmember) and more natural to C++ style; you could also use the less verbose "int(u)" without an any_cast if you know the type already. On the other hand, get[type]() is more fragile, harder to write and maintain, and so forth.

// access union as std::list
// LIST& list = u.getlist();
// list.push_back(5);
// list.push_back(10);
// list.push_back(15);

u = list<int>();
list<int>& l = *any_cast<list<int> >(&u);
l.push_back(5);
l.push_back(10);
l.push_back(15);

I think any_cast could be improved to make it easier to get references, but this isn't too bad. (Aside: I'd discourage using 'list' as a variable name when it's also the name of a template in scope; too much room for expression ambiguity.)

So far, we've achieved some typability and readability savings. The remaining differences are more minor:

// LIST::iterator it = list.begin();
list<int>::iterator it = l.begin();
while( it != l.end() ) {
  cout << "list item=" << *(it) << endl;
  it++;
} // while

Pretty much unchanged.

// access union as std::string
// STRING& str = u.getstring();
// str = "Hello world!";
u = string("Hello world!");

Again, about a wash; I'd say the any version is slightly simpler than the original, but only slightly.

// cout << "string='" << str.c_str() << "'" << endl;
cout << "string='" << any_cast<string>(u) << "'" << endl;
                   // or just "string(u)"

As before.

 

Alexandrescu's Discriminated Unions

Is it possible to fully achieve both of the original goals -- safety and avoiding dynamic memory -- in a conforming Standard C++ implementation? That sounds like a problem that someone like Andrei Alexandrescu would love to sink his teeth into, especially if it could somehow involve complicated templates. As evidenced in [9], [10], and [11], where Andrei describes his discriminated unions (a.k.a. Variant) approach, it turns out that:

- it is (something he would love to tackle), and

- it can (involve weird templates, and just one quote from [9] says it all: "Did you know that unions can be templates?"), so

- he does.

In short, by performing heroic efforts to push the boundaries of the language as far as possible, Alexandrescu's Variant comes very close to being a truly portable solution. It falls only slightly short, and is probably portable enough in practice even though it goes beyond the pale of what the Standard guarantees. Its main problem is that, even ignoring alignment-related issues, the Variant code is so complex and advanced that it actually works on very few compilers -- in my testing, I only managed to get it to work with one.

A key part of Alexandrescu's Variant approach is an attempt to generalize the max_align idea to make it a reusable library facility that can itself still be written in conforming Standard C++. The reason for wanting this is specifically to deal with the alignment problems in the code we've been analyzing above, so that a non-dynamic char buffer can continue to be used in relative safety. Alexandrescu makes heroic efforts to use template metaprogramming to calculate a safe alignment. Will it work portably? His discussion of this question follows:

"Even with the best Align, the implementation above is still not 100-percent portable for all types. In theory, someone could implement a compiler that respects the Standard but still does not work properly with discriminated unions. This is because the Standard does not guarantee that all user-defined types ultimately have the alignment of some POD type. Such a compiler, however, would be more of a figment of a wicked language lawyer's imagination, rather than a realistic language implementation.

"[...] Computing alignment portably is hard, but feasible. It never is 100-percent portable." [10]

There are other key features in Alexandrescu's approach, notably a union template that takes a typelist template of the types to be contained, visitation support for extensibility, and an implementation technique that will "fake a vtable" for efficiency to avoid an extra indirection when accessing a contained type. These parts are more heavyweight than boost::any, but are portable in theory. That "portable in theory" part is important -- as with Andrei's great work in Modern C++ Design [12] [13], the implementation is so heavy on templates that the code itself contains comments like: "Guaranteed to issue an internal compiler error on: [various popular compilers, Metrowerks, Microsoft, Gnu gcc]", and the mainline test harness contains a commented-out test helpfully labeled "The construct below didn't work on any compiler."

That is Variant's major weakness: Most real-world compilers don't even come close to being able to handle this implementation, and the code should be viewed as important but still experimental. I attempted to build Alexandrescu's Variant code using all of the compilers that I have available: Borland 5.5; Comeau 4.3.0.1; EDG 3.0.1; Intel 7.0; gcc 2.95, 3.1.1, and 3.2; Metrowerks 8.2; and Microsoft VC++ 6.0, 7.0, and 7.1 RC1. As some readers will know, some of the products in that list are very strong and standards-conforming compilers. None of these compilers could successfully compile Alexandrescu's template-heavy source as it was provided.

I tried to massage the code by hand to get it through any of the compilers, but was only successful with Microsoft VC++ 7.1 RC1. Most of the compilers didn't stand a chance, because they did not have nearly strong enough template support to deal with Alexandrescu's code. (Some emitted a truly prodigious quantity of warnings and errors -- Intel 7.0's response to compiling main.cpp was to spew back an impressive 430K's worth -- really, nearly half a megabyte! -- of diagnostic messages.)

I had to make three changes to get the code to compile without errors (although still with some narrowing-conversion warnings at the highest warning level) under Microsoft VC++ 7.1 RC1:

- Added a missing "typename" in class AlignedPOD.

- Added a missing "this->" to make a name dependent in ConverterTo<>::Unit<>::DoVisit().

- Added a final newline character at the end of several headers, as required by the C++ standard (some conforming compilers aren't strict about this and allow the absence of a final newline as a conforming extension; VC++ is stricter and requires the newline). [14]
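The first two fixes are instances of the standard rules for dependent names in templates. The following sketch shows the same two corrections in miniature; VisitorBase and Visitor are hypothetical stand-ins, not the actual AlignedPOD or ConverterTo code:

```cpp
#include <cassert>

template <typename T>
struct VisitorBase {
    typedef T value_type;
    int visit_count;
    VisitorBase() : visit_count(0) {}
    void record_visit() { ++visit_count; }
};

template <typename T>
struct Visitor : VisitorBase<T> {
    T do_visit(T v) {
        // "typename" is needed here: VisitorBase<T>::value_type depends on T,
        // and without the keyword a dependent qualified name is assumed to
        // name a non-type.
        typename VisitorBase<T>::value_type copy = v;

        // "this->" is needed here: an unqualified record_visit() is not
        // looked up in the dependent base class VisitorBase<T>.
        this->record_visit();
        return copy;
    }
};
```

A compiler that implements two-phase name lookup rejects the code if either the keyword or the this-> qualification is dropped; compilers that defer all lookup to instantiation time tend to accept the incorrect form silently, which is how such omissions go unnoticed.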

As the author of [1] commented further about tradeoffs in Alexandrescu's design: "It doesn't use dynamic memory, and it avoids alignment issues and type switching. Unfortunately I don't have access to a compiler that can compile the code, so I can't evaluate its performance vs. myunion and any. Alexandrescu's approach requires 9 supporting header files totaling ~80KB, which introduces its own set of maintenance problems." [15]

I won't try to summarize Andrei's three articles further here, but I encourage readers who are interested in this problem to look them up. They're available online as indicated in the references below.

Guideline: If you want to represent variant types, for now prefer to use boost::any (or something equally simple).
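As a rough illustration of the guideline applied to the motivating scripting-language case, the sketch below uses std::any, the C++17 standard-library facility modeled on boost::any, so that it needs no external library. That substitution is an assumption on my part (boost::any offers the analogous boost::any_cast), and describe is a hypothetical helper:

```cpp
#include <any>
#include <cassert>
#include <list>
#include <string>

// Returns a short tag naming what the variable currently holds.
// The pointer form of any_cast returns null on a type mismatch
// instead of throwing.
std::string describe(const std::any& v) {
    if (std::any_cast<int>(&v)) return "int";
    if (std::any_cast<std::string>(&v)) return "string";
    if (std::any_cast<std::list<int> >(&v)) return "list";
    return "unknown";
}
```

A single variable of type std::any can then be assigned an int, a std::string, or a std::list<int> at different times, and each assignment destroys the previously held value correctly.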

Once the compiler you are using catches up (in template support) and the Standard catches up (in true alignment support) and Variant libraries catch up (in mature implementations), it will be time to consider using Variant-like library tools as type-safe replacements for unions.
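For a sense of what such a tool looks like in use, here is a small sketch with std::variant, which was standardized in C++17, well after this article was written. It is not Alexandrescu's Variant, and ScriptValue, SizeVisitor, and element_count are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <variant>

// One type that holds an int, a string, or a list, as in the motivating case.
typedef std::variant<int, std::string, std::list<int> > ScriptValue;

// A visitor supplies one overload per alternative; leaving one out is a
// compile-time error rather than a run-time surprise.
struct SizeVisitor {
    std::size_t operator()(int) const { return 1; }
    std::size_t operator()(const std::string& s) const { return s.size(); }
    std::size_t operator()(const std::list<int>& l) const { return l.size(); }
};

std::size_t element_count(const ScriptValue& v) {
    return std::visit(SizeVisitor(), v);
}
```

Unlike a raw union, the variant tracks which alternative is active and runs the right destructor on reassignment; std::get_if and std::holds_alternative give checked access when visitation is more than the caller needs.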

 

Summary

Even if the design and implementation of MYUNION are lacking, the motivating problem is both real and worth considering. I'd like to thank Mr. Manley for taking the time to write this article and raise awareness of the need for variant type support, and Kevlin Henney and Andrei Alexandrescu for contributing their own solutions to this problem. It is a hard enough problem that neither Manley's nor Alexandrescu's approach is strictly portable, standards-conforming C++, although Alexandrescu's Variant makes heroic efforts to get there. His design is very close to portable in theory, but the implementation is still far from portable in practice because very few compilers can handle the advanced template code it uses.

For now, an approach like Henney's boost::any is the preferred way to go. If in certain places your measurements tell you that you really need the efficiency or extra features provided by something like Alexandrescu's Variant, and you have time on your hands and some template know-how, you might experiment with writing your own scaled-back version of the full-blown Variant by applying only the ideas in [9], [10], and [11] that are applicable to your situation.

 

References

[1] K. Manley. "Using Constructed Types in Unions" (C/C++ Users Journal, 20(8), August 2002).

[2] H. Sutter. More Exceptional C++ (Addison-Wesley, 2002).

[3] S. Meyers. Effective STL (Addison-Wesley, 2001).

[4] H. Sutter. Exceptional C++ (Addison-Wesley, 2000).

[5] http://list-archive.xemacs.org/xemacs-patches/200101/msg00183.html

[6] S. Dewhurst. "C++ Hierarchy Design Idioms", available online at www.semantics.org/talknotes/SD2002W_HIERARCHY.pdf.

[7] K. Henney. C++ Boost any class, www.boost.org/libs/any.

[8] H. Sutter and J. Hyslop. "I'd Hold Anything For You" (C/C++ Users Journal, 19(12), December 2001), available online at http://www.cuj.com/experts/1912/hyslop.htm.

[9] A. Alexandrescu. "Discriminated Unions (I)" (C/C++ Users Journal, 20(4), April 2002).

[10] A. Alexandrescu. "Discriminated Unions (II)" (C/C++ Users Journal, 20(6), June 2002).

[11] A. Alexandrescu. "Discriminated Unions (III)" (C/C++ Users Journal, 20(8), August 2002).

[12] A. Alexandrescu. Modern C++ Design (Addison-Wesley, 2001).

[13] H. Sutter. "Review of Alexandrescu's Modern C++ Design" (C/C++ Users Journal, 20(4), April 2002), available online at http://www.gotw.ca/publications/mcd_review.htm.

[14] Thanks to colleague Jeff Peil for pointing out this requirement in clause 2.1/1, which states: "If a source file that is not empty does not end in a new-line character, or ends in a new-line character immediately preceded by a backslash character, the behavior is undefined."

[15] K. Manley, private communication.

Copyright © 2009 Herb Sutter