More About Compiler Firewalls...


This is the original article substantially as first published. See the book Exceptional C++ (Addison-Wesley, 2000) for the most current version of this article. The versions in the book have been revised and expanded since their initial appearance in print. The book versions also incorporate corrections, new material, and conformance to the final ANSI/ISO C++ standard.

The Joy of Pimpls (or, More About the Compiler-Firewall Idiom)

This article appeared in C++ Report, 10(7), July/August 1998.

 

In my previous column,[1] I briefly outlined the Pimpl Idiom (or, Compiler Firewall Idiom) and showed how it and other techniques can be used to minimize compile-time dependencies. There are many interesting aspects of pimpls that I didn't cover last time; to make up for that, I'll devote this month's column to these engaging little classes.

Pimpls: A Recap

In short, the core issue is that when anything in a C++ class definition changes--even private members--all users of that class definition must be recompiled.  To reduce these dependencies, a common technique is to use an opaque pointer to an implementation class, the eponymous "pimpl," to hide some of the internal details:

     class X {
     public:
       /* ... public members ... */
     protected:
       /* ... protected members? ... */
     private:
       /* ... private members? ... */
       class XImpl* pimpl_;
         /* opaque pointer to forward-declared class */
     };

This is a variant of the handle/body idiom. As documented by Coplien, handle/body was described as being primarily useful for reference counting of a shared implementation, but it also has more general implementation-hiding uses.[2] For convenience, from now on I'll call X the "visible class" and XImpl the "pimpl class."

One big advantage of this idiom is that it breaks compile-time dependencies. First, system builds run faster because using a pimpl can eliminate extra #includes as demonstrated in the previous article. I have worked on projects where converting just a few widely-visible classes to use pimpls has halved the system's build time. Second, it localizes the build impact of code changes because the parts of a class that reside in the pimpl can be freely changed--that is, members can be freely added or removed--without recompiling client code.
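To make the recap concrete, here is a minimal single-file sketch of the idiom. Widget, WidgetImpl, and their members are hypothetical names chosen for illustration; in a real project the two halves would live in a header and an implementation file, as the comments indicate. Copying is simply suppressed here to keep the sketch short.

```cpp
#include <cassert>
#include <string>

// ---- would live in widget.h (Widget is a hypothetical visible class) ----
class Widget {
public:
  Widget();
  ~Widget();
  void SetName(const std::string& name);
  std::string Name() const;
  int RenameCount() const;
private:
  Widget(const Widget&);             // copying suppressed in this sketch:
  Widget& operator=(const Widget&);  // declared but never defined
  class WidgetImpl* pimpl_;          // opaque pointer to forward-declared class
};

// ---- would live in widget.cpp; client code never sees this part ----
class WidgetImpl {
public:
  WidgetImpl() : renameCount_(0) {}
  std::string name_;
  int renameCount_;   // could be added or removed without recompiling clients
};

Widget::Widget() : pimpl_(new WidgetImpl) {}
Widget::~Widget() { delete pimpl_; }

void Widget::SetName(const std::string& name) {
  pimpl_->name_ = name;
  ++pimpl_->renameCount_;
}
std::string Widget::Name() const { return pimpl_->name_; }
int Widget::RenameCount() const { return pimpl_->renameCount_; }

// Exercises the class strictly through its public interface.
inline void WidgetDemo() {
  Widget w;
  w.SetName("gadget");
  w.SetName("gizmo");
  assert(w.Name() == "gizmo");
  assert(w.RenameCount() == 2);
}
```

Calling WidgetDemo() from main exercises both halves. Note that the visible class holds nothing but the pointer, so changing WidgetImpl's members leaves sizeof(Widget) and the header untouched.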

Here are the questions I'll cover this time:

1. What should go into XImpl?

2. Does XImpl require a "back pointer" to the X object?

3. How can we overcome the space overhead (for storing one or two pointers, and possibly even more "wasted" space)?

4. How can we overcome the performance overhead (of the extra allocations and indirections)?

"What's In a Pimpl?"[3]

First, what should go into XImpl? There are four main alternative disciplines:

o Put all private data (but not functions) into XImpl.

o Put all private members into XImpl.

o Put all private and protected members into XImpl.

o Make XImpl entirely the class that X would have been, and write X as only the public interface made up entirely of simple forwarding functions (another handle/body variant).

Before reading on, consider: What are the advantages/drawbacks?  How would you choose among them?

Option 1 (Score: 6 / 10): Put all private data (but not functions) into XImpl. This is a good start, because now we can forward-declare any class which only appears as a data member (rather than #include the class's actual definition, which would make client code depend on that too). Still, we can usually do better.

Option 2 (Score: 10 / 10): Put all private members into XImpl. This is (almost) my usual practice these days. After all, in C++, the phrase "client code shouldn't and doesn't care about these parts" is spelled "private," and privates are always hidden.[4]

There are three caveats, the first of which is the reason for my "almost" above:

1. You can't hide virtual member functions in the pimpl, even if the virtual functions are private. If the virtual function overrides one inherited from a base class, then it must appear in the actual derived class. If the virtual function is not inherited, then it must still appear in the visible class in order to be available for overriding by further derived classes.[5]

2. Functions in the pimpl may require a "back pointer" to the visible object if they need to in turn use visible functions, which adds another level of indirection. (By convention such a back pointer is usually named self_ where I've worked.)

3. Often the best compromise is to use Option 2, and additionally put into XImpl only those non-private functions that need to be called by the private ones (see the "back pointer" comments below).

Option 3 (Score: 0 / 10): Put all private and protected members into XImpl. Taking this extra step to include protected members is actually wrong. Protected members should never go into a pimpl, since putting them there just emasculates them. After all, protected members exist specifically to be seen and used by derived classes, and so aren't nearly as useful if derived classes can't see or use them.

Option 4 (Score: 10 / 10 in restricted cases): Make XImpl entirely the class that X would have been, and write X as only the public interface made up entirely of simple forwarding functions (another handle/body variant). This is useful in a few restricted cases, and has the benefit of avoiding a back pointer since all services are available within the pimpl class.  The chief drawback is that it normally makes the visible class useless for any inheritance, as either a base or a derived class.
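As a rough illustration of Option 4, the sketch below uses a hypothetical Counter class whose visible half contains only forwarding functions. All names are invented for the example. Because the pimpl class is the whole class, its member functions can call one another directly and no back pointer is needed.

```cpp
#include <cassert>

// Visible class (hypothetical): nothing but forwarding functions.
class Counter {
public:
  Counter();
  ~Counter();
  void Increment();
  void Reset();
  int  Value() const;
private:
  Counter(const Counter&);             // copying suppressed in this sketch
  Counter& operator=(const Counter&);
  class CounterImpl* pimpl_;
};

// Pimpl class: entirely the class Counter would otherwise have been.
class CounterImpl {
public:
  CounterImpl() : value_(0), changes_(0) {}
  void Increment() { ++value_; NoteChange(); }
  void Reset()     { value_ = 0; NoteChange(); }
  int  Value() const { return value_; }
private:
  void NoteChange() { ++changes_; }    // every service is available locally
  int value_;
  int changes_;
};

Counter::Counter() : pimpl_(new CounterImpl) {}
Counter::~Counter() { delete pimpl_; }
void Counter::Increment()   { pimpl_->Increment(); }
void Counter::Reset()       { pimpl_->Reset(); }
int  Counter::Value() const { return pimpl_->Value(); }

inline void CounterDemo() {
  Counter c;
  c.Increment();
  c.Increment();
  assert(c.Value() == 2);
  c.Reset();
  assert(c.Value() == 0);
}
```

The price is visible in the shape of Counter itself: it has no virtual functions and no protected members, which is why this option suits classes that take no part in inheritance.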

Does XImpl Require a Back Pointer?

Does the pimpl require a back pointer to the visible object? The answer is: Sometimes, unhappily, yes. After all, what we're doing is (somewhat artificially) splitting each object into two halves for the purposes of hiding one part.

Consider: Whenever a function in the visible class is called, usually some function or data in the hidden half is needed to complete the request. That's fine and reasonable. What's perhaps not as obvious at first is that often a function in the pimpl must call a function in the visible class, usually because the called function is public or virtual. One way to minimize this is to use Option 4 (above) judiciously for the functions concerned... that is, implement Option 2 and additionally put inside the pimpl any non-private functions that are used by private functions.
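The following sketch shows one way the back pointer can arise. Shape, ShapeImpl, Circle, and their functions are hypothetical; self_ follows the naming convention mentioned earlier. The hidden Describe() needs the virtual Name(), which has to stay in the visible class, so the pimpl stores a pointer back to its owner.

```cpp
#include <cassert>
#include <string>

class Shape {
public:
  Shape();
  virtual ~Shape();
  virtual std::string Name() const { return "shape"; }  // virtual: must stay visible
  std::string Describe() const;                          // forwards to the pimpl
private:
  Shape(const Shape&);                // copying suppressed in this sketch
  Shape& operator=(const Shape&);
  class ShapeImpl* pimpl_;
};

class ShapeImpl {
public:
  explicit ShapeImpl(const Shape* self) : self_(self) {}
  // Hidden function that has to call back into the visible half.
  std::string Describe() const { return "a " + self_->Name(); }
private:
  const Shape* self_;                 // back pointer to the visible object
};

Shape::Shape() : pimpl_(new ShapeImpl(this)) {}   // only stores the pointer
Shape::~Shape() { delete pimpl_; }
std::string Shape::Describe() const { return pimpl_->Describe(); }

// A derived class overrides the visible virtual function.
class Circle : public Shape {
public:
  virtual std::string Name() const { return "circle"; }
};

inline void BackPointerDemo() {
  Shape s;
  Circle c;
  assert(s.Describe() == "a shape");
  assert(c.Describe() == "a circle");  // hidden code reached the override
}
```

Each call to Describe() now goes through two indirections, visible object to pimpl and back again, which is the cost the section above describes.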

What About the Space Overhead?

"What space overhead?" you ask? Well, we now need space for at least one extra pointer (and possibly two, if there's a back pointer in XImpl) for every X object. This typically adds at least four (or eight) bytes on many popular systems, and possibly as many as 14 bytes or more depending on alignment requirements! For example, try the following program on your favourite compiler:

     #include <iostream>

     struct X1 { char c; };
     struct X2 { char c; X1* p; };

     int main() {
       std::cout << sizeof(X1) << ' ' << sizeof(X2) << std::endl;
     }

On many popular compilers that use 32-bit pointers, this prints:

     1 8

On these compilers, the overhead of storing one extra pointer was actually seven bytes, not four. Why? Because the platform the compiler targets either requires a pointer to be stored on a four-byte boundary, or else performs much more poorly if the pointer isn't stored on such a boundary. Knowing this, the compiler inserts three bytes of unused padding inside each X2 object between c and p, so adding a four-byte pointer member costs seven bytes in total. If a back pointer is also needed, then the total storage overhead can be as high as 14 bytes on a 32-bit machine, as high as 30 bytes on a 64-bit machine, and so on.
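To see where the bytes go on a particular compiler, the padding can be measured directly instead of inferred from sizeof alone. This is a small sketch using the same X1 and X2; the helper function names are made up for the example, and the numbers it yields are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>

struct X1 { char c; };
struct X2 { char c; X1* p; };

// Total bytes added by the pointer member, padding included.
inline std::size_t PointerOverhead() { return sizeof(X2) - sizeof(X1); }

// Bytes of padding the compiler placed between c and p.
inline std::size_t PaddingBeforePointer() {
  return offsetof(X2, p) - sizeof(char);
}

inline void OverheadDemo() {
  // The overhead can never be smaller than the pointer itself.
  assert(PointerOverhead() >= sizeof(X1*));
}
```

On a compiler with 4-byte pointers aligned on 4-byte boundaries, these two functions would be expected to return 7 and 3, matching the arithmetic above. With 8-byte pointers on 8-byte boundaries the corresponding figures would be 15 and 7.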

How do we get around this space overhead? The short answer is: We can't eliminate it, but sometimes we can minimize it.

The longer answer is: There's a downright reckless way to eliminate it that you should never ever use (and don't tell anyone that you heard it from me), and there's usually a nonportable but correct way to minimize it. The utterly reckless "space optimization" happens to be the same as the utterly reckless "performance optimization," so I've moved that discussion off to the side; see the accompanying box.

If (and only if) the space difference is actually important in your program, then the nonportable but correct way to minimize the pointer overhead is to use compiler-specific #pragmas. Many compilers will let you override the default alignment/packing for a given class; see your vendor's documentation for details. If your platform only "prefers" (rather than "enforces") pointer alignment and your compiler offers this feature, then on a 32-bit platform you can eliminate as much as six bytes of overhead per X object, at the (usually minuscule) cost of runtime performance because actually using the pointer will be slightly less efficient. Before you even consider anything like this, though, always follow the age-old sage advice: First make it right, then make it fast. Never optimize--neither for speed, nor for size--until your profiler and other tools tell you that you should.
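As one example of such a vendor extension, GCC, Clang, and Microsoft Visual C++ all accept #pragma pack. The sketch below assumes one of those compilers; other compilers may spell the feature differently or not offer it at all, and the struct names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

struct Padded { char c; Padded* p; };   // default alignment and padding

#pragma pack(push, 1)                   // compiler-specific: pack to 1-byte boundaries
struct Packed { char c; Packed* p; };   // p may now sit at an unaligned address
#pragma pack(pop)                       // restore the previous packing

// Bytes recovered per object by packing.
inline std::size_t BytesSaved() { return sizeof(Padded) - sizeof(Packed); }

inline void PackDemo() {
  assert(sizeof(Packed) <= sizeof(Padded));
}
```

On hardware that enforces pointer alignment, reading p from a Packed object can be slow or can fault outright, which is why the advice above applies: measure first, and reach for this only if the space really matters.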

What About the Performance Overhead?

Using the Pimpl idiom can have a performance overhead for two main reasons: For one thing, each X construction/destruction must now allocate/deallocate memory for its XImpl object, which is typically a relatively expensive operation.[6] For another, each access of a member in the pimpl can require at least one extra indirection.[7]

How do we get around this performance overhead? The short answer is: Use the Fast Pimpl idiom, which I'll cover next. (There's also a downright reckless way to eliminate it that you should never ever use; see the accompanying box.)

The Fast Pimpl Idiom

The main performance issue here is that space for the pimpl objects is being allocated from the free store. In general, the right way to address allocation performance for a specific class is to overload operator new for that class and use a fixed-size allocator, because fixed-size allocators can be made much more efficient than general-purpose allocators.

     // file x.h
     class X {
       /*...*/
       class XImpl* pimpl_;
     };

     // file x.cpp
     #include "x.h"
     #include <stddef.h>   // size_t
     struct XImpl {
       /*...private stuff here...*/
       // class-specific [de]allocation, backed by a fixed-size allocator
       void* operator new( size_t )   { /*...*/ }
       void  operator delete( void* ) { /*...*/ }
     };

     X::X() : pimpl_( new XImpl ) {}

     X::~X() { delete pimpl_; pimpl_ = 0; }

"Aha!" you say. "We've found the holy grail--the Fast Pimpl!" you say.  Well, yes, but hold on a minute and think about how this will work and what it will cost you.

Your favourite advanced C++ or general-purpose programming textbook has the details about how to write efficient fixed-size [de]allocation functions, so I won't cover that again here. I will talk about usability: One technique is to put the [de]allocation functions in a generic fixed-size allocator template, perhaps something like this:

     template<size_t S>
     class FixedAllocator {
     public:
       void* Allocate( /*requested size is always S*/ );
       void  Deallocate( void* );
     private:
       /*...implemented using statics?...*/
     };

Because the private details are likely to use statics, however, there could be problems if Deallocate is ever called from a static object's destructor. Probably safer is a singleton that manages a separate free list for each request size (or, as an efficiency tradeoff, a separate free list for each request size "bucket"; e.g., one list for blocks of size 0-8, another for blocks of size 9-16, etc.):

     class FixedAllocator {
     public:
       static FixedAllocator* Instance();
       void* Allocate( size_t );
       void  Deallocate( void* );
     private:
       /*...singleton implementation, typically
            with easier-to-manage statics than
            the templated alternative above...*/
     };

Let's throw in a helper base class to encapsulate the calls. This works because derived classes "inherit" these overloaded base operators:

     struct FastAllocation {
       void* operator new( size_t s ) {
         return FixedAllocator::Instance()->Allocate(s);
       }
       void operator delete( void* p ) {
         FixedAllocator::Instance()->Deallocate(p);
       }
     };

Now, you can easily write as many Fast Pimpls as you like:

     //  Want this one to be a Fast Pimpl?
     //  Easy, then just inherit...
     struct XImpl : FastAllocation {
       /*...private stuff here...*/
     };
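For readers who want to see the pieces working together, here is one possible way to fill in the elided parts. It is only a sketch under stated assumptions: the 8-byte bucket step, the 16-bucket limit, the Block header, and the fallback to the global operator new for large requests are all choices made for this example, and the allocator is neither thread-safe nor does it ever return pooled blocks to the system. It also uses alignas and std::max_align_t, which arrived in the language after this article was written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

class FixedAllocator {
public:
  static FixedAllocator* Instance() {
    static FixedAllocator instance;          // created on first use
    return &instance;
  }

  void* Allocate(std::size_t n) {
    const std::size_t b = BucketFor(n);
    Block* blk;
    if (b < kBuckets && free_[b]) {          // reuse a block from this bucket's list
      blk = free_[b];
      free_[b] = blk->next;
    } else {                                 // list empty, or request too large
      const std::size_t payload = (b < kBuckets) ? (b + 1) * kStep : n;
      blk = new (::operator new(sizeof(Block) + payload)) Block;
    }
    blk->bucket = b;                         // remembered for Deallocate
    return blk + 1;                          // payload follows the header
  }

  void Deallocate(void* p) {
    if (!p) return;
    Block* blk = static_cast<Block*>(p) - 1;
    const std::size_t b = blk->bucket;
    if (b < kBuckets) {                      // push back onto its free list
      blk->next = free_[b];
      free_[b] = blk;
    } else {
      ::operator delete(blk);                // oversized: return to the system
    }
  }

private:
  // Header in front of every payload; over-aligned so that the payload
  // following it is suitably aligned for any ordinary type.
  struct alignas(std::max_align_t) Block {
    union { Block* next; std::size_t bucket; };
  };
  static const std::size_t kStep = 8;        // bucket granularity in bytes
  static const std::size_t kBuckets = 16;    // buckets cover requests up to 128 bytes

  static std::size_t BucketFor(std::size_t n) { return (n ? n - 1 : 0) / kStep; }

  FixedAllocator() : free_() {}              // all free lists start empty
  Block* free_[kBuckets];
};

struct FastAllocation {
  void* operator new(std::size_t s) { return FixedAllocator::Instance()->Allocate(s); }
  void  operator delete(void* p)    { FixedAllocator::Instance()->Deallocate(p); }
};

struct XImpl : FastAllocation { int a; double b; };  // hypothetical private stuff

// Frees one XImpl, allocates another, and reports whether the block was reused.
inline bool ReusesFreedBlock() {
  XImpl* first = new XImpl;
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(first);
  delete first;
  XImpl* second = new XImpl;
  const bool same = reinterpret_cast<std::uintptr_t>(second) == addr;
  delete second;
  return same;
}
```

The header in front of each block is what lets Deallocate( void* ) find the right free list without being told the size. It is also pure overhead per object, which is one concrete form of the space penalty discussed next.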

But Beware!

This is nice and all, but don't just use the Fast Pimpl willy-nilly. You're getting extra allocation speed, but as usual you should never forget the cost: Managing separate free lists for objects of specific sizes usually means incurring a space efficiency penalty because any free space is fragmented (more than usual) across several lists.

A final reminder: As with any other optimization, use pimpls in general and fast pimpls in particular only after profiling and experience prove that the extra performance boost is really needed in your situation.

In the next column, I'll cover the uses and abuses of non-public inheritance. Stay tuned.

Reckless Fixes and Optimizations, and Why They're Evil

The main article shows why using the Pimpl idiom can incur space and performance overheads, and it also shows the right way to minimize or eliminate those overheads. There is also a common, but wrong, way to deal with them.

Here's the reckless, unsafe, might-work-if-you're-lucky, evil, fattening, and high-cholesterol way to eliminate the space and performance overheads, and you didn't hear it from me... the only reason I'm mentioning it at all is because I've seen people try to do this:

     // evil dastardly header file x.h
     class X {
       /* . . . */
       static const size_t sizeofximpl = /*some value*/;
       char pimpl_[sizeofximpl];
     };

     // pernicious depraved implementation file x.cpp
     #include "x.h"
     X::X() {
       assert( sizeofximpl >= sizeof(XImpl) );
       new (&pimpl_[0]) XImpl;
     }
     X::~X() {
       (reinterpret_cast<XImpl*>(&pimpl_[0]))->~XImpl();  
     }

DON'T DO THIS! Yes, it removes the space overhead--it doesn't use so much as a single pointer.[8] Yes, it removes the memory allocation overhead--there's nary a malloc or new in sight. Yes, it might even happen to work on the current version of your current compiler.

It's also completely nonportable. Worse, it will completely break your system even if it does appear to work at first. Here are several reasons:

1. Alignment. Any memory that's allocated dynamically via new or malloc is guaranteed to be properly aligned for objects of any type, but buffers that are not allocated dynamically have no such guarantee:

     char* buf1 = (char*)malloc( sizeof(Y) );
     char* buf2 = new char[ sizeof(Y) ];
     char  buf3[ sizeof(Y) ];

     new (buf1) Y;     // OK, buf1 allocated dynamically (#1)
     new (buf2) Y;     // OK, buf2 allocated dynamically (#2)
     new (&buf3[0]) Y; // error, buf3 may not be suitably aligned

     (reinterpret_cast<Y*>(buf1))->~Y(); // OK
     (reinterpret_cast<Y*>(buf2))->~Y(); // OK
     (reinterpret_cast<Y*>(&buf3[0]))->~Y(); // error

Just to be clear: I'm not recommending that you do #1 or #2. I'm just pointing out that they're legal, whereas the above attempt to have a pimpl without dynamic allocation is not, even though it may (dangerously) appear to work correctly at first if you happen to get lucky.[9]

2. Brittleness. The author of X has to be inordinately careful with otherwise-ordinary X functions. For example, X must not use the default assignment operator, but must either suppress assignment or supply its own. (Writing a safe X::operator= isn't too hard, but I'll leave it as an exercise for the reader. Remember to account for exception safety in that and in X::~X.[10] Once you're finished, I think you'll agree that this is a lot more trouble than it's worth.)

3. Maintenance Cost. When sizeof(XImpl) grows beyond sizeofximpl, the programmer must bump up sizeofximpl. This can be an unattractive maintenance burden. Choosing a larger value for sizeofximpl mitigates this, but at the expense of trading off efficiency (see #4).

4. Inefficiency. Whenever sizeofximpl > sizeof(XImpl), space is being wasted. This can be minimized, but at the expense of maintenance effort (see #3).

5. Just Plain Wrongheadedness. In short, it's obvious that the programmer is trying to do "something unusual." Frankly, in my experience, "unusual" is just about always a synonym for "hack." Whenever you see this kind of subversion--whether it's allocating objects inside character arrays like this programmer is doing, or implementing assignment using explicit destruction and placement new as discussed in Guru of the Week #23--you should Just Say No.[11]

Bottom line, C++ doesn't support opaque types directly, and this is a brittle attempt to work around that limitation.
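The alignment problem in reason #1 can be observed without constructing anything in the buffer. The sketch below only measures offsets; Y, RawBuffer, and AlignedBuffer are hypothetical types, and std::max_align_t (available since C++11) stands in for the hand-written "max_align" struct mentioned in note 9. It illustrates issue #1 only and does nothing about issues #2 through #5.

```cpp
#include <cassert>
#include <cstddef>

struct Y { double d; int i; };     // hypothetical type needing alignment > 1

struct RawBuffer {                 // like the reckless X: a bare char array
  char c;
  char buf[sizeof(Y)];
};

struct AlignedBuffer {             // the union trick described in note 9
  char c;
  union {
    std::max_align_t dummy;        // forces maximal fundamental alignment
    char buf[sizeof(Y)];
  } storage;
};

// True if buf inside RawBuffer starts at an offset that is wrong for a Y.
inline bool RawBufferIsMisaligned() {
  return offsetof(RawBuffer, buf) % alignof(Y) != 0;
}

// True if the union-wrapped buffer starts at an offset suitable for a Y.
inline bool UnionBufferIsAligned() {
  return offsetof(AlignedBuffer, storage) % alignof(Y) == 0;
}

inline void AlignmentDemo() {
  assert(offsetof(RawBuffer, buf) == 1);   // char arrays need no padding
  assert(UnionBufferIsAligned());
}
```

A char array has an alignment requirement of 1, so nothing stops the compiler from placing it at an odd offset as it does in RawBuffer. Whether a misaligned Y then works slowly, crashes, or appears to work is entirely up to the platform.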

 

Notes

1. H. Sutter, "Pimpls: Beauty Marks You Can Depend On" (C++ Report, May 1998).

2. J. Coplien. Advanced C++ Programming Styles and Idioms (Addison-Wesley, 1992).

3. Please don't email me jokes about this subheading. I can imagine most of the answers.

4. Except in some liberal European countries.

5. Making a virtual private is usually not a good idea, anyway. The point of a virtual function is to allow a derived class to redefine it, and a common redefinition technique is to call the base class' version (not possible, if it's private) for most of the functionality.

6. Compared to most other common operations in C++, such as function calls. Note that here I'm specifically talking about the cost of using a general-purpose allocator, which is what you typically get with the built-in operator new and malloc.

7. If the hidden member being accessed itself uses a back pointer to call a function in the visible class, there will be multiple indirections.

8. This completely hides the pimpl class, but of course clients must still be recompiled if sizeofximpl changes.

9. All right, I'll fess up: There actually is a (not very portable, but pretty safe) way to put the pimpl class right into the main class like this, thus avoiding all space and time overhead. It involves creating a "max_align" struct that guarantees maximal alignment, and defining the pimpl member as union { max_align dummy; char pimpl_[sizeofximpl]; }; -- this will guarantee sufficient alignment. For all the gory details, do a search for "max_align" on the web or on DejaNews. However, I still strongly urge you not to go down this path, because using a "max_align" solves only issue #1 and does not address issues #2 through #5. You Have Been Warned.

10. See H. Sutter, "Exception-Safe Generic Containers" (C++ Report, September 1997) and H. Sutter, "More Exception-Safe Generic Containers" (C++ Report, November/December 1997).

11. See GotW #23, and Advice From the C++ Experts in the October 1997 C++ Report.

Copyright © 2009 Herb Sutter