Standard Library News, Part 1: Vectors and Deques
This article appeared in C++ Report, 11(7), July/August 1999.
NOTE: The "In Most Cases, Prefer Using deque" section in this latest revision was significantly expanded after the article went to press.
This column and the next will present news updates about the standard library containers vector, deque, set, multiset, map, and multimap.
"Updates?" you might ask. "What updates? Isn't the C++ Standard done?" Yes, the C++ Standard is done, and in this "maintenance" phase the ANSI/ISO standards committees are still meeting (albeit only twice a year) to produce defect reports and clarifications in response to questions about the Standard. Why should you care about these results? Because many of the clarifications include useful information about how to use C++ effectively, as we shall see in this column and the next one.
This time, my focus is on news about std::vector and std::deque. This column will answer the following questions:
o Can you use a vector interchangeably with a C-style array (say, with an existing function that expects an array)? How? What are the issues?
o Why should you usually prefer using a deque instead of a vector?
o How do you shrink a vector whose capacity has grown larger than you need? In particular, how do you "shrink-to-fit" (so that the vector's size is just big enough to hold only its current contents) or clear a vector's internal storage entirely (so that the vector is truly empty, with no contents and little or no under-the-covers preallocated storage)? Can the same technique be used for other containers?
o What's happening with std::vector<bool>, whose problems I discussed in the previous column?
Using a vector As an Array: A Motivating Example
C++ programmers are strongly encouraged to use the std::vector template instead of C-style arrays. Using a vector is easier and safer than using an array, because a vector is a better-abstracted and better-encapsulated form of container; for example, a vector can grow and shrink as needed whereas an array has a fixed size.
Nonetheless, there's still a lot of code out there that is designed to work with C-style arrays. For example, consider the following code:
// Example 1(a): A function that operates
To use the above function we might create, populate, and pass in an array as follows. Note that we could equivalently pass just c as the second parameter, which would be the same as passing &c because the array name would automatically be converted into a pointer to the first element in the array:
// Example 1(b): Using an array
//-- populate the contents of c somehow ---
int i = FindCustomer( "Fred Jones", &c, 100 );
Example 1(b) is all well and good, but as modern C++ programmers aren't we supposed to be using vectors instead of arrays? Indeed, wouldn't it be nice if we could have the best of both worlds: use a vector for its convenience and safety, yet still use the vector's contents with functions that expect an array? This desire comes up most often when an organization is migrating to C++, and people want to write new parts of the system in the "new and better" way while still tying in easily with a legacy code base. So let's use a vector... our first attempt could look something like this:
// Example 1(c): Using a vector -- will this work?
//-- populate the contents of c somehow ---
int i = FindCustomer( "Fred Jones", &c, c.size() );
In Example 1(c), when it comes time to call FindCustomer(), the idea is to pass in the address of the first element, and the number of elements--which is pretty much just what we did in Example 1(b).
Before reading on, stop for a moment and consider these questions: What potential benefits does the code in Example 1(c) have over the code in Example 1(b)? Do you see any potential problems with Example 1(c), or do you think that it will work as intended?
A Fly In the Ointment
I didn't just pull Example 1(c) out of a hat. It turns out that there are lot of real-world C++ programmers who have done exactly this kind of thing, and with good reason, because code like that in Example 1(c) illustrates some of the important benefits that come from using a vector:
o We don't have to know the potential size of c in advance. In many C programs, you'll see arrays being allocated pessimistically so that they'll be big enough for most intended uses; unfortunately, this means that space is being needlessly wasted much of the time when the full size isn't needed after all. (Worse, it means the program breaks if it ever end up needing more space than we'd thought.) With a C++ vector, we can grow the container as needed instead of just guessing. Also, if we do happen to know in advance how many objects we're going to put into the vector, we can simply say so (by first calling c.reserve(100), if we want to reserve space for 100 elements), so there's no performance disadvantage with a vector compared to an array because we have a way to avoid repeated reallocations as we insert new elements into the vector.
o We don't have to separately track the actual length of the array so that we can pass it as the final (nLength) parameter to FindCustomer(). Yes, you can use a C array and use workarounds that make remembering or recalculating the length easier, but none of those workarounds are as easy and safe as just writing c.size().
In short, Example 1(c) indeed works on every standard library implementation that I know about, and is likely to work on all that exist today.
The only (minor, and as you'll soon see, pretty much moot) fly in the ointment is that the C++ Standard as it's currently written doesn't guarantee that Example 1(c) will actually work portably on every C++ platform you might buy. The reason is pretty simple: Example 1(c) assumes that the internal contents of the vector are stored contiguously in the same format as is an array, but today's Standard doesn't require vendors to implement vector that way. If, for example, a vector implementation were to play games and store its contents in some other way (such as contiguously but in backwards order, or in sequential order but with an extra byte of housekeeping information between elements), then Example 1(c) would in fact fail miserably.
Having said that, here's the good news: I believe that we can now safely say that Example 1(c) is entirely safe and portable, for the following reasons:
o It should work in practice today. Again, I don't know of a commercial vector implementation that doesn't store its elements contiguously.
o It will be guaranteed to work in practice tomorrow. The standards committee agrees that this is a natural thing to want to do, and it intends to put wording in the first Technical Corrigendum to the effect that the elements inside a vector must indeed be stored contiguously and so will be usable with code that expects an array. The library vendors know that this clarification is coming (after all, the major standard library vendors were with us in the room when this was being discussed), and so they will make sure that new versions of their vector implementations won't introduce any incompatibilities that would make them nonconforming in light of this clarification.
Bottom line: Prefer using std::vector (or std::deque; see below) instead of arrays.
In Most Cases, Prefer Using deque
The C++ Standard, in section 23.1.1, offers some advice on which containers to prefer. It says:
I'd like to present an amiably dissenting point of view: I recommend that you consider preferring deque by default instead of vector, especially when the contained type is a class or struct and not a builtin type, unless you really need the container's memory to be contiguous.
vector and deque offer nearly identical interfaces and are generally interchangeable. deque also offers push_front() and pop_front(), which vector does not. (True, vector offers capacity() and reserve(), which deque does not, but that's no loss--those functions are actually a weakness of vector, as I'll demonstrate in a moment.)
The main structural difference between vector and deque lies in the way the two containers organize their internal storage: Under the covers, a deque allocates its storage in pages, or "chunks," with a fixed number of contained elements in each page; this why a deque is often compared to (and pronounced as) a "deck" of cards, although its name originally came from "double-ended queue" because of is ability to insert elements efficiently at either end of the sequence. On the other hand, a vector allocates a contiguous block of memory, and can only insert elements efficiently at the end of the sequence.
The paged organization of a deque offers significant benefits:
1. A deque offers constant-time insert() and erase() operations at the front of the container, whereas a vector does not--hence the note in the Standard about using a deque if you need to insert or erase at both ends of the sequence.
2. A deque uses memory in a more operating system-friendly way. For example, a 10-megabyte vector uses a single 10-megabyte block of memory; on some operating systems that single block can be less efficient in practice than a 10-megabyte deque that can fit in a series of smaller blocks of memory.
3. A deque is easier to use, and inherently more efficient for growth, than a vector. The only operations supplied by vector that deque doesn't have are capacity() and reserve()--and that's because deque doesn't need them! For vector, calling reserve() before a large number of push_back()s can eliminate reallocating ever-larger versions of the same buffer every time it finds out that the current one isn't big enough after all. A deque has no such problem, and having a deque::reserve() before a large number of push_back()s would not eliminate any allocations (or any other work) because none of the allocations are redundant; the deque has to allocate the same number of extra pages whether it does it all at once or as elements are actually appended.
Interestingly, the C++ Standard itself contains an indication that it disagrees with its own note about preferring to use a vector by default: Even for the stack adapter, which can only grow in one direction and so never needs insertion in the middle or at the other end, the default implementation must always be a deque:
template <class T, class Container = deque<T> >
Some readers are going to be concerned about my advice to prefer deque, saying perhaps: "But deque is a more complex container than vector, and so deque must be five or ten times less efficient compared to vector, right?" As always, beware premature optimization before you actually measure: I have found the "deque must be inefficient" assumption to be generally untrue on popular implementations. Using MSVC 5.0 SP3 with the current patches for its bundled Dinkumware C++ Standard Library implementation (probably by far the most widely-used compiler and library configuration), I tested the performance of the following operations, in order:
I tested each operation against several standard containers including deque. I expected vector to outperform deque in traverse and at, and deque to win in grow for reason #3 above. In fact, note that deque did outperform vector on the grow test even though in fairness I gave vector special treatment by calling vector::reserve(1000000) first to avoid any reallocations. All timings varied less than 5% across multiple runs, and all runs executed fully in-memory without paging or other disk accesses.
First, consider what happened for containers of a simple builtin type, int:
Times in ms grow traverse at shuffle sort