GotW #51

 On theblog November 4: Other Concurrency Sessions at PDC November 3 October 26: Hoare on Testing October 23

 This is the original GotW problem and solution substantially as posted to Usenet. See the book More Exceptional C++ (Addison-Wesley, 2002) for the most current solution to this GotW issue. The solutions in the book have been revised and expanded since their initial appearance in GotW. The book versions also incorporate corrections, new material, and conformance to the final ANSI/ISO C++ standard.

## Extending the Standard Library - Part I Difficulty: 7 / 10

This issue lets you test your standard algorithm skills. What does the standard library function remove() actually do, and how would you go about writing a generic function to remove only the third element in a container?

### Problem

#### JG Questions

1. What does the std::remove() function do? Be specific.

2. Write code that removes all values equal to 3 from a std::vector<int>.

#### Guru Question

3. A programmer working on your team wrote the following alternative pieces of code to remove the n-th element of a container.

```    //  Method 1: Write a special-purpose remove_nth
//            algorithm.
//
template<class FwdIter>
FwdIter remove_nth( FwdIter first, FwdIter last, size_t n )
{
/* ... */
}

//  Method 2: Write a functor which returns true
//            the nth time it's applied, and use
//            that as a predicate for remove_if.
//
template<class T>
class FlagNth
{
public:
FlagNth( size_t n ) : i_(0), n_(n) { }
bool operator()( const T& ) { return i_++ == n_; }
private:
size_t       i_;
const size_t n_;
};

// Example invocation
... remove_if( v.begin(), v.end(), FlagNth<int>(3) ) ...```

a) Implement the missing part of Method 1.

b) Which method is better? Why? Discuss anything that might be problematic about either solution.

### Solution

1. What does the std::remove() function do? Be specific.

std::remove() does not physically remove objects from a container; the size of the container is unchanged. Rather, remove() shuffles up the "unremoved" objects to fill in the gaps left by removed objects, leaving at the end one "dead" object for each removed object. Finally, remove() returns an iterator pointing at the first "dead" object (if no objects were removed, remove() returns the end iterator).

For example, consider a vector<int> v that contains the following:

`    1 2 3 1 2 3 1 2 3`

Say that you called remove( v.begin(), v.end(), 2 ). What would happen? The answer is something like this:

```    1 3 1 3 1 3 ? ? ?
^^^^^^^^^^^ ^^^^^
objects     objects

^
iterator returned by
remove() points to
the third-last object
(because three were
removed)```

Three objects had to be removed, and the rest were copied to fill in the gaps. The objects at the end of the container may have their original values (1 2 3), or they may not; don't rely on that. Again, note that the size of the container is left unchanged.

If you're wondering "why does remove() work that way?", see Andy Koenig's thorough treatment of this topic in the February 1999 C++ Report. [Disclaimer: I wrote this GotW issue several months ago but, this time, Andy beat me to publication... His column appeared on my desk within days of my getting around to posting GotW #51.]

2. Write code that removes all values equal to 3 from a std::vector<int>.

Here's a one-liner to do it, where v is a vector<int>:

`    v.erase( remove( v.begin(), v.end(), 3 ), v.end() );`

The call to remove( v.begin(), v.end(), 3 ) does the actual work, and returns an iterator pointing to the first "dead" element. The call to erase() from that point until v.end() shrinks the vector so that it contains only the unremoved objects.

3. A programmer working on your team wrote the following alternative pieces of code to remove the n-th element of a container.

```    //  Method 1: Write a special-purpose remove_nth
//            algorithm.
//
template<class FwdIter>
FwdIter remove_nth( FwdIter first, FwdIter last, size_t n )
{
/* ... */
}

//  Method 2: Write a functor which returns true
//            the nth time it's applied, and use
//            that as a predicate for remove_if.
//
template<class T>
class FlagNth
{
public:
FlagNth( size_t n ) : i_(0), n_(n) { }
bool operator()( const T& ) { return i_++ == n_; }
private:
size_t       i_;
const size_t n_;
};

// Example invocation
... remove_if( v.begin(), v.end(), FlagNth<int>(3) ) ...```

a) Implement the missing part of Method 1.

People often propose implementations that have the same bug as the following code. Did you?

```    //  Can you see the problem(s)?
//
template<class FwdIter>
FwdIter remove_nth( FwdIter first, FwdIter last, size_t n )
{
for( ; n > 0; ++first, --n ) ;
if( first != last )
{
FwdIter dest = first;
return copy( ++first, last, dest );
}
return last;
}```

There is one problem here, and the one problem hath two aspects:

1. The initial loop may move first past last, and then [first,last) is no longer a valid iterator range. If so, then in the remainder of the function, Bad Things(tm) will happen.

2. It would be fine to leave that potential problem in the function, as long as you also documented a precondition that n be valid for the given range. But this opens you up to some further criticism:

a) Not all who proposed such code documented the precondition.

b) If you're going to do that, you should dispense with the iterator-advancing loop and simply write advance( first, n ). The standard advance() function for iterators is already aware of iterator categories, and is automatically optimized for random-access iterators.

Here is a reasonable implementation:

```    //  Preconditions:
//  - first != last
//  - n must not exceed the size of the range
//
template<class FwdIter>
FwdIter remove_nth( FwdIter first, FwdIter last, size_t n )
{
if( first != last )
{
FwdIter dest = first;
return copy( ++first, last, dest );
}
return last;
}```

b) Which method is better? Why? Discuss anything that might be problematic about either solution.

Method 1 has two main advantages:

1. It is correct.

2. It can take advantage of iterator traits, specifically the iterator category, and so can perform better for random-access iterators.

Method 2 has corresponding disadvantages, which we'll analyze in detail in the second part of this two-part GotW.