Data Structures in C++:Introduction and Basic Containers
In C++, several classic data structures are implemented as a part of the standard library, commonly known as the Standard Template Library. The Standard Template Library (or simply, the STL) consists of a set of container classes (such as lists, sets, and maps), iterator classes that are used to traverse the container classes, and generic algorithms, such as sorting and searching. As its name implies, the library consists of (both function and class) templates. The STL is very powerful, and makes some use of advanced C++ features. Our discussion is focused on the most basic uses of the STL.
We partition our discussion of the Standard Template Library into the following sections.
1. Containers.
2. Iterators.
3. Generic Algorithms.
See [3–9] for additional material on the Standard Template Library.
Basic Containers
The STL defines several container templates that store collections of objects. Some collections are unordered, while others are ordered. Some collections allow duplicates, while others do not. All containers support the following operations:
• int size( ) const: returns the number of elements in the container.
• void clear( ): removes all elements from the container.
• bool empty( ) const: returns true if the container contains no elements and false otherwise.
There is no universal add or insert member function; different containers use different names.
The container classes can be split into three broad categories:
1. Sequence containers: maintains items with a notion of a position in the collection.
Examples include vector, list, and deque.
2. Sorted associative containers: maintains items in sorted order. Examples include set, multiset, map, and multimap.
3. Container adapters: built on top of other containers to yield classic data structures. Examples include stack, queue, and priority queue.
Associated with all containers is an iterator. An iterator represents the notion of a position in the container and is used to step through the container. All containers support the following operations:
• iterator begin( ): returns an appropriate iterator representing the first item in the container.
• iterator end( ): returns an appropriate iterator representing the end marker in the container (i.e. the position after the last item in the container).
We defer the discussion of iterators to Section 42.3.
The three basic sequence containers in the STL are the vector, list, and deque.
vector is a growable array. The vector wraps an internal array, maintaining a notion of its size, and internally its current capacity. If a sequence of additions would cause the size to exceed capacity, the capacity is automatically doubled. This makes additions at the end of the vector take constant amortized time. list is a doubly-linked list, in which pointers to both ends are maintained. deque is, in effect, a growable array that grows at both ends. There are several well-known ways of implement deque efficiently, but the standard is silent on which must be used.
For vector, an insertion or deletion takes amortized time that is proportional to the distance from the back, while for a deque, these operations take amortized time that is proportional to the smaller of the distances from the front and back. For a list, these operations take worst-case time that is proportional to the smaller of the distances from the front and back if an index is specified, but constant worst-case time if the position is specified by an existing iterator. vector and deque support indexing in constant worst-case time; list does not.
The basic operations that are supported by both containers are:
• void push back( const Object & x ): adds x to the end of the container.
• Object & back( ): returns the object at the end of the container (an accessor that returns a constant reference is also provided).
• void pop back( ): removes the object at the end of the container.
• Object & front( ): returns the object at the front of the container (an accessor that returns a constant reference is also provided).
• iterator insert( iterator pos, const Object & x ): adds x into the container, prior to the position given by the iterator pos. This is a constant-time operation for list, but not for vector or deque. The return value is an iterator representing the position of the inserted item.
• iterator erase( iterator pos ): removes the object at the position given by the iterator. This is a constant-time operation for list, but not vector or deque. The return value is the position of the element that followed pos prior to the call. This operation invalidates pos, which is now stale.
• iterator erase( iterator start, iterator end ): removes all items be- ginning at position start, up to but not including end. Observe that an entire container can be erased by the call: c.erase( c.begin( ), c.end( ) ).
For deque and list, two additional operations are available with expected semantics:
• void push front( const Object & x ): adds x to the front of the container.
• void pop front( ): removes the object at the front of the container.
The list also provides a splice operation that allows the transfer of a sublist to some- where else.
• void splice( iterator pos, list & l, iterator start, iterator end ): moves all items beginning at position start, up to but not including end to after pos. The moved items are assumed to have come from list l. The running time of this operation is proportional to the number of items moved, unless the items are moved within the same list (i.e. &l==this), in which case the running time is constant-time.
For vector and deque, additional operations include
• Object & operator[] ( int idx ) returns the object at index idx in the container, with no bounds-checking (an accessor that returns a constant reference is also provided).
• Object & at( int idx ): returns the object at index idx in the container, with bounds-checking (an accessor that returns a constant reference is also provided).
• int capacity( ) const: returns the internal capacity.
• void reserve( int newCapacity ): for vector only, sets the new capacity. If a good estimate is available, it can be used to avoid array expansion.
Sorted Associative Containers
The STL provides two basic types of associative containers. Sets are used to store items. The multiset allows duplicates, while the set does not. Maps are used to store key/value pairs. The multimap allows duplicate keys, while the map does not. For sets, the items are logically maintained in sorted order, so an iterator to the “beginning item” represents the smallest item in the collection. For maps, the keys are logically maintained in sorted order. As a result, the most natural implementation of sets and maps make use of balanced binary search trees; typically a top-down red black tree (Chapter 10) is used.
The significant liability is that basic operations take logarithmic worst-case time. Some STL implementations (such as SGI’s version) provide hash set and hash map, which make use of hash tables (Chapter 9) and provide constant average time per basic operation. A hash container will eventually make its way into the library, but at the time of this writing it is not Standard C++.
C++ sets and maps use either a default ordering (operator<) or one provided by a function object. In C++ a function object is implemented by providing a class that contains an overloaded operator(), and then instantiating a template with the class name as a template parameter (an example of this is shown later in Figure 42.1).
Both sets and maps make use of the pair class template. The pair class template, stores two data members first and second, which can be directly accessed, without invoking member functions. Internally, maps store items of type pair. Additionally, several set member functions need to return two things; this is done by returning an appropriately instantiated pair.
Sets and Multisets
The set provides several member functions in addition to the usual suspects, including:
• pair<iterator,bool> insert( const Object & x ): adds x to the set. If x is not already present, the returned pair will contain the iterator representing the already contained x, and false. Otherwise, it will contain the iterator rep- resenting the newly inserted x, and true.
• pair<iterator,bool> insert( iterator hint, const Object & x ): same behavior as the one-parameter insert, but allows specification of a hint, repre- senting the position where x should go. If the underlying implementation is a finger search tree (Chapter 11), or equivalent, the performance could be better than using the one-parameter insert.
• iterator find( const Object & x ) const: returns an iterator representing the position of x in the set. If x is not found, the endmarker is returned.
• int erase( const Object & x ): removes x (if present) and returns the number of items removed, which is either 0 or 1 in a set, and perhaps larger in a multiset.
• iterator erase( iterator pos ): same behavior as the sequence container version.
• iterator erase( iterator start, iterator end ): same behavior as the sequence container version.
• iterator lower bound( const Object & x ): returns an iterator to the first element in the set with a key that is greater than or equal to x.
• iterator upper bound( const Object & x ): returns an iterator to the first element in the set with a key that is greater than x.
• pair<iterator,iterator> equal range( const Object & x ): returns a pair of iterators representing lower bound and upper bound.
A multiset is like a set except that duplicates are allowed. The return type of insert
is modified to indicate that the insert always succeeds:
• iterator insert( const Object & x ): adds x to the multiset; returns an iterator representing the newly inserted x.
• iterator insert( iterator hint, const Object & x ): adds x to the multiset; returns an iterator representing the newly inserted x. Performance might be enhanced if x is inserted close to the position given by hint.
For the multiset, the erase member function that takes an Object x removes all occurrences of x. To simply remove one occurrence, first call find to obtain an iterator, and then use the erase member function that takes an iterator.
In a multiset, to find all occurrences of x, we cannot simply call find; that returns an iterator referencing one occurrence (if there is one), but which specific occurrence is
returned is not guaranteed. Instead, the range returned by lower bound and upper bound (with upper bound not included) contains all of occurrences of x; typically this is obtained by a call to equal range.
Figure 42.1 illustrates two sets: s1 which stores strings using the normal case-sensitive ordering, and s2 which stores strings using case-insensitive comparisons. s2 is instantiated by providing the class of a function object as a template parameter. As a result, s1 contains three strings, but s2 considers "hello" and "HELLO" to be identical, and thus only stores one of them.
Maps and Multimaps
A map behaves like a set instantiated with a pair representing a key and value, with a comparison function that refers only to the key. Thus it supports all of the set operations. The find operation for maps requires only a key, but the iterator that it returns references a pair. Similarly, erase requires only a key, and otherwise behaves like the set’s erase. insert is supported, but to use insert, we must insert a properly instantiated pair, which is cumbersome, and thus rarely done. Instead, the map overloads the array indexing
operator[]:
• Value Type & operator[] ( const KeyType & key ): if the key is present in the map, a reference to the value is returned. If the key is not present in the map, it is inserted with a default value into the map, and then a reference to the inserted default value is returned. The default value is obtained by applying a zero-parameter constructor, or is zero for the primitive types.
The semantics of operator[] do not allow an accessor version of operator[], and so operator[] cannot be used on an immutable map. For instance, if a map is passed by constant reference, inside the routine, operator[] is unusable. In such a case, we can use
find to obtain an iterator, test if the iterator represents the end marker (in which case the find has failed), and if the iterator is valid, we can use it to access the pair’s second component, which is the value. (This alternate idiom is eventually seen in the larger example in Section 42.5).
A multimap is a map in which duplicate keys are allowed. multimaps behave like maps but do not support operator[].
Figure 42.2 illustrates a combination of map and vector in which the keys are email aliases, and the values are the email addresses corresponding to each alias. Since there can be several addresses for each alias, the values are themselves vectors of strings. The trickiest part (other than the various parameter-passing modes) concerns the one-line implementation of add Alias. There, we see that alias Map[name] refers to the vector value in the map for the name key. If name is not in the map, then the call to operator[] causes it to be placed into the map, with a default value being a zero-sized vector. Thus whether or not name was in the map prior to calling add Alias, after the access of the map, the call to push back updates the vector value correctly.
The STL provides adapter class templates to implement stacks, queues, and priority queues. These class templates are instantiated with the type of object to be stored and (optionally, if the default is unacceptable) the container class (such as vector, list, or deque) that is used to store the objects. Thus we can specify a linked list or array implementation for a stack, though in reality, this feature does little more than add nicer names than push back to the (existing list or vector, respectively) interface.
stack and queue
For stack, the member functions are push, pop, and top. For queue, we get push, front, and pop. By default, a vector is used for stack and a deque for queue. Figure 42.3 shows how to use the queue adapter, implemented with a doubly-linked list.
priority queue
The priority queue template contains member functions named push, top, and pop, that mirror insert, find max, and delete max in a classic max-heap.
Sometimes priority queues are set up to remove and access the smallest item instead of the largest item. In such a case, the priority queue can be instantiated with an appropriate greater function object to override the default ordering. The priority queue template is instantiated with an item type, the container type (as in stack and queue), and the comparator, with defaults allowed for the last two parameters. In Figure 42.4, the default instantiation of priority queue allows access to the largest items, whereas an instantiation with a greater function object reverses the comparison order and allows access to the smallest item.
Comments
Post a Comment