Concurrent Data Structures: Search Trees

Search Trees

A concurrent implementation of any search tree (see Chapters 3 and 10 for more on sequential search trees) can be achieved by protecting it with a single exclusive lock. Concurrency can be improved somewhat by using a reader-writer lock: all read-only (search) operations execute concurrently with each other by holding the lock in shared mode, while update (insert or delete) operations exclude all other operations by acquiring the lock in exclusive mode. If update operations are rare, this may be acceptable, but with even a moderate number of updates, the exclusive lock for update operations creates a sequential bottleneck that degrades performance substantially. By using fine-grained locking strategies, for example one lock per node rather than a single lock for the entire tree, we can improve concurrency further.
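The coarse-grained reader-writer scheme takes only a few lines. The following Java sketch wraps a sequential tree (`java.util.TreeMap`) with a `ReentrantReadWriteLock`; the class and method names are illustrative, not from the text:

```java
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Coarse-grained concurrent search tree: searches share the lock,
// updates take it exclusively. A minimal sketch, not a specific
// algorithm from the literature surveyed here.
class CoarseLockedTree<K extends Comparable<K>, V> {
    private final TreeMap<K, V> tree = new TreeMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public V search(K key) {
        lock.readLock().lock();          // shared mode: searches run concurrently
        try { return tree.get(key); }
        finally { lock.readLock().unlock(); }
    }

    public void insert(K key, V value) {
        lock.writeLock().lock();         // exclusive mode: blocks all other operations
        try { tree.put(key, value); }
        finally { lock.writeLock().unlock(); }
    }

    public void delete(K key) {
        lock.writeLock().lock();
        try { tree.remove(key); }
        finally { lock.writeLock().unlock(); }
    }
}
```

With many updaters, every `insert` and `delete` serializes on the single write lock, which is exactly the bottleneck the fine-grained techniques below address.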

Kung and Lehman [78] present a concurrent binary search tree implementation in which update operations hold only a constant number of node locks at a time, and these locks only exclude other update operations: search operations are never blocked. However, this implementation makes no attempt to keep the search tree balanced. In the remainder of this section, we focus on balanced search trees, which are considerably more challenging.

As a first step towards more fine-grained synchronization in balanced search tree implementations, we can observe that it is sufficient for an operation to hold an exclusive lock on the subtree in which it causes any modifications. This way, update operations that modify disjoint subtrees can execute in parallel. We briefly describe some techniques in this spirit in the context of B+-trees. Recall that in B+-trees, all keys and data are stored in leaf nodes; internal nodes only maintain routing information to direct operations towards the appropriate leaf nodes. Furthermore, an insertion into a leaf may require the leaf to be split, which may in turn require a new entry to be added to the leaf’s parent, which itself may need to be split to accommodate the new entry. Thus, an insertion can potentially result in modifying all nodes along the path from the root to the leaf. However, such behavior is rare, so it does not make sense to exclusively lock the whole path just in case this occurs.

As a first step to avoiding such conservative locking strategies, we can observe that if an insert operation passes an internal B+-tree node that is not full, then the modifications it makes to the tree cannot propagate past that node. In this case, we say that the node is safe with respect to the insert operation. If an update operation encounters a safe node while descending the tree acquiring exclusive locks on each node it traverses, it can safely release the locks on all ancestors of that node, thereby improving concurrency by allowing other operations to traverse those nodes [98, 120]. Because search operations do not modify the tree, they can descend the tree using lock coupling: as soon as a lock has been acquired on a child node, the lock on its parent can be released. Thus, search operations hold at most two locks (in shared mode) at any point in time, and are therefore less likely to prevent progress by other operations.
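The lock-coupling descent itself is mechanical. Here is a minimal Java sketch for searches, assuming a hypothetical node type with a per-node reader-writer lock; keys and routing information are elided:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hand-over-hand (lock coupling) descent: hold at most two locks at a time.
// CNode is a hypothetical tree node; routing is elided.
class CNode {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    CNode[] children;   // null at a leaf

    boolean isLeaf() { return children == null; }

    // Real code would pick the child whose key range covers 'key'.
    CNode child(int key) { return children[0]; }
}

class LockCoupling {
    // Descend from the root holding at most two shared locks at any moment.
    static CNode findLeaf(CNode root, int key) {
        CNode cur = root;
        cur.lock.readLock().lock();
        while (!cur.isLeaf()) {
            CNode next = cur.child(key);
            next.lock.readLock().lock();   // acquire the child first...
            cur.lock.readLock().unlock();  // ...then release the parent
            cur = next;
        }
        return cur;   // returned leaf is still read-locked; the caller releases it
    }
}
```

The acquire-before-release order is what prevents another operation from restructuring the child between the parent's release and the child's acquisition.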

This approach still requires each update operation to acquire an exclusive lock on the root node, and to hold the lock while reading a child node, potentially from disk, so the root is still a bottleneck. We can improve on this approach by observing that most update operations will not need to split or merge the leaf node they access, and will therefore eventually release the exclusive locks on all of the nodes traversed on the way to the leaf.

This observation suggests an “optimistic” approach in which we descend the tree acquiring locks in shared mode, locking only the leaf node in exclusive mode [14]. If the leaf does not need to be split or merged, the update operation can complete immediately; in the rare cases in which changes do need to propagate up the tree, we can release all of the locks and then retry with the more pessimistic approach described above. Alternatively, we can use reader-writer locks that allow locks held in shared mode to be “upgraded” to exclusive mode. This way, if an update operation discovers that it does need to modify nodes other than the leaf, it can upgrade the locks it already holds in shared mode to exclusive mode, and avoid completely restarting the operation from the top of the tree [14]. Various combinations of the above techniques can be used because nodes near the top of the tree are more likely to conflict with other operations and less likely to be modified, while the opposite is true of nodes near the leaves [14].
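The optimistic control flow can be sketched as follows, with hypothetical node types. One caveat: Java's `ReentrantReadWriteLock` does not support in-place lock upgrading, so this sketch releases the shared lock and reacquires in exclusive mode at the leaf; a real implementation would have to revalidate the leaf after reacquiring (or use a lock that supports upgrading), since the leaf may change in the window between the two calls.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Optimistic update: descend with shared locks, write-lock only the leaf.
// If the leaf would overflow, report failure so the caller can fall back
// to a pessimistic exclusive descent. OptNode is a hypothetical stand-in.
class OptNode {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    OptNode child;                  // single child keeps the sketch small
    int entries;
    static final int CAPACITY = 4;  // leaf capacity

    boolean isLeaf() { return child == null; }
    boolean wouldSplit() { return entries + 1 > CAPACITY; }
}

class OptimisticInsert {
    static boolean tryOptimistic(OptNode root) {
        OptNode cur = root;
        cur.lock.readLock().lock();
        while (!cur.isLeaf()) {                 // lock-couple down in shared mode
            OptNode next = cur.child;
            next.lock.readLock().lock();
            cur.lock.readLock().unlock();
            cur = next;
        }
        // Release-then-reacquire in exclusive mode (no true upgrade here);
        // real code must revalidate cur after this gap.
        cur.lock.readLock().unlock();
        cur.lock.writeLock().lock();
        try {
            if (cur.wouldSplit()) return false; // rare case: caller retries pessimistically
            cur.entries++;                      // common case: in-place leaf insert
            return true;
        } finally {
            cur.lock.writeLock().unlock();
        }
    }
}
```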

As we employ some of the more sophisticated techniques described above, the algorithms become more complicated, and it becomes more difficult to avoid deadlock, resulting in even further complications. Nonetheless, all of these techniques maintain the invariant that operations exclusively lock the subtrees that they modify, so operations do not encounter states that they would not encounter in a sequential implementation. Significant improvements in concurrency and performance can be made by relaxing this requirement, at the cost of making it more difficult to reason that the resulting algorithms are correct.

A key difficulty we encounter when we attempt to relax the strict subtree locking schemes is that an operation descending the tree might follow a pointer to a child node that is no longer the correct node because of a modification by a concurrent operation. Various techniques have been developed that allow operations to recover from such “confusion”, rather than strictly avoiding it.

An important example in the context of B+-trees is due to Lehman and Yao [90], who define Blink-trees: B+-trees with “links” from each node in the tree to its right neighbor at the same level of the tree. These links allow us to “separate” the splitting of a node from the modifications to its parent that reflect the split. Specifically, in order to split a node n, we can create a new node n′ to its right, and install a link from n to n′. If an operation that is descending the tree reaches node n while searching for a key position that is now covered by node n′ due to the split, the operation can simply follow the link from n to n′ to recover. This allows a node to be split without preventing access by concurrent operations to the node’s parent. As a result, update operations do not need to simultaneously lock the entire subtree they (potentially) modify. In fact, in the Lehman-Yao algorithm, update operations as well as search operations use the lock coupling technique, so that no operation ever holds more than two locks at a time, which significantly improves concurrency. This technique has been further refined, so that operations never hold more than one lock at a time [119].

Lehman and Yao do not address how nodes can be merged, instead allowing delete operations to leave nodes underfull. They argue that in many cases delete operations are rare, and that if space utilization becomes a problem, the tree can occasionally be reorganized in “batch” mode by exclusively locking the entire tree. Lanin and Shasha [83] incorporate merging into the delete operations, similarly to how insert operations split overflowed nodes in previous implementations. Similar to the Lehman-Yao link technique, these implementations use links to allow recovery by operations that have mistakenly reached a node that has been evacuated due to node merging.
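The recovery step is just a rightward walk along the links. A minimal sketch, assuming each node records the upper bound of the key range it currently covers (the `LinkNode` type and `highKey` field are illustrative simplifications of the actual node format):

```java
// Following right links in a B-link-style node: if a concurrent split moved
// the target key range rightward, chase the link rather than restarting.
class LinkNode {
    int highKey;          // upper bound of keys this node now covers
    LinkNode rightLink;   // neighbor created by a split, or null

    // Recover from a split that raced with our descent: walk right until
    // we reach the node whose range covers 'key'.
    static LinkNode moveRight(LinkNode n, int key) {
        while (n.rightLink != null && key > n.highKey) {
            n = n.rightLink;
        }
        return n;
    }
}
```

Because a split only ever moves keys to a node's new right neighbor, this walk is guaranteed to find the correct node without revisiting the parent.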

In all of the algorithms discussed above, the maintenance operations such as node splitting and merging (where applicable) are performed as part of the regular update operations.

Without such tight coupling between the maintenance operations and the regular operations that necessitate them, we cannot guarantee strict balance properties. However, if we relax the balance requirements, we can separate the tree maintenance work from the update operations, resulting in a number of advantages that often outweigh the desire to keep search trees strictly balanced. As an example, the Blink-tree implementation in [119] supports a compression process that can run concurrently with regular operations to merge nodes that are underfull. By separating this work from the regular update operations, it can be performed concurrently by threads running on different processors, or in the background.

The idea of separating rebalancing work from regular tree operations was first suggested for red-black trees [40], and was first realized in [71] for AVL trees [2] supporting insert and search operations. An implementation that also supports delete operations is provided in [108]. These implementations improve concurrency by breaking balancing work down into small, local tree transformations that can be performed independently. Analysis in [84] shows that, with some modifications, the scheme of [108] guarantees that each update operation causes at most O(log N) rebalancing operations for an N-node AVL tree. Similar results exist for B-trees [87, 108] and red-black trees [20, 107].

The only nonblocking implementations of balanced search trees have been achieved using Dynamic Software Transactional Memory mechanisms [33, 54]. These implementations use transactions translated from sequential code that performs rebalancing work as part of regular operations.

The above brief survey covers only basic issues and techniques involved with implementing concurrent search trees. To mention just a few of the numerous improvements and extensions in the literature, [104] addresses practical issues for the use of B+-trees in commercial database products, such as recovery after failures; [74] presents concurrent implementations for generalized search trees (GiSTs) that facilitate the design of search trees without repeating the delicate work involved with concurrency control; and [85, 86] present several types of trees that support the efficient insertion and/or deletion of a group of values. Pugh [111] presents a concurrent version of his skiplist randomized search structure [112]. Skiplists are virtual tree structures consisting of multiple layers of linked lists; the expected search time in a skiplist is logarithmic in the number of elements it contains. The main advantage of skiplists is that they do not require rebalancing: insertions are done in a randomized fashion that keeps the search structure balanced in expectation.
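For intuition, here is the sequential skiplist search: start at the top list and drop a level whenever the next key overshoots. Pugh's concurrent version adds synchronization around individual pointer updates, but the search structure is the same; the node layout below is illustrative.

```java
// Skiplist search. next[i] is the successor at level i; taller "towers"
// are assigned at random on insertion, giving expected O(log n) searches.
class SkipNode {
    final int key;
    final SkipNode[] next;

    SkipNode(int key, int height) {
        this.key = key;
        this.next = new SkipNode[height];
    }
}

class SkipListSearch {
    // 'head' is a sentinel whose height is the maximum level in the list.
    static boolean contains(SkipNode head, int key) {
        SkipNode cur = head;
        for (int lvl = head.next.length - 1; lvl >= 0; lvl--) {
            while (cur.next[lvl] != null && cur.next[lvl].key < key) {
                cur = cur.next[lvl];   // advance along this level
            }
            // overshoot: fall to the next lower level
        }
        SkipNode cand = cur.next[0];   // bottom level holds every element
        return cand != null && cand.key == key;
    }
}
```

Each level roughly halves the number of candidates, which is where the expected logarithmic search time comes from.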

Empirical and analytical evaluations of concurrent search trees and other data structures can be found in [41, 66].
