Balanced Binary Search Trees:Schemes with no Balance Information

Schemes with no Balance Information

As discussed above, a balanced binary search tree is typically maintained by local constraints on the structure of the tree. By keeping structure information in the nodes, these constraints can be maintained during updates.

In this section, we show that a plain vanilla tree, without any local balance information, can be maintained eﬃciently. This can be done by coding the balance information implicitly (Section 10.6.1) or by using global instead of local balance criteria, hereby avoiding the need for balance information (Section 10.6.2). Splay trees [70] also have no balance information. They do not have a sub-linear bound on their height, but still perform searches in amortized O(log n) time. Splay trees are described in a chapter of their own in this book.

Implicit Representation of Balance Information

One idea of how to remove the need for local balance information is to store the information implicitly. There are two main techniques for this: coding information in the way empty pointers are located or coding information by changing the order between left and right children.

In both cases, we can easily code one bit implicitly at each internal node, but not at external nodes. Therefore, we weed to use balance schemes that can do with only one bit per internal node and no balance information at external nodes.

As an example, we may use the AVL-tree. At each node, we need to keep track of whether the two subtrees have the same height or if one of them is one unit higher than its sibling. We can do this with one bit per internal node by letting the bit be 1 if and only if the node is higher than its sibling. For external nodes we know the height, so no balance information is needed there.

The assumption that we only need one bit per internal node is used in the two construc- tions below.

Using Empty Pointers

As pointed out by Brown [24, 25], the explicitly stored balance information may in some classes of balanced trees be eliminated by coding the information through the location of empty pointers. We use a tree of pseudo-nodes, where a pseudo-node contains two consecutive elements, stored in two binary nodes. The pseudo-node will have three outgoing pointers, and since the two binary nodes are consecutive, one of the three pointers will be

empty. By varying which of the two nodes become parent, we can arrange the pseudo- node in two ways. These two diﬀerent structures is used to represent bit values 0 and 1, respectively; by checking the position of the empty pointer, we can compute the bit value. In order for this to work, we allow the pseudo-nodes at the bottom of the tree to contain one or two binary nodes.

During insertion, we traverse down the tree. If the inserted element lies between the two keys in a visited pseudo-node, we replace it by one of the elements in the pseudo-node and continue down the tree with that element instead. At the bottom of the tree, if we ﬁnd a pseudo-node with only one key, we just add the new key. If, on the other hand, we ﬁnd a pseudo-node with two keys, we split it into two pseudo-nodes which will cause an insertion in the tree of pseudo-nodes. Rotations etc. can be done with pseudo-nodes instead of ordinary binary nodes. (If a rotation involves the lowest level of the tree of pseudo-nodes, some care has to be taken in order to maintain the invariant that only the lowest pseudo-nodes may contain a single node.)

Deletions are handled correspondingly. If the deleted element is contained in an internal pseudo-node, we replace it by its predecessor or successor, which resides at the bottom of the tree; in this way we ensure that the deletion occurs at the bottom. If the deletion occurs at a pseudo-node with two binary nodes, we just remove the node, if the pseudo-node contains only one node, a deletion occurs in the tree of pseudo-nodes.

Despite the pseudo-nodes, the tree is really just a binary search tree where no balance information is explicitly stored. Since each pseudo-node has internal height 2, and the number of pseudo-nodes is less than n, the height of the binary tree is O(log n). A drawback is that the height of the underlying binary tree will become higher by the use of pseudo- nodes. Instead of n internal nodes we will have roughly n/2 pseudo-nodes, each of height 2. In the worst case, the height of the binary tree will be doubled.

Swapping Pointers

Another possibility for coding information into a structure is to use the ordering of nodes. If we redeﬁne binary search trees, such that the left and right subtree of a node are allowed to change place, we can use this possibility to encode one bit per node implicitly. By comparing the keys of the two children of a node, the one-bit information can be extracted. During search, we have to make one comparison extra at each node. This idea has been used by Munro and Suwanda [59–61] to achieve implicit implementation of binary search trees, but it can of course also be used for traditional pointer-based tree structures.

General Balanced Trees

In the following, we use |T | to denote the weight (number of leaves) in a tree T . We also use |v| to denote the weight of a subtree rooted at node v. It should be noted that for a tree T storing n keys in internal nodes, |T | = n + 1

Instead of coding balance information into the structure of the tree, we can let the tree take any shape, as long as its height is logarithmic. Then, there is no local balance criterion to maintain, and we need no balance information in the nodes, not even implicitly coded. As we show below, the tree can still be maintained eﬃciently.

When maintaining trees this way, we use the technique of partial rebuilding. This technique was ﬁrst introduced by Overmars and van Leeuwen [68, 69] for maintaining weight- balanced trees. By making a partial rebuilding at node v, we mean that the subtree rooted at v is rebuilt into a perfectly balanced tree. The cost of such rebalancing is Θ(|v|). In Section 10.5, we discuss linear time algorithms for rebalancing a (sub-)tree to perfect balance.

Apart from the advantage of requiring no balance information in the nodes, it can be shown [7] that the constant factor for general balanced trees is lower than what has been shown for the maintenance of weight-balanced trees by partial rebuilding.

The main idea in maintaining a general balanced tree is to let the tree take any shape as long as its height does not exceed log |T | by more than a speciﬁed constant factor. The key observation is that whenever the tree gets too high by an insertion, we can ﬁnd a node where partial rebuilding can be made at a low amortized cost. (Since deletions do not increase the height of the tree, we can handle deletions eﬃciently by rebuilding the entire tree after a large number of elements have been deleted.)

We use two constants c> 1, and b > 0, and we maintain a balanced tree T with maximum height Ic log |T | + bl.

No balance information is used, except two global integers, containing |T |, the number of leaves in T , and d(T ), the number of deletions made since the last time the entire tree T was rebalanced.

Updates are performed in the following way:

Insertion: If the depth of the new leaf exceeds Ic log(|T |+d(T ))l, we back up along the insertion path until we ﬁnd the lowest node v, such that h(v) > Ic log |v|l. The subtree v is then rebuilt to perfect balance. The node v is found by explicitly traversing the subtrees below the nodes on the path from the inserted leaf to v, while counting the number of leaves. The cost for this equals the cost for

traversing the subtree below v once, which is O(|v|).

Deletion: d(T ) increases by one. If d(T ) ≥ (2b/c − 1)|T |, we rebuild T to perfect balance and set d(T ) = 0.

First, we show that the height is low enough. Since deletions do not increase the height of T , we only need to show that the height is not increased too much by an insertion. We prove this by induction. Assume that

holds before an insertion. (Note that the height of an empty tree is zero.) During the insertion, the height condition can only be violated by the new node. However, if such a violation occurs, the partial rebuilding will ensure that Inequality 10.1 holds after the insertion. Hence, Inequality 10.1 holds by induction. Combining this with the fact that d(T ) < (2b/c − 1)|T |, we get that h(T ) ≤ Ic log |T | + bl.

Next, we show that the maintenance cost is low enough. Since the amortized cost for the rebuilding of the entire tree caused by deletions is obviously O(1) per deletion, we only need to consider insertions.

In fact, by the way we choose where to perform rebuilding, we can guarantee that when a partial rebuilding occurs at node v, Ω(v) updates have been made below v since the last time v was involved in a partial rebuilding. Indeed, this observation is the key observation behind general balanced trees.

Let vH be v’s child on the path to the inserted node. By the way v is selected by the algorithm, we know the following about v and vh:

Since 2−1/c > 1/2, we conclude that the weight of vH is Θ(v) larger than the weight of v’s other child. The only way this diﬀerence in weight between the two children can occur is by insertions or deletion below v. Hence, Ω(v) updates must have been made below v since the last time v was involved in a partial rebuilding. In order for the amortized analysis to hold, we need to reserve a constant cost at v for each update below v. At each update, updates are made below O(log n) nodes, so the total reservation per update is O(log n).

Since the tree is allowed to take any shape as long as its height is low enough, we call this type of balanced tree general balanced trees [7]. We use the notation GB-trees or GB(c)- trees, where c is the height constant above. (The constant b is omitted in this notation.) (The idea of general balanced trees have also been rediscovered under the name scapegoat trees [33].)

Example. The upper tree in Figure 10.5 illustrates a GB(1.2)-tree where ﬁve deletions and some insertions have been made since the last global rebuilding. When inserting 10, the height becomes 7, which is too high, since 7 > Ic log(|T |+d(T ))l = I1.2 log(20+5)l = 6.

We back up along the path until we ﬁnd the node 14. The height of this node is 5 and the weight is 8. Since 5 > I1.2 log 8l, we can make a partial rebuilding at that node. The resulting tree is shown as the lower tree in Figure 10.5.

Application to Multi-Dimensional Search Trees

The technique of partial rebuilding is an attractive method in the sense that it is useful not only for ordinary binary search trees, but also for more complicated data structures, such as multi-dimensional search trees, where rotations cannot be used eﬃciently. For example, partial rebuilding can be used to maintain logarithmic height in k-d trees [19] under updates [57, 68, 69]. A detailed study of the use of partial rebuilding can be found in Mark Overmars’ Ph.D. thesis [68]. For the sake of completeness, we just mention that if the cost of rebalancing a subtree v is O(P (|v|)), the amortized cost of an update will . For example, applied to k-d trees, we get an amortized update cost of

Search This Blog

algorithms