Balanced Binary Search Trees: Classic Balancing Schemes

Classic Balancing Schemes

AVL-Trees

AVL-trees were introduced in 1962 in [1], and are named after their inventors, Adel’son-Vel’skii and Landis. They proposed the first dictionary structure with logarithmic search and update times, and also introduced the rebalancing technique using rotations.

The balance definition in AVL-trees is based on the heights of subtrees. The invariant is that for any node, the heights of its two subtrees differ by at most one. Traditionally, the balance information maintained at each node is −1, 0, or +1, giving the difference in height between the right subtree and the left subtree. This information can be represented by two bits. Another method is to mark a node when its height is larger than its sibling’s. This requires only one bit per node, but reading the balance of a node now involves visiting its children. In the other direction, storing the height of each node requires O(log log n) bits of information per node, but makes the rebalancing algorithms simpler to describe and analyze.

By induction on h, it is easily proved that for an AVL-tree of height h, the minimum number of nodes is F_{h+2} − 1, where F_i denotes the i’th Fibonacci number, defined by F_1 = F_2 = 1 and F_{j+2} = F_{j+1} + F_j. A well-known fact for Fibonacci numbers is that F_i ≥ Φ^{i−2}, where Φ is the golden ratio (√5 + 1)/2 ≈ 1.618. This shows that the height of an AVL-tree with n nodes is at most log_Φ(n + 1), i.e., AVL-trees have a height bound of the type c · log n with c = 1/log₂ Φ ≈ 1.440.
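For completeness, the argument can be spelled out as a short calculation (the notation N(h), for the minimum number of nodes in an AVL-tree of height h, is ours and not from the source):

```latex
% Sparsest AVL-tree of height h: subtrees of heights h-1 and h-2.
\begin{align*}
  N(h) &= N(h-1) + N(h-2) + 1, \qquad N(0)=0,\; N(1)=1,\\
       &\;\Longrightarrow\; N(h) = F_{h+2} - 1 \;\ge\; \Phi^{h} - 1,\\
  n \ge N(h) &\;\Longrightarrow\; h \le \log_{\Phi}(n+1)
       = \frac{\log_2(n+1)}{\log_2 \Phi} \approx 1.440\,\log_2(n+1).
\end{align*}
```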

After an update, violations of the balance invariant can only occur at nodes on the search path from the root to the update point, as only these nodes have subtrees changed. The rebalancing algorithm resolves these in a bottom-up fashion. At each node, it either performs a rotation, performs a double rotation, or just updates balance information, with the choice depending on the balance of its child and grandchild on the search path. The algorithm stops when it can guarantee that no ancestor has a balance problem, or when the root is reached.
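To make the retracing step concrete, here is a minimal sketch of AVL insertion in Python, storing the height in each node (the variant mentioned above that uses O(log log n) bits per node). The names (Node, insert, rebalance) are illustrative and not taken from the source; deletion is omitted.

```python
# A minimal AVL insertion sketch: heights are stored in the nodes; after an
# insertion the search path is retraced bottom-up, and a single or double
# rotation is applied at a node whose subtree heights differ by two.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1                      # height of the subtree rooted here

def height(t):
    return t.height if t else 0

def update_height(t):
    t.height = 1 + max(height(t.left), height(t.right))

def balance(t):
    return height(t.right) - height(t.left)  # in {-2, ..., +2} while rebalancing

def rotate_left(t):
    r = t.right
    t.right, r.left = r.left, t
    update_height(t)
    update_height(r)
    return r

def rotate_right(t):
    l = t.left
    t.left, l.right = l.right, t
    update_height(t)
    update_height(l)
    return l

def rebalance(t):
    update_height(t)
    if balance(t) == +2:                     # right subtree two levels taller
        if balance(t.right) < 0:             # zig-zag case: double rotation
            t.right = rotate_right(t.right)
        return rotate_left(t)
    if balance(t) == -2:                     # left subtree two levels taller
        if balance(t.left) > 0:
            t.left = rotate_left(t.left)
        return rotate_right(t)
    return t

def insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    return rebalance(t)                      # retrace the search path bottom-up
```

During an insertion, at most one of the rotation branches in rebalance actually fires; all other calls merely refresh the stored heights, matching the O(1) structural change discussed next.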

In AVL-trees, the rebalancing algorithm has the following properties: After an insertion, change of balance information may take place any number of steps towards the root, but as soon as a rotation or double rotation takes place, no further balance problems remain. Hence, only O(1) structural change is made. In contrast, after a deletion it may happen that rotations are performed at all nodes on the search path. If only insertions take place, the amortized amount of rebalancing work, including updating of balance information, can be shown [58] to be O(1). The same is true if only deletions take place [75]. It is not true in the fully dynamic case, as it is easy to find an AVL-tree where alternating insertions and deletions of the same key require rebalancing along the entire search path after each update.

Weight-Balanced Trees

Weight-balanced trees were proposed in 1973 by Nievergelt and Reingold [62], and have a balance definition based on the sizes of subtrees. Here, the size of a subtree is most conveniently defined as the number of external nodes (empty trees) in the subtree, and the size, also denoted the weight, of a node is the size of its subtree. The balance invariant of weight-balanced trees states that for any node, the ratio between the weight of its left child (equivalently, its right child) and its own weight lies in the interval [α, 1 − α] for some fixed value α > 0.

This ratio is denoted the balance of the node. Since a node of weight three must have subtrees of weight two and one, we must have α ≤ 1/3. Weight-balanced trees are also called BB[α]-trees, which stands for trees of bounded balance with parameter α.

By the balance criterion, for any node v the weight of the parent of v is at least a factor 1/(1 − α) larger than the weight of v. A tree of height k therefore has a root of weight at least (1/(1 − α))^k, which shows that the height of a weight-balanced tree with n nodes is at most log_{1/(1−α)}(n + 1), i.e., weight-balanced trees have a height bound of the type c · log n with c = 1/log₂(1/(1 − α)) > 1.709.
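As a concrete illustration (not from the source), the invariant is easy to state in code. The sketch below recomputes weights recursively purely for exposition; an actual weight-balanced tree stores the weight in each node, as described next.

```python
# Checking the BB[alpha] invariant (illustrative sketch, names ours).
# The weight of a subtree is its number of external nodes (empty subtrees),
# so an empty subtree has weight 1 and an internal node's weight is the sum
# of its children's weights.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def weight(t):
    return 1 if t is None else weight(t.left) + weight(t.right)

def is_weight_balanced(t, alpha=0.25):
    if t is None:
        return True
    ratio = weight(t.left) / weight(t)       # the balance of the node
    return (alpha <= ratio <= 1 - alpha
            and is_weight_balanced(t.left, alpha)
            and is_weight_balanced(t.right, alpha))
```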

The balance information stored in each node is its weight, for which log n bits are needed. After an update, this information must be updated for all nodes on the search path from the root to the update point, and some of these nodes may now violate the balance criterion. The rebalancing algorithm proposed in [62] resolves this unbalance in a bottom-up fashion along the search path, using either a rotation or a double rotation at each violating node. The choice of rotation depends on the weights of the children and the grandchildren of the node.

In [62], the rebalancing algorithm was claimed to work for α in the interval (0, 1 − 1/√2], but Blum and Mehlhorn [20] later observed that the correct interval is (2/11, 1 − 1/√2].

They also showed that for α strictly inside this interval, the rebalancing of an unbalanced node restores its balance to a value in [(1 + δ)α, 1 − (1 + δ)α], where δ depends on the choice of α. This implies that when the node becomes unbalanced again, the number of updates which have taken place below it since it was last rebalanced is at least a fraction (depending on α) of its current weight. This feature, unique to weight-balanced trees, has important applications, e.g., for data structures in Computational Geometry. A number of these structures are binary search trees where each node has an associated secondary structure built on the elements in the subtree of the node. When a rotation takes place, the structures of the nodes taking part in the rotation will have to be rebuilt. If we attribute the cost of this rebuilding evenly to the updates which have taken place below the node since it was last involved in a rotation, then, as an example, a linear rebuilding cost of the secondary structure will amount to a constant attribution to each of these updates. As the search path for an update contains O(log n) nodes, any single update can receive at most this many attributions, which implies an amortized O(log n) update complexity for the entire data structure.

The same analysis allows BB[α]-trees to be maintained by local rebuilding instead of rotations in amortized O(log n) time, as first noted by Overmars and van Leeuwen [69]:

After an update, the subtree rooted at the highest unbalanced node (if any) on the search path is rebuilt to perfect balance. Since a rebuilding of a subtree leaves all nodes in it with balance close to 1/2, the number of updates which must have taken place below the node since it was last part of a rebuilding is a constant fraction of its current weight. The rebuilding uses work linear in this weight, which can be covered by attributing a constant amount of work to each of these updates. Again, each update is attributed O(log n) work.

This scheme will work for any α ≤ 1/3.
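A sketch of this partial-rebuilding step (illustrative Python, names ours): the subtree rooted at the highest unbalanced node is flattened by an in-order traversal and rebuilt to perfect balance, both in time linear in its weight.

```python
# Partial rebuilding: flatten the offending subtree into sorted order and
# rebuild it to perfect balance. The linear cost is charged, in the amortized
# argument, to the updates made below the node since it was last rebuilt.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def flatten(t, acc):
    if t is not None:                        # in-order traversal: sorted order
        flatten(t.left, acc)
        acc.append(t)
        flatten(t.right, acc)
    return acc

def build_perfect(nodes, lo, hi):
    if lo >= hi:                             # empty range -> empty subtree
        return None
    mid = (lo + hi) // 2                     # the median becomes the root
    root = nodes[mid]
    root.left = build_perfect(nodes, lo, mid)
    root.right = build_perfect(nodes, mid + 1, hi)
    return root

def rebuild(t):
    nodes = flatten(t, [])
    return build_perfect(nodes, 0, len(nodes))
```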

For the original rebalancing algorithm using rotations, a better analysis can be made for α chosen strictly inside the interval (2/11, 1 − 1/√2]: the total work per rebalancing operation is now O(1), so the work to be attributed to each update below a node is O(1/w), where w is the weight of the node. As noted above in the proof of the height bound of weight-balanced trees, w is exponentially increasing along the search path from the update point to the root. This implies that each update is attributed only O(1) work in total, and also that the number of rotations taking place at a given height decreases exponentially with the height. This result from [20] seems to be the first on O(1) amortized rebalancing in binary search trees. The actual time spent after an update is still logarithmic in weight-balanced trees, though, as the balance information needs to be updated along the entire search path, but this entails no structural changes.

Recently, the idea of balancing by weight has been applied to multi-way search trees [14], leading to trees efficient in external memory which possess the same feature as weight-balanced binary trees, namely that between consecutive rebalancing operations at a node, the number of updates which have taken place below the node is proportional to the weight of the node.

Balanced Binary Trees Based on Multi-Way Trees

The B-tree [17], which is treated in another chapter of this book, was originally designed to handle data stored in external memory. The basic idea is to associate a physical block with a high-degree node in a multi-way tree. A B-tree is maintained by merging and splitting nodes, and by increasing and decreasing the number of layers of multi-way nodes. The smallest example of a B-tree is the 2-3-tree [2], where the nodes have degree 2 or 3. In a typical B-tree implementation, the degree of a node is much larger, and it varies roughly within a factor of 2.

The concept of multi-way nodes, splitting, and merging has also proven to be very fruitful in the design of balancing schemes for binary trees. The first such example is the binary B-tree [15], a binary implementation of 2-3-trees. Here, the idea is to organize binary nodes into larger chunks, called pseudo-nodes. In the binary version of a 2-3-tree, a node of degree 2 is represented by one binary node, while a node of degree 3 is represented by two binary nodes (with the additional constraint that one of the two nodes is the right child of the other). In terms of binary nodes grouped into pseudo-nodes, it is convenient to say that edges within a pseudo-node are horizontal while edges between pseudo-nodes are vertical.

As a natural extension of binary B-trees, Bayer invented Symmetric Binary B-trees, or SBB-trees [16]. The idea was that, instead of only allowing a binary node to have one horizontal outgoing edge to its right child, we can allow both left- and right-edges to be horizontal. For both binary B-trees and Symmetric Binary B-trees, Bayer designed maintenance algorithms in which the original B-tree operations (split, merge, and increase/decrease of the number of levels) were implemented for the pseudo-nodes.

Today, SBB-trees mostly appear under the name red-black trees [34]. Here, the horizontal and vertical edges are represented by one “color” per node. (Both notations can be represented by one bit per node.) SBB/red-black trees are binary implementations of B-trees where each node has degree between 2 and 4.
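As an illustration of the one-bit representation (a sketch, not code from the source), a color per node can record whether the edge from the node’s parent is horizontal, i.e., whether the node belongs to the same pseudo-node as its parent:

```python
# One bit of balance information per node: a node is "red" when the edge from
# its parent is horizontal (the node lies in the same pseudo-node as its
# parent); "black" nodes start new pseudo-nodes.

RED, BLACK = True, False

class RBNode:
    def __init__(self, key, color=RED, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right

def pseudo_node_degree(t):
    # The pseudo-node rooted at a black node t consists of t and its red
    # children, so its degree is 2, 3, or 4, depending on how many of its
    # children are reached by horizontal (red) edges.
    reds = sum(1 for c in (t.left, t.right) if c is not None and c.color == RED)
    return 2 + reds
```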

One advantage of SBB/red-black trees is that a tree can be updated with only a constant number of rotations per insertion or deletion. This property is important, for example, when maintaining priority search trees [56], where each rotation requires Θ(log n) time.

The first binary search tree with O(1) rotations per update was the half-balanced trees of Olivié [66]. Olivié’s idea was to use path-balancing, where the quotient between the shortest and the longest path from each node is restricted to be at least 1/2, and he showed that this path-balance could be maintained with O(1) rotations per update. It turns out that half-balanced trees and SBB/red-black trees are structurally equivalent, although their maintenance algorithms are different. It has also been proven by Tarjan [73] that SBB/red-black trees can be maintained by O(1) rotations. These algorithms can also be generalized to maintain pseudo-nodes of higher degree, resulting in binary B-tree implementations with lower height [8], still requiring only O(1) rotations per update.

The mechanism behind the constant number of rotations per update can be explained in a simple way by examining three cases of what can happen during insertion and deletion in a binary B-tree representation.

• When a pseudo-node becomes too large, it can be split into two pseudo-nodes without any rotation; we just need to change the balance information.

• Also, when a pseudo-node becomes too small and its sibling has minimal size, these two pseudo-nodes can be merged without any rotation; we just change balance information.

• In all other cases, when a pseudo-node becomes too small or too large, this will be resolved by moving nodes between the pseudo-node and its sibling and no splitting or merging will take place.

From these three basic facts, it can be shown that as soon as the third case above occurs, no more rebalancing will be done during the same update. Hence, the third case, requiring rotations, will only occur once per update. For details, we refer to the literature [8, 73].

Binary B-trees can also be used to design very simple maintenance algorithms that are easy to code. This is illustrated by AA-trees [5, 77]. AA-trees are actually the same as Bayer’s binary version of 2-3-trees, but with design focused on simplicity. Compared with normal red-black tree implementations, AA-trees require very few different cases in the algorithm and much less code for implementation.
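To illustrate how few cases are involved, here is a sketch of the two AA-tree rebalancing primitives, commonly called skew and split, in the level-based representation (a horizontal edge goes to a child with the same level). The concrete code below is illustrative and not taken from [5, 77].

```python
# AA-tree primitives: each node stores a level; an edge to a child with the
# same level is horizontal. skew removes a horizontal left edge by a right
# rotation; split removes two consecutive horizontal right edges by a left
# rotation plus a level increase, i.e. it splits an overfull pseudo-node.

class AANode:
    def __init__(self, key, level=1, left=None, right=None):
        self.key, self.level = key, level
        self.left, self.right = left, right

def skew(t):
    if t and t.left and t.left.level == t.level:
        l = t.left
        t.left, l.right = l.right, t         # rotate right
        return l
    return t

def split(t):
    if t and t.right and t.right.right and t.right.right.level == t.level:
        r = t.right
        t.right, r.left = r.left, t          # rotate left
        r.level += 1                         # promote the middle node
        return r
    return t

def insert(t, key):
    if t is None:
        return AANode(key)
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    return split(skew(t))                    # restore the invariant bottom-up
```

Deletion needs a few more lines but follows the same pattern of calling skew and split along the search path.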

While binary B-trees and SBB/red-black trees deal with small pseudo-nodes, the stratified trees by van Leeuwen and Overmars [76] use large pseudo-nodes arranged in few layers. The concept of stratification does not imply that all pseudo-nodes have similar size; it is mainly a way to conceptually divide the tree into layers, using the notion of merging and splitting.
