Randomized Dictionary Structures: Randomized Binary Search Trees
Randomized Binary Search Trees
A Binary Search Tree (BST) for a set S of keys is a binary tree satisfying the following properties.
(a) Each node has exactly one key of S. We use the notation v.key to denote the key stored at node v.
(b) For every node v, every node u in the left subtree of v, and every node w in the right subtree of v, the keys stored in them satisfy the so-called search tree property: u.key < v.key < w.key.
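For concreteness, here is a minimal sketch in Python of such a node together with a routine that checks the search tree property; the names (Node, is_bst) are ours, not from the text, and the keys are assumed to be numeric.

# A minimal BST node; the field names are illustrative.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def is_bst(v, lo=float("-inf"), hi=float("inf")):
    """Check the search tree property: every key in the left subtree of v
    is smaller than v.key and every key in the right subtree is larger."""
    if v is None:
        return True
    return (lo < v.key < hi
            and is_bst(v.left, lo, v.key)
            and is_bst(v.right, v.key, hi))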
The complexity of the dictionary operations is bounded by the height of the binary search tree. Hence, ways and means were explored for controlling the height of the tree during the course of execution. Several clever balancing schemes were discovered, with varying degrees of complexity. In general, the implementation of balancing schemes is tedious, and we may have to perform a number of rotations, which are complicated operations. Another line of research explored the potential of possible randomness in the input data. The idea was to completely avoid balancing schemes and hope to obtain 'short' trees due to randomness in the input. When only random insertions are done, we obtain a so-called Randomly Built Binary Tree (RBBT). RBBTs have been shown to have O(log n) expected height.
What is the meaning of random insertion? Suppose we have already inserted the values a1, a2, a3, ··· , ak−1. These values, when considered in sorted order, define k intervals on the real line and the new value to be inserted, say x, is equally likely to be in any of the k intervals.
The first drawback of RBBTs is that this assumption may not be valid in practice, and when it is not satisfied, the resulting tree structure could be highly skewed and the complexity of search as well as insertion could be as high as O(n). The second major drawback is that when deletions are also performed on these structures, there is a tremendous degradation in performance. No theoretical results are available, but extensive empirical studies show that the height of an RBBT could grow to O(√n) when we have an arbitrary mix of insertions and deletions, even if the randomness assumption is satisfied for the inserted elements. Thus, we did not have a satisfactory solution for nearly three decades.
In short, randomness was not preserved by the deletion operation, and a randomness-preserving binary tree structure for the dictionary operations remained one of the outstanding open problems, until an elegant affirmative answer was provided by Martinez and Roura in their landmark paper [3].
In this section, we shall briefly discuss the structure proposed by Martinez and Roura.
DEFINITION 13.15 [Randomized Binary Search Trees] Let T be a binary search tree of size n. If n = 0, then T = NULL and it is a random binary search tree. If n > 0, T is a random binary search tree iff both its left subtree L and right subtree R are independent random binary search trees and

Pr[size(L) = i | size(T) = n] = 1/n, for i = 0, 1, ..., n − 1.
The above definition implies that every key has the same probability, 1/n, of becoming the root of the tree. It is easy to prove that the expected height of an RBST with n nodes is O(log n). RBSTs possess a number of interesting structural properties, and the classic book by Mahmoud [2] discusses them in detail. In view of the above fact, it is enough if we prove that when an insert or delete operation is performed on an RBST, the resulting tree is also an RBST.
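As an illustration of the definition, the following sketch (in Python, with names of our choosing) builds a random binary search tree directly from the recursion above: the root is chosen uniformly at random and both subtrees are built independently. The last few lines give a rough, purely illustrative check of the O(log n) expected height claim.

import random

class Node:
    # Minimal BST node used in this sketch.
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def random_bst(keys):
    # keys: a sorted list of distinct keys. Pick the root uniformly at
    # random (probability 1/n for each key) and build both subtrees
    # recursively, exactly as in Definition 13.15.
    if not keys:
        return None
    i = random.randrange(len(keys))
    return Node(keys[i], random_bst(keys[:i]), random_bst(keys[i + 1:]))

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

if __name__ == "__main__":
    # Illustrative only: the average observed height is typically a small
    # multiple of log2(1000).
    heights = [height(random_bst(list(range(1000)))) for _ in range(20)]
    print(sum(heights) / len(heights))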
Insertion in RBST
When a key x is inserted in a tree T of size n, we obtain a tree T′ of size n + 1. For T′, as we observed earlier, x should be at the root with probability 1/(n + 1). This is our starting point.
To insert x as the root of the resulting tree, we first split the current tree into two trees labeled T< and T>, where T< contains all the keys in T that are smaller than x and T> contains all the keys in T that are larger than x. The output tree T′ is constructed by placing x at the root and attaching T< as its left subtree and T> as its right subtree. The algorithm for splitting a tree T into T< and T> with respect to a value x is similar to the partitioning algorithm used in quicksort. Specifically, the algorithm split(x, T) works as follows. If T is empty, nothing needs to be done; both T< and T> are empty. When T is non-empty, we compare x with Root(T).key. If x < Root(T).key, then the root of T as well as the right subtree of T belong to T>. To compute T< and the remaining part of T>, we recursively call split(x, L), where L is the left subtree of the root of T. If x > Root(T).key, T< is built first and the recursion proceeds with split(x, R). The details are left as an easy exercise; a sketch in code is given below.
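One possible rendering of split and of the randomized insertion in Python is sketched below, assuming distinct keys and that x is not already in the tree; a practical implementation would store subtree sizes in the nodes instead of recomputing them as done here.

import random

class Node:
    # Minimal BST node, as in the earlier sketches.
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def size(t):
    # Recomputed on the fly for brevity; implementations normally store it per node.
    return 0 if t is None else 1 + size(t.left) + size(t.right)

def split(x, t):
    # Return (t_less, t_greater): the keys of t smaller than x and larger than x.
    if t is None:
        return None, None
    if x < t.key:
        # The root of t and its right subtree belong to t_greater; recurse on
        # the left subtree to obtain t_less and the rest of t_greater.
        less, t.left = split(x, t.left)
        return less, t
    else:
        # Symmetric case: the root of t and its left subtree belong to t_less.
        t.right, greater = split(x, t.right)
        return t, greater

def insert(x, t):
    # With probability 1/(n+1), place x at the root by splitting t;
    # otherwise recurse into the appropriate subtree.
    n = size(t)
    if random.randrange(n + 1) == 0:
        less, greater = split(x, t)
        return Node(x, less, greater)
    if x < t.key:
        t.left = insert(x, t.left)
    else:
        t.right = insert(x, t.right)
    return t

By the theorems that follow, repeatedly calling insert on an initially empty tree yields an RBST regardless of the order in which the keys arrive.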
We shall first prove that T< and T> are independent Random Binary Search Trees. Formally,
THEOREM 13.15 Let T< and T> be the BSTs produced by split(x, T). If T is a random BST containing the set of keys S, then T< and T> are independent RBSTs containing the keys S< = {y ∈ S | y < x} and S> = {y ∈ S | y > x}, respectively.
Proof Let size(T) = n > 0 and consider the case x > Root(T).key; we will show that for any z ∈ S<, the probability that z is the root of T< is 1/m, where m = size(T<). In this case the root of T< is the root of T, and since z < x,

P[z is the root of T< | Root(T).key < x]
 = P[z is the root of T] / P[Root(T).key < x]
 = (1/n) / (m/n)
 = 1/m.

The case x < Root(T).key follows by applying induction to the recursive call split(x, L).
The independence of T< and T> follows from the independence of L and R of T and by induction.
We are now ready to prove that randomness is preserved by insertion. Specifically,
THEOREM 13.16 Let T be an RBST for the set of keys S and x ∉ S, and assume that insert(x, T) produces a tree, say T′, for S ∪ {x}. Then, T′ is an RBST.
Proof A key y ∈ S will be at the root of T′ iff
1) y is at the root of T, and
2) x is not inserted at the root.
Since T is an RBST, the first event has probability 1/n; by the choice made in the insertion algorithm, the second has probability n/(n + 1). As these events are independent, y is at the root of T′ with probability (1/n)(n/(n + 1)) = 1/(n + 1). The new key x itself is placed at the root with probability 1/(n + 1) by construction, so every key of S ∪ {x} is equally likely to be the root of T′. The independence of the subtrees follows from Theorem 13.15 and induction, and hence T′ is an RBST.
Deletion in RBST
Suppose x ∈ T and let Tx denote the subtree of T rooted at x. Assume that L and R are the left and right subtrees of Tx. To delete x, we build a new BST T′ = Join(L, R) containing the keys in L and R and replace Tx by T′. Since the pseudocode for deletion is easy to write, we omit the details.
We shall take a closer look at the details of the Join routine. We need a couple more notations to describe the procedure Join. Let Ll and Lr denote the left and right subtrees of L, and let Rl and Rr denote the left and right subtrees of R, respectively. Let a denote the root of L and b denote the root of R. We select either a or b to serve as the root of T′ with appropriate probabilities. Specifically, the probability of a becoming the root of T′ is m/(m + n) and that of b is n/(m + n), where m = size(L) and n = size(R).
If a is chosen as the root, then its left subtree Ll is not modified while its right subtree Lr is replaced with Join(Lr, R). If b is chosen as the root, then its right subtree Rr is left intact but its left subtree Rl is replaced with Join(L, Rl). The join of an empty tree with another tree T is T itself, and this is the condition used to terminate the recursion. A sketch of Join and of deletion in code follows.
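The following Python sketch, under the same assumptions as the earlier sketches (distinct numeric keys, subtree sizes recomputed rather than stored), is one possible rendering of Join and of deletion.

import random

class Node:
    # Minimal BST node, as in the earlier sketches.
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def size(t):
    return 0 if t is None else 1 + size(t.left) + size(t.right)

def join(l, r):
    # Join two trees in which every key of l is smaller than every key of r.
    # The join of an empty tree with a tree t is t itself.
    if l is None:
        return r
    if r is None:
        return l
    m, n = size(l), size(r)
    if random.randrange(m + n) < m:
        # With probability m/(m+n): keep l's root; its left subtree is
        # untouched and its right subtree becomes Join(Lr, R).
        l.right = join(l.right, r)
        return l
    else:
        # With probability n/(m+n): keep r's root; its right subtree is
        # untouched and its left subtree becomes Join(L, Rl).
        r.left = join(l, r.left)
        return r

def delete(x, t):
    # Locate x and replace the subtree rooted at x by the join of its subtrees.
    if t is None:
        return None          # x is not present
    if x < t.key:
        t.left = delete(x, t.left)
    elif x > t.key:
        t.right = delete(x, t.right)
    else:
        t = join(t.left, t.right)
    return t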
It remains to show that the Join of two RBSTs produces an RBST and that deletion preserves randomness in an RBST.
THEOREM 13.17 The algorithm Join(L, R) produces an RBST under the conditions stated in the algorithm.
Proof We show that any key has probability 1/(n + m) of becoming the root of the tree output by Join(L, R). Let x be a key in L. Note that x will be at the root of Join(L, R) iff
• x was at the root of L before the Join operation, and,
• The root of L is selected to serve as the root of the output tree during the Join operation.
The probability of the first event is 1/m and the second event occurs with probability m/(n + m). As these events are independent, it follows that the probability that x is at the root of the output tree is (1/m)(m/(n + m)) = 1/(n + m). A similar reasoning holds for the keys in R.
Finally,
THEOREM 13.18 If T is an RBST for a set K of keys, then Delete(x, T) outputs an RBST for K − {x}.
Proof We sketch only the chance counting argument. The independence of the subtrees follows from the properties of the Join operation and induction. Let T′ = Delete(x, T). Assume that x ∈ K and the size of K is n. We have to prove that for any y ∈ K, y ≠ x, the probability that y is the root of T′ is 1/(n − 1). Now, y will be at the root of T′ iff either x was at the root of T and its deletion from T brought y to the root of T′, or y was at the root of T (so that the deletion of x from T did not dislocate y). The former happens with probability (1/n)(1/(n − 1)) and the probability of the latter is 1/n. As these events are mutually exclusive, we add the probabilities: 1/(n(n − 1)) + 1/n = (1 + (n − 1))/(n(n − 1)) = 1/(n − 1), the desired result.