Online Dictionary Structures:Balanced BST Implementation

Balanced BST Implementation

Let us now discuss the data structure and algorithms for the multidimensional dictionaries given in [5]. The representation is based on balanced binary search trees, without external nodes, i.e., each node represents one d-tuple. Balanced binary search trees and their algorithms ([15]) are like the typical “bottom-up” algorithms for the Red-Black trees. The ordering of the d-tuples is lexicographic. Each node t in the tree rooted at r has the following information in addition to the color bit required to manipulate balanced binary search trees [15]. Note that if two d-tuples are identical the index of the first component where they differ is set to d + 1. We use this convention through out this chapter.

image

The insert, delete and membership procedures perform the operations required to manipulate balanced binary search trees, and some new operations to update the structure. The basic operations to manipulate balanced binary search trees are well-known [13, 15]; so we only discuss in detail the new operations.

To show that membership, insert, and delete can be implemented to take O(d + log n) time, we only need to show that the following (new) operations can be performed O(d+log n) time.

A. Given the d-tuple q determine whether or not it is stored in the tree.

B. Update the structure after adding a node (just before rotation(s), if any).

C. Update the structure after performing a rotation.

D. Update the structure after deleting a leaf node (just before rotation(s), if any).

E. Transform the deletion problem to deletion of a leaf node.

The membership operation that tests whether or not the d-tuple q given by (q.x(1), q.x(2),..., q.x(d)) is in the multidimensional binary search tree (or subtree) rooted at r appears in [5] and implements (A). The basic steps are as follows. Let t be any node encountered in the search process in the multidimensional balanced binary search tree rooted at r. Let prev(t) to be the d-tuple in r with largest value but whose value is smaller than all the d-tuples stored in the subtree rooted at t, unless no such tuple exists in which case its value is (−∞, −∞,... , −∞), and let next(t) be the d-tuple in r with smallest value but whose value is larger than all the d-tuples stored in the subtree rooted by t, unless no such tuple exists in which case its value is (+∞, +∞,... , +). The following invariant is maintained throughout the search process. During the search we will be visiting node t which is initially the root of the tree. The value of dlow is the index of the first component where q and prev(t) differ, and variable dhigh is the index of the first component where q and next(t) differ. The d-tuple being search for, q, has value (lexicographically) greater than prev(t) and (lexicographically) smaller than next(t) The computation of j, the index of the first component where t.s and q differ is like the one in [11]. When dlow dhigh , then either (1) dlow j= t.lj in which case j is just min {dlow , t.lj}, or (2) dlow = t.lj in which case j is set to the index of the first component starting at position dlow where q and t.s differ. The case when dlow < dhigh is similar and appears in [5]. Gonzalez [5] proved that by setting j by the above procedure will set it to the index of the first component where t.s and q differ. When j is equal to d + 1, then q is the d-tuple stored in t and we return the value of true. Otherwise by comparing the jth element of q and t the procedure decides whether to search in the left or right subtrees of t. In either case dhigh or dlow is set appropriately and the invariant holds at the next iteration. The time complexity at each level is not bounded by a constant; however, it is bounded by 1 plus the difference between the new and old value of max{dlow , dhigh }. Since max{dlow , dhigh } does not decrease and it is at most d + 1 at the end of each operation, it follows that the total number of operations performed is of order d plus the height of the tree (O(log n)). Correctness and efficiency are established in the following lemma whose proof appears in [5].

LEMMA 24.1 [5] Given a d-tuple q procedure membership(q, r) given in [5] determines whether or not q is in the multidimensional balanced binary search tree rooted at r in O(d + log n) time.

For operation (B), a node is added as a leaf node, the information that needs to be set for that node are the pointers lptr and hptr which can be copied from the parent node, unless there is no parent in which case they are set to null. The values lj, and hj can be computed directly in O(d) time. A predecessor of the node added also needs its lptr and lj, or hptr and hj values changed. This can be easily done in O(d), by remembering the the node in question during the search process. i.e., the last node where one makes a left turn (moving to the leftchild) or the last one where one makes a right turn (moving to the rightchild).

LEMMA 24.2 [5] After inserting a node q in a multidimensional balanced binary search tree and just before rotation the structure can be updated as mentioned above in O(d+log n) time.

The implementation of operation (D) is similar to the one for (B), therefore it will not be discussed further. The rotation operation (C) can be reduced to the simple rotation, because double and triple rotations can reduced to two and three simple rotations, respectively. A simple rotation is shown in Figure 24.2. It is simpler to implement the rotation operation by moving the nodes rather than just their values, because this reduces the number of updates that need to be performed. Clearly, the only nodes whose information needs to be updated are b, d and the parent of d. This just takes O(d) time.

image

LEMMA 24.3 [5] After a rotation in a multidimensional balanced binary search tree the structure can be updated as mentioned above in O(d) time.

Operation (E) is more complex to implement. It is well known [8] that the problem of deleting an arbitrary node from a balanced binary search tree can be reduced to deleting a leaf node by applying a simple transformation. Since all cases are similar, lets just discuss the case shown in Figure 24.3.

The node to delete is a which is not a leaf node.

In this case node b contents are moved to node a, and now we delete the old node b which is labeled x. As pointed out in [5], updating the resulting structure takes O(d + log n) time.

Since the node labeled x will be deleted, we do not need to update it. For the new root (the one labeled b) we need to update the lj and hj values. Since we can use directly the old lptr and hptr pointers, the update can be done in O(d) time. The lj (hj) value of the nodes (if any) in the path that starts at the right (left) child of the new root that continues through the leftchild (rightchild) pointers until the null pointer is reached needs to be updated. There are at most O(log n) of such nodes. However, the d-tuples stored at each of these nodes are decreasing (increasing) in lexicographic order when traversing the path top down. Therefore, the lj (hj) values appear in increasing (decreasing) order. The correct values can be easily computed in O(d + log n) time by reusing previously computed lj (hj) values while traversing the path top down. The following lemma, whose proof is omitted, summarizes these observations. Then we state the main result in [5] which is based on the above discussions and the lemmas.

LEMMA 24.4 [5] Transforming the deletion problem to deleting a leaf node can be performed, as mentioned above, in O(d + log n) time.

THEOREM 24.1 [5] Any on-line sequence of operations of the form insert, delete and membership, for any d-tuple can be carried out by the procedure in [5] on a multidimensional balanced binary search tree in O(d + log n) time, where n is the current number of points, and each insert and delete operation requires no more than a constant number of rotations.

image

Comments

Popular posts from this blog

Binary Space Partitioning Trees:BSP Tree as a Hierarchy of Regions.

Data Structure Visualization:Introduction and Value of Data Structure Rendering

0/1 Knapsack Problem Memory function.