Balanced Binary Search Trees: Introduction and Basic Definitions

Introduction

Balanced binary search trees are among the most important data structures in Computer Science. This is because they are eﬃcient, versatile, and extensible in many ways. They are used as a black-box in numerous algorithms and even other data structures.

The main virtue of balanced binary search trees is their ability to maintain a dynamic set in sorted order, while supporting a large range of operations in time logarithmic in the size of the set. The operations include search, insertion, deletion, predecessor/successor search, range search, rank search, batch update, split, meld, and merge. These operations are described in more detail in Section 10.2 below.

Data structures supporting the operations search, insertion, deletion, and predecessor (and/or successor) search are often denoted ordered dictionaries. In the comparison based model, the logarithmic performance of balanced binary search trees is optimal for ordered dictionaries, whereas in the RAM model, faster operations are possible [13, 18]. If one considers unordered dictionaries, i.e., only the operations search, insertion, and deletion, expected constant time is possible by hashing.

Basic Deﬁnitions

Trees

There are many ways to deﬁne trees. In this section, we deﬁne a tree as a hierarchical organization of a collection of nodes. For alternatives to our exposition, see the chapter on trees.

A tree can be empty. If it is not empty, it consists of one node, which is referred to as the root of the tree, and a collection of trees, referred to as subtrees. Thus, a tree consists of many smaller trees, each with their own root. We use r to denote the single node which is the root of the entire tree.

We only consider ﬁnite trees, i.e., every collection of subtrees is ﬁnite, and there are no inﬁnite chains of nonempty subtrees. Furthermore, we only consider ordered trees, meaning that the collection of subtrees of a node is an ordered sequence rather than just a set. If every nonempty tree has exactly two subtrees, then the tree is called binary. In this case, we refer to the two subtrees as the left and right subtrees.

We use u, v, w, etc. to denote nodes and T to denote trees, applying apostrophes, index, etc. to increase the name space. For a node u, we use u.l and u.r to denote the left and right subtree, respectively, of the tree rooted by u. However, when no confusion can occur, we do not necessarily distinguish between nodes and subtrees. Thus, by the subtree v, we mean the subtree rooted at the node v and by T we mean the entire tree or the root of the tree.

We use the standard genealogical terminology to denote nodes in the vicinity of a desig- nated node. Thus, if u is the root of a tree and v is the root of a subtree of u, then v is referred to as a child of u. By analogy, this deﬁnes grandchildren, parent, grandparent, and sibling.

The set of nodes belonging to a nonempty tree is its root, along with all the nodes belonging to its subtrees. For an empty tree, this set is of course empty. If a node v belongs to the subtree of u, then v is a descendant of u, and u is an ancestor of v. An ancestor or descendant v of a node u is proper if u j= v.

Quite often, it is convenient to refer to empty subtrees as real nodes, in which case they are referred to as external nodes (or leaves). The remaining nodes are then referred to as internal nodes. It is easy to prove by induction that the number of external nodes is always one larger than the number of internal nodes.

The number of nodes belonging to a tree is referred to as its size (or its weight). In some applications, we deﬁne the size of the tree to be the number of internal nodes in the tree, but more often it is convenient to deﬁne the size of the tree to be the number of external nodes. We use n to denote the size of the tree rooted by r, and |u| to denote the size of the subtree rooted by u.

A path in a tree is a sequence of nodes u1, u2,... , uk , k ≥ 1, such that for i ∈ {1,... ,k−1}, ui+1 is a child of ui. Note that the length of such a path is k − 1. The depth of a node u in the tree T is the length of the path from the root of T to u, and the height of a tree T is the maximal depth of any external node.

Binary Trees as Dictionaries

When trees are used to implement the abstract data type dictionary, nodes have associated values. A dictionary basically organizes a set of keys, which must be elements drawn from a total ordering, and must usually supply at least the operations search, insertion, and deletion. There may be additional information associated with each key, but this does not lead to any conceptual complications, so here we simply focus on the keys.

When a tree is used as a dictionary, each node stores one key, and we impose the following ordering invariant (the in-order invariant): for each node u in the tree, every key in u.l is strictly smaller than u.k, and every key in u.r is strictly larger than u.k. A tree organized according to this invariant is referred to as a binary search tree.

An important implication of this ordering invariant is that a sorted list of all the keys in the tree can be produced in linear time using an in-order traversal deﬁned recursively as follows. On an empty tree, do nothing. Otherwise, recurs on the left subtree, report the root key, and then recurs on the right subtree.

Many diﬀerent operations can be supported by binary search tree implementations. Here, we discuss the most common. Using the ordering invariant, we can devise a searching procedure of asymptotic time complexity proportional to the height of the tree. Since searching turns out to be at the heart of most of the operations of interest to us, unless we stipulate otherwise, all the operations in the following inherit the same complexity.

Simple Searching

To search for x in a tree rooted by u, we ﬁrst compare x to u.k. If they are equal, a positive response is given. Otherwise, if x is smaller than u.k, we search recursively in u.l, and if x is larger, we search in u.r. If we arrive at an empty tree, a negative response is given. In this description, we have used ternary comparisons, in that our decisions regarding how to proceed depend on whether the search key is less than, equal to, or greater than the root key. For implementation purposes, it is possible to use the more eﬃcient binary comparisons [12].

A characteristic feature of search trees is that when a searching fails, a nearest neigh- bor can be provided eﬃciently. Dictionaries supporting predecessor/successor queries are referred to as ordered. This is in contrast to hashing (described in a chapter of their own) which represents a class of unordered dictionaries. A predecessor search for x must return the largest key less than or equal to x. This operation as well as the similar successor search are simple generalizations of the search strategy outlined above. The case where x is found on the way is simple, so assume that x is not in the tree. Then the crucial observation is that if the last node encountered during the search is smaller than x, then this node is the predecessor. Otherwise, the predecessor key is the largest key in the left subtree of the last node on the search path containing a key smaller than x. A successor search is similar.

Simple Updates

An insertion takes a tree T and a key x not belonging to T as arguments and adds a node containing x and two empty subtrees to T . The node replaces the empty subtree in T where the search for x terminates.

A deletion takes a tree T and a key x belonging to T as arguments and removes the node u containing x from the tree. If u’s children are empty trees, u is simply replaced by an empty tree. If u has exactly one child which is an internal node, then this child is replacing

u. Finally, if u has two internal nodes as children, u’s predecessor node v is used. First, the key in u is overwritten by the key of v, after which v is deleted. Note that because of the choice of v, the ordering invariant is not violated. Note also that v has at most one child which is an internal node, so one of the simpler replacing strategies described above can be used to remove v.

More Searching Procedures

A range search takes a tree T and two key values k1 ≤ k2 as arguments and returns all keys x for which k1 ≤ x ≤ k2. A range search can be viewed as an in-order traversal, where we do not recurs down the left subtree and do not report the root key if k1 should be in the right subtree; similarly, we do not recurs down the right subtree and do not report the root key if k2 should be in the left subtree. The complexity is proportional to the height of the tree plus the size of the output.

A useful technique for providing more complex operations eﬃciently is to equip the nodes in the tree with additional information which can be exploited in more advanced searching, and which can also be maintained eﬃciently. A rank search takes a tree T and an integer d between one and n as arguments, and returns the dth smallest key in T . In order to provide this functionality eﬃciently, we store in each node the size of the subtree in which it is the root. Using this information during a search down the tree, we can at each node determine in which subtree the node must be located and we can appropriately adjust the rank that we search for recursively. If the only modiﬁcations made to the tree are small local changes, this extra information can be kept up-to-date eﬃciently, since it can always be recomputed from the information in the children.

Operations Involving More Trees

The operation split takes a key value x and tree T as arguments and returns two trees; one containing all keys from T less than or equal to x and one with the remaining keys. The operations is destructive, meaning that the argument tree T will not be available after the operation. The operation meld takes two trees as arguments, where all keys in one tree are smaller than all keys in the other, and combines the trees into one containing all the keys. This operation is also destructive. Finally, merge combines the keys from two argument trees, with no restrictions on keys, into one. Also this operation is destructive.

Implementation of Binary Search Trees

In our discussion of time and space complexities, we assume that some standard implementation of trees are used. Thus, in analogy with the recursive deﬁnition, we assume that a tree is represented by information associated with its root, primarily the key, along with pointers (references) to its left and right subtrees, and that this information can be accessed in constant time.

In some situations, we may assume that additional pointers are present, such as parent-pointers, giving a reference from a node to its parent. We also sometimes use level-pointers.

A level consists of all nodes of the same depth, and a level-pointer to the right from a node with key k points to the node at the same level with the smallest key larger than k. Similar for level-pointers to the left.

Search This Blog

algorithms