Searching and Priority Queues in o(log n) Time: Deterministic Algorithms and Linear Space

Deterministic Algorithms and Linear Space

The data structures in this section are more complicated than the previous ones. They also need more powerful (but standard) instructions, like multiplication. On the other hand, these structures achieve linear space without randomization (i.e. without hashing).

DEFINITION 39.3 Let D(n) be the worst-case search cost and the amortized update cost in an ordered dictionary storing n keys in O(n) space.

Fusion Trees

The fusion tree was the first data structure to surpass the logarithmic barrier for searching. The central part of the fusion tree [13] is a static data structure with the following properties:

LEMMA 39.3 For any d = O(w^{1/6}), a static data structure containing d keys can be constructed in O(d^4) time and space, such that it supports neighbour queries in O(1) worst-case time.

Proof (Sketch) The main idea behind the fusion tree is to view the keys as stored in an implicit binary trie and to concentrate on the levels where branching occurs. We say that branching occurs at significant bit positions. We illustrate this view with an example, shown in Figure 39.2.

[Figure 39.2: The keys y1, . . . , yd stored as paths in a binary trie; branching occurs at the significant bit positions, and the bits at these positions form the compressed keys y′1, . . . , y′d.]

This extraction of bits is nontrivial; it can be done with multiplication and masking. However, the extraction is not as perfect as described here; in order to avoid problems with carry bits etc, we need to extract some more bits than just the significant ones. Here, we ignore these problems and assume that we can extract the desired bits properly. For details we refer to Fredman and Willard [13].

The d compressed keys may be used to determine the rank of a query key among the original d keys with packed computation. Assume that we search for x = 1010011001110100, represented as the fat path in Figure 39.2. First, we extract the proper bits to form a compressed key x′ = 0010. Then, we use packed searching to determine the rank of x′ among y′1, . . . , y′d. In this case, the packed searching will place x′ between y′2 and y′3, as indicated by the arrow in Figure 39.2. This is not the proper rank of the original key x, but nevertheless it is useful. The important information is obtained by finding the position of the first differing bit of x and one of the keys y2 and y3. In this example, the 7th bit is the first differing bit, and, since x has a 1 at this bit position, we can conclude that x is greater than all keys in Y with the same 6-bit prefix. Furthermore, the remaining bits in x are insignificant. Therefore, we can replace x by the key 1010011111111111, where all the last bits are 1s. When compressed, this new key becomes 0111. Making a second packed searching with this key instead, the proper rank will be found.

Hence, in constant time we can determine the rank of a query key among our d keys.
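As an illustration, the two-pass rank computation can be simulated in Python. This is hypothetical toy code: the real fusion tree extracts bits with multiplication and performs both searches in parallel on compressed keys packed into a single word, whereas here bits are extracted one at a time and the searches are ordinary sequential scans.

```python
W = 16  # word length in bits (example value)

def msb_diff(a, b):
    """0-based index, counted from the most significant bit, of the
    first bit where a and b differ."""
    return W - (a ^ b).bit_length()

def significant_positions(keys):
    """The trie's branching positions are exactly the positions of the
    first differing bit of consecutive keys in sorted order."""
    return sorted({msb_diff(a, b) for a, b in zip(keys, keys[1:])})

def compress(x, positions):
    """Keep only the bits of x at the significant positions (MSB first)."""
    c = 0
    for p in positions:
        c = (c << 1) | ((x >> (W - 1 - p)) & 1)
    return c

def rank(x, keys, positions):
    """Number of keys <= x, using two searches over compressed keys."""
    comp = [compress(y, positions) for y in keys]
    if x in keys:
        return keys.index(x) + 1
    xc = compress(x, positions)
    r = sum(c <= xc for c in comp)            # first (simulated) packed search
    # The key sharing the longest prefix with x is adjacent to this rank:
    y = max((keys[max(r - 1, 0)], keys[min(r, len(keys) - 1)]),
            key=lambda k: msb_diff(x, k))
    shift = W - 1 - msb_diff(x, y)
    if (x >> shift) & 1:                      # x > all keys with this prefix:
        x2c = compress(x | ((1 << shift) - 1), positions)   # pad tail with 1s
        return sum(c <= x2c for c in comp)    # second packed search
    else:                                     # x < all keys with this prefix:
        x2c = compress((x >> shift) << shift, positions)    # pad tail with 0s
        return sum(c < x2c for c in comp)
```

For example, with the (invented) keys 0x1000, 0xA600, 0xA680, 0xE000 the significant positions are {0, 1, 8}, and rank(0x0800, ...) correctly returns 0 even though the first compressed search alone would place the query after the smallest key.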

The original method by Fredman and Willard is slightly different. Instead of filling the query keys with 1s (or 0s) and making a second packed searching, they use a large lookup table in each node. Fusion trees can be implemented without multiplication, using only AC0 instructions, provided that some simple non-standard instructions are allowed [5].

THEOREM 39.3 D(n) = O(log n/ log log n).

Proof (Sketch) Based on Lemma 39.3, we use a B-tree where only the upper levels in the tree contain B-tree nodes, all having the same degree (within a constant factor). At the lower levels, traditional (i.e. comparison-based) weight-balanced trees are used. The reason for using weight-balanced trees is that the B-tree nodes are costly to reconstruct; the trees at the bottom ensure that few updates propagate to the upper levels. In this way, the amortized cost of updating a B-tree node is small.

The amortized cost of searches and updates is O(log n/log d + log d) for any d = O(w^{1/6}). The first term corresponds to the number of B-tree levels and the second term corresponds to the height of the weight-balanced trees. Since w ≥ log n (otherwise a pointer would not fit in a word), the cost becomes at most O(log n/log log n).
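To see how the two terms balance, the cost expression can be evaluated numerically. This is a hypothetical back-of-the-envelope sketch: constant factors are dropped, and `good_degree` is an illustrative choice of B-tree degree (capped both by the fusion-node limit and so that the bottom-tree height never dominates), not the authors' exact choice.

```python
import math

def fusion_cost(n, d):
    """log n / log d levels of fusion-tree nodes, plus a bottom
    weight-balanced tree of height about log d (constants dropped)."""
    return math.log2(n) / math.log2(d) + math.log2(d)

def good_degree(n, w):
    """Pick d = O(w^{1/6}) as Lemma 39.3 requires, but never so large
    that the log d term exceeds log n / log log n."""
    cap = n ** (1 / math.log2(math.log2(n)))   # log2(cap) = log n / log log n
    return max(2.0, min(w ** (1 / 6), cap))

# For w >= log n, the cost stays within a constant of log n / log log n:
for n_bits, w in [(64, 64), (64, 2 ** 20), (512, 512)]:
    n = 2.0 ** n_bits
    cost = fusion_cost(n, good_degree(n, w))
    bound = math.log2(n) / math.log2(math.log2(n))
    print(f"log n = {n_bits:4}, w = {w:8}: cost ~ {cost:6.1f}, "
          f"log n/log log n ~ {bound:6.1f}")
```

With w only as large as log n the degree bottoms out at 2 and the improvement is purely asymptotic; with larger w the fusion nodes get wider and the first term shrinks accordingly.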

Exponential Search Trees

The exponential search tree [3, 7] allows efficient dynamization of static dictionary structures. The key feature is:

Any static data structure for searching that can be constructed in polynomial time and space can be efficiently used in a dynamic data structure.

The basic data structure is a multiway tree where the degrees of the nodes decrease doubly-exponentially down the tree. In each node, we use a static data structure for navigation. The way the tree is maintained, we can guarantee that, before an update occurs at a certain node, a polynomial number of updates will be made below it. Hence, even if an update requires a costly reconstruction of a static data structure, such reconstructions occur at large enough intervals.
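The doubly-exponential decrease in degree means the recursion bottoms out after O(log log n) levels: each level replaces n by roughly n^{4/5}, so log n shrinks by a constant factor per level. A minimal sketch (hypothetical code; subtree sizes are tracked as floats and constants ignored):

```python
def est_depth(n):
    """Number of levels in an exponential search tree of n keys: each
    subtree has size about n^(4/5), so log n shrinks by a factor 4/5
    per level and reaches O(1) after O(log log n) levels."""
    depth = 0
    while n > 2:
        n **= 4 / 5
        depth += 1
    return depth

# Squaring n (doubling log n) adds only a constant number of levels:
print(est_depth(2.0 ** 64), est_depth(2.0 ** 128), est_depth(2.0 ** 256))
```

This O(log log n) depth is what later turns a per-node search cost of O(log w) into the O(log w · log log n) bound quoted below Theorem 39.4.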

LEMMA 39.4 Suppose a static data structure containing d keys can be constructed in O(d^4) time and space, such that it supports neighbour queries in O(S(d)) worst-case time. Then,

D(n) = O(S(n^{1/5})) + D(O(n^{4/5})).

Proof (Sketch) We use an exponential search tree. It has the following properties:

• Its root has degree Θ(n^{1/5}).

• The keys of the root are stored in a local (static) data structure, with the properties stated above. During a search, the local data structure is used to determine in which subtree the search is to be continued.

• The subtrees are exponential search trees of size Θ(n^{4/5}).

First, we show that, given n sorted keys, an exponential search tree can be constructed in linear time and space. The cost of constructing a node of degree d is O(d^4), and the total construction cost C(n) is (essentially) given by

C(n) = O(n^{4/5}) + Θ(n^{1/5}) · C(O(n^{4/5})) = O(n).    (39.3)

Furthermore, with a similar equation, the space required by the data structure can be shown to be O(n).
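The linearity of this recurrence can be sanity-checked numerically. In this hypothetical sketch all constants are set to 1: the root of degree n^{1/5} costs n^{4/5} to build, and the n^{1/5} subtrees of size n^{4/5} are built recursively.

```python
def C(n, c=1.0):
    """Construction cost recurrence: building the root costs
    c * (n^(1/5))^4 = c * n^(4/5), plus n^(1/5) recursive subtrees
    of size n^(4/5) each."""
    if n < 2:
        return c
    return c * n ** 0.8 + n ** 0.2 * C(n ** 0.8, c)

# C(n) / n stays bounded as n grows, i.e. construction is linear:
for n in [1e3, 1e6, 1e9, 1e12]:
    print(f"n = {n:.0e}: C(n)/n = {C(n) / n:.2f}")
```

The per-key cost settles around a small constant rather than growing with n, matching the O(n) claim.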

Balance is maintained by joining and splitting subtrees. The basic idea is the following: A join or split occurs when the size of a subtree has changed significantly, i.e. after Ω(n^{4/5}) updates. Then, a constant number of subtrees will be reconstructed; according to Equation 39.3, the cost of this is linear in the size of the subtrees, i.e. O(n^{4/5}). Also, some keys will be inserted into or deleted from the root, causing a reconstruction of the root; the cost of this is by definition O(n^{4/5}). Amortizing these two costs over the Ω(n^{4/5}) updates, we get O(1) amortized cost for reconstructing the root. Hence, the restructuring cost is dominated by the search cost.

Finally, the search cost follows immediately from the description of the exponential search tree.

Exponential search trees may be combined with various other data structures, as illustrated by the following two lemmas:

LEMMA 39.5 A static data structure containing d keys can be constructed in O(d^4) time and space, such that it supports neighbour queries in O(log d/log w + 1) worst-case time.

Proof (Sketch) We just construct a static B-tree where each node has the largest possible degree according to Lemma 39.3, that is, degree min(d, w^{1/6}). This tree satisfies the conditions of the lemma.

LEMMA 39.6 A static data structure containing d keys and supporting neighbour queries in O(log w) worst-case time can be constructed in O(d^4) time and space.

Proof (Sketch) We study two cases.

Case 1: w > d^{1/3}. Then log d/log w < 3, so Lemma 39.5 gives constant query cost.

Case 2: w ≤ d^{1/3}. The basic idea is to combine a van Emde Boas tree (Theorem 39.1) with perfect hashing. The data structure of Theorem 39.1 uses much space, which can be reduced to O(d) by hash coding. Since we can afford a rather slow construction, we can use the deterministic algorithm by Fredman, Komlós, and Szemerédi [12]. With this algorithm, we can construct a perfectly hashed van Emde Boas tree in O(d^3 w) = o(d^4) time.
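The following toy structure illustrates the idea. This is hypothetical code: Python's `dict` stands in for the deterministic FKS perfect hash tables, and only the predecessor operation is shown. The number of bits halves at each level, so a query costs O(log w), and only non-empty clusters are stored, keeping the space near-linear in the number of keys.

```python
class VEB:
    """Static van Emde Boas predecessor structure. A dict of non-empty
    clusters stands in for a perfect hash table, so only clusters that
    actually contain keys consume space."""

    def __init__(self, keys, w):
        self.w = w
        self.min, self.max = min(keys), max(keys)
        self.clusters = None
        if w == 1 or len(keys) == 1:
            return                       # tiny node: min and max suffice
        half = w // 2
        groups = {}                      # high half -> list of low halves
        for k in keys:
            groups.setdefault(k >> half, []).append(k & ((1 << half) - 1))
        self.clusters = {hi: VEB(lo, half) for hi, lo in groups.items()}
        self.summary = VEB(list(groups), w - half)

    def pred(self, x):
        """Largest stored key <= x, or None. One recursive call per
        level, and the word length halves per level: O(log w) time."""
        if x < self.min:
            return None
        if x >= self.max:
            return self.max
        if self.clusters is None:        # w == 1 with both 0 and 1 stored
            return self.min
        half = self.w // 2
        hi, lo = x >> half, x & ((1 << half) - 1)
        c = self.clusters.get(hi)
        if c is not None and lo >= c.min:
            return (hi << half) | c.pred(lo)
        # No answer in x's own cluster: take the maximum of the nearest
        # non-empty cluster to the left, located via the summary.
        prev = self.summary.pred(hi - 1)
        return (prev << half) | self.clusters[prev].max
```

For instance, with the (invented) key set {2, 3, 7, 100, 255, 1000} and w = 10, pred(512) returns 255 and pred(1) returns None.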

Combining these two lemmas, we get a significantly improved upper bound on deterministic sorting and searching in linear space:

THEOREM 39.4 D(n) = O(√(log n)).

Taking both n and w as parameters, D(n) is o(log n) in many cases [3]. For example, it can be shown that D(n) = O(log w · log log n).

The strongest possible bound is achieved by using the following result by Beame and Fich [9]:

LEMMA 39.7 [Beame and Fich [9]] In polynomial time and space, we can construct a deterministic data structure over d keys supporting searches in O(√(log d/log log d)) time.

Combining this with the exponential search tree we get, among others, the following theorem.

THEOREM 39.5 D(n) = O(√(log n/log log n)).

Since a matching lower bound was also given by Beame and Fich, this bound is optimal.
