Searching and Priority Queues in o(log n) Time:Sorting and Priority Queues
Sorting and Priority Queues
In the comparison-based model of computation, the cost per element is the same (O(log n)) for searching and sorting. However, in the RAM model of computation, the sorting problem can be solved faster than searching. The simple intuitive explanation of this is that the bit- parallelism in packed computation can be utilized more efficiently when a number of keys are treated simultaneously, as in sorting, than when they are treated one-by-one as in searching.
Even more, it turns out that priority queues can be as implemented as efficiently as sorting. (The intuitive reason for this is that a priority queue can use sorting as a subroutine, and only keep a small part of the queue perfectly sorted.) Thorup [24] has shown the following reduction:
THEOREM 39.6 If we can sort n keys in time S(n) per key, then we can implement a priority queue supporting find-min in constant time and updates (insert and delete) in S(n) time.
In the following, we will use T (n, b) to denote the cost of sorting n b-bit keys.
In sorting algorithms, range reduction is an often used technique. For example, we may view traditional radix sort, where we sort long strings by dividing them into shorter parts, as range reduction.
For our purposes, we will use a range reduction technique by Kirkpatrick and Reisch [19], which is similar to the van Emde Boas tree, cf.
The only difference from Section 39.4.1 is that instead of letting each trie node contain a data structure for efficient neighbour search among the outgoing edges, we just keep an unsorted list of all outgoing edges (plus the array for constant-time indexing of edges). Then, after all elements have been inserted into the trie, we create sorted lists of edges at all nodes by the following method:
1. Mark each edge with its parent node.
2. Concatenate all edge lists and sort the entire list.
3. Scan the sorted list and put each edge back in the proper node.
4. All edges lists are now sorted. By a recursive traversal of the trie we can report all leafs in sorted order.
Other details, such as the need to store one key per node to avoid space blow up, are handled in the same way as in Section 39.4.1. Altogether, we get the reduction
Packed Sorting
For the sorting problem, multiple comparisons can be utilized more efficiently than in a packed B-tree. In the packed B-tree, we used the bit-parallelism to compare one key to many keys, in this way we implemented a parallel version of a linear search, which is not the most efficient search method.
For sorting, however, we can utilize the packed computation more efficiently. It turns out that algorithms for sorting networks are well suited for implementation by packed computation. A sorting network is a “static” algorithm; it contains a number of compare- and-swap operations, these are the same regardless of the outcome of the comparisons. The merging technique by Batcher [8], originally used to design odd-even merge sort, can be efficiently implemented with packed computation. As a sorting network, Batcher’s merging algorithm has depth Θ(log n) where each level has O(n) compare-and-swap units. Based on the merging, we can sort in Θ(log2 n) time where the total work is Θ(n log2 n) Batcher’s merging technique is well suited for combination with the Paul-Simon technique, as shown by Albers and Hagerup [1].
LEMMA 39.8 T (n, w/ log n) ≤ O(n log log n).
Proof (Sketch) The key idea is that by packing Θ(log n) keys in a machine word, we can combine Batcher’s algorithm with packed computation to merge two sorted sequences in O(log log n) time. And, if we can merge two sequences of length Θ(log n) in O(log log n) time (instead of O(log n) by a comparison-based algorithm), we can use this as a subroutine tom implement a variant of merge sort that sorts n keys in O(n log log n) time (instead of O(n log n)).
First, the bound on searching from Theorem 39.1 has a corresponding theorem for sorting [19]:
THEOREM 39.7 T (n, w) = O(n log(w/ log n)).
Proof (Sketch) Apply Eq. 39.6 with k = log n. Keys of length log n can be sorted in linear time with bucket sort.
Secondly, we combine range reduction with packed computation. We get the following bound [4]:
THEOREM 39.8 T (n, w) = O(n log log n).
Proof (Sketch) If log w = O(log log n), Theorem 39.7 is sufficient. Otherwise, Eq. 39.6 with k = w/ log n gives T (n, w) = O(n log log n) + T (n, w/ log n). Lemma 39.8 gives the final bound.
Further Techniques and Faster Randomized Algorithms
Apart from these rather simple techniques, there are a number of more elaborate techniques that allows the complexity to be improved further. Examples of such techniques are signature sort [4] and packed bucketing [17]. Here, we give a short sketch of signature sorting.
Consider a situation where the word length w is very large, and we wish to reduce the problem of sorting w-bit keys to that of sorting k-bit keys, k » log n. Instead of treating these k-bit keys directly, we represent each such key by a b-bit signature, where the b bits are a hash function of the k bits. In fact, for one w-bit key, we can in constant time replace it by a shorter key, consisting of q b-bit signatures (for details, we refer to the literature [4, 17]).
1. Replace each w-bit key by a qb-bit key of concatenated signatures.
2. Sort the qb-bit keys.
3. Compute, for each qb-bit key, its first distinguishing signature. This can be done by constructing a signature-based trie of all keys.
4. If we know the first distinguishing signature in a qb-bit key, we know the first distinguishing k-bit field in the corresponding w-bit key. Finding these k-bit
fields, the range reduction is completed and we can continue by sorting these shorter keys.
It should be noted that the sorted set of qb-bit keys does not correspond to a sorted set of w-bit keys. However, the ordering we get is enough to find the proper distinguishing fields. Furthermore, since we use hash coding, we might get collisions, in which case the method will not work. By choosing b large enough, the risk of failure is small enough that we can afford to redo the entire sorting in case of failure: still the expected time for the range reduction step will be linear.
As an important recent result, Han and Thorup presents a linear algorithm for split- ting n integers into subsets, where each subset is of size O(log n). Combining this with techniques like signature sorting, they manage to improve the randomized complexity of sorting to O(n√log log n). This, in turn, implies that a priority queue can be implemented at O(√log log n) time per update and find-min in constant time.
Other relevant reading can be found in the cited articles, or in [6, 10, 14–16, 22, 23]
Comments
Post a Comment