Cache-Oblivious Data Structures: Priority Queues
Priority Queues
A priority queue maintains a set of elements, each with a priority (or key), under the operations Insert and DeleteMin, where an Insert operation inserts a new element in the queue, and a DeleteMin operation finds and deletes the element with the minimum key in the queue. Frequently we also consider a Delete operation, which deletes an element with a given key from the priority queue. This operation can easily be supported using Insert and DeleteMin: to perform a Delete we insert a special delete-element in the queue with the relevant key; if a DeleteMin returns an element whose key matches a pending delete-element, the element has already been deleted, and we perform another DeleteMin.
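To make the reduction concrete, here is a minimal sketch in Python (the class and method names are ours, not the chapter's), using the standard library's heapq as the underlying Insert/DeleteMin structure:

```python
import heapq

class LazyDeletePQ:
    """Priority queue supporting Delete via the delete-element trick."""

    def __init__(self):
        self._heap = []     # underlying Insert/DeleteMin structure
        self._deleted = []  # heap of keys marked for deletion

    def insert(self, key):
        heapq.heappush(self._heap, key)

    def delete(self, key):
        # Insert a special "delete-element" with the relevant key.
        heapq.heappush(self._deleted, key)

    def delete_min(self):
        # A minimum that matches a pending delete-element has really been
        # deleted, so discard both and continue with the next minimum.
        while self._deleted and self._heap and self._deleted[0] == self._heap[0]:
            heapq.heappop(self._deleted)
            heapq.heappop(self._heap)
        return heapq.heappop(self._heap)

pq = LazyDeletePQ()
for k in (5, 3, 8):
    pq.insert(k)
pq.delete(3)
print(pq.delete_min())  # 5: the delete-element cancels the 3
```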
A balanced search tree can be used to implement a priority queue. Thus the existence of a dynamic cache-oblivious B-tree immediately implies the existence of a cache-oblivious priority queue where all operations can be performed in $O(\log_B N)$ memory transfers, where $N$ is the total number of elements inserted. However, it turns out that one can design a priority queue where all operations can be performed in $\Theta(\mathrm{Sort}_{M,B}(N)/N) = O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ memory transfers; for most realistic values of $N$, $M$, and $B$, this bound is less than 1 and we can, therefore, only obtain it in an amortized sense. In this section we describe two different structures that obtain these bounds [5, 16].
Merge Based Priority Queue: Funnel Heap
The cache-oblivious priority queue Funnel Heap due to Brodal and Fagerberg [16] is inspired by the sorting algorithm Funnelsort [15, 20]. The structure only uses binary merging; essentially it is a heap-ordered binary tree with mergers in the nodes and buffers on the edges.
Structure
The main part of the Funnel Heap structure is a sequence of k-mergers (Section 34.2.2) with double-exponentially increasing k, linked together in a list using binary mergers; refer to Figure 34.10. This part of the structure constitutes a single binary merge tree. Additionally, there is a single insertion buffer I.
More precisely, let $k_i$ and $s_i$ be values defined inductively by $(k_1, s_1) = (2, 8)$, $s_{i+1} = s_i(k_i + 1)$, and $k_{i+1} = \lceil\lceil s_{i+1}^{1/3} \rceil\rceil$, where $\lceil\lceil x \rceil\rceil$ denotes the smallest power of two greater than or equal to $x$. Link $i$ in the list consists of a binary merger $v_i$, two buffers $A_i$ and $B_i$, and a $k_i$-merger $K_i$ with $k_i$ input buffers $S_{i1}, \ldots, S_{ik_i}$. We refer to $B_i$, $K_i$, and $S_{i1}, \ldots, S_{ik_i}$ as the lower part of the link. The size of both $A_i$ and $B_i$ is $k_i^3$, and the size of each $S_{ij}$ is $s_i$. Link $i$ has an associated counter $c_i$ for which $1 \le c_i \le k_i + 1$. The initial value of $c_i$ is one for all $i$. The structure also has one insertion buffer $I$ of size $s_1$. We maintain the following invariants:
Invariant 1 For link $i$, $S_{ic_i}, \ldots, S_{ik_i}$ are empty.
Invariant 2 On any path in the merge tree from some buffer to the root buffer A1, elements appear in decreasing order.
Invariant 3 Elements in buffer I appear in sorted order.
Invariant 2 can be rephrased as the entire merge tree being in heap order. It implies that in all buffers in the merge tree, the elements appear in sorted order, and that the minimum element in the queue will be in A1 or I, if buffer A1 is non-empty. Note that an invocation (Figure 34.7) of any binary merger in the tree maintains the invariants.
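Before turning to the layout, the double-exponential growth of $s_i$ and $k_i$ is easy to verify numerically. A small sketch, assuming the recurrence given above (hyperceil is our own name for the power-of-two rounding):

```python
def hyperceil(x):
    """Smallest power of two greater than or equal to x."""
    p = 1
    while p < x:
        p *= 2
    return p

def link_parameters(n):
    """Yield (k_i, s_i) for links i = 1..n, following (k_1, s_1) = (2, 8),
    s_{i+1} = s_i (k_i + 1), k_{i+1} = hyperceil(s_{i+1} ** (1/3))."""
    k, s = 2, 8
    for _ in range(n):
        yield k, s
        s = s * (k + 1)
        k = hyperceil(s ** (1 / 3))

for i, (k, s) in enumerate(link_parameters(7), start=1):
    print(f"link {i}: k_i = {k:>4}, s_i = {s}")
```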
Layout
The Funnel Heap is laid out in consecutive memory locations in the order $I$, link 1, link 2, ..., with link $i$ being laid out in the order $c_i$, $A_i$, $v_i$, $B_i$, $K_i$, $S_{i1}$, $S_{i2}$, ..., $S_{ik_i}$.
Operations
To perform a DeleteMin operation we compare the smallest element in I with the smallest element in A1 and remove the smallest of these; if A1 is empty we first perform an invocation of v1. The correctness of this procedure follows immediately from Invariant 2.
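An invocation (cf. Figure 34.7) repeatedly performs merge steps, recursively filling any empty input buffer first, until the output buffer is full or both inputs are exhausted. A minimal sketch of this recursion, with our own class names and deques standing in for the buffers:

```python
from collections import deque

class Node:
    """A node of the binary merge tree: leaves carry a sorted input
    stream; internal nodes have an output buffer filled by merging
    the output buffers of their two children."""
    def __init__(self, left=None, right=None, items=None, capacity=4):
        self.left, self.right = left, right
        self.leaf = items is not None
        self.buf = deque(items or [])  # sorted, smallest element in front
        self.capacity = capacity

def fill(v):
    """Invoke merger v: perform merge steps until its output buffer is
    full or both inputs are exhausted (cf. Figure 34.7)."""
    if v.leaf:
        return  # a leaf's buffer is never refilled; empty means exhausted
    while len(v.buf) < v.capacity:
        for child in (v.left, v.right):
            if not child.buf:
                fill(child)  # recursively refill an empty input buffer
        if not v.left.buf and not v.right.buf:
            break  # both inputs exhausted
        # One merge step: move the smaller front element to the output.
        if not v.right.buf or (v.left.buf and v.left.buf[0] <= v.right.buf[0]):
            v.buf.append(v.left.buf.popleft())
        else:
            v.buf.append(v.right.buf.popleft())

# Filling the root pulls the four smallest elements up while keeping
# every buffer on a root-to-leaf path sorted (Invariant 2).
root = Node(Node(items=[1, 4, 6]), Node(items=[2, 3, 9]), capacity=4)
fill(root)
print(list(root.buf))  # [1, 2, 3, 4]
```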
To perform an Insert operation we insert the new element among the (constant number of) elements in I, maintaining Invariant 3. If the number of elements in I is now s1, we examine the links in order to find the lowest index i for which ci ≤ ki. Then we perform the following Sweep(i) operation.
In Sweep($i$), we first traverse the path $p$ from $A_1$ to $S_{ic_i}$ and record how many elements are contained in each encountered buffer. Then we traverse the part of $p$ going from $A_i$ to $S_{ic_i}$, remove the elements in the encountered buffers, and form a sorted stream $\sigma_1$ of the removed elements. Next we form another sorted stream $\sigma_2$ of all elements in links $1, \ldots, i-1$ and in buffer $I$; we do so by marking $A_i$ temporarily as exhausted and calling DeleteMin repeatedly. We then merge $\sigma_1$ and $\sigma_2$ into a single stream $\sigma$, and traverse $p$ again while inserting the front (smallest) elements of $\sigma$ in the buffers on $p$ such that they contain the same numbers of elements as before we emptied them. Finally, we insert the remaining elements from $\sigma$ into $S_{ic_i}$, reset $c_\ell$ to one for $\ell = 1, 2, \ldots, i-1$, and increment $c_i$.
To see that Sweep($i$) does not insert more than the allowed $s_i$ elements into $S_{ic_i}$, first note that the lower part of link $i$ is emptied each time $c_i$ is reset to one. This implies that the lower part of link $i$ never contains more than the number of elements inserted into $S_{i1}, S_{i2}, \ldots, S_{ik_i}$ by the at most $k_i$ Sweep($i$) operations occurring since the last time $c_i$ was reset. Since $s_i = s_1 + \sum_{j=1}^{i-1} k_j s_j$ for all $i$, it follows by induction on time that no instance of Sweep($i$) inserts more than $s_i$ elements into $S_{ic_i}$.
Clearly, Sweep(i) maintains Invariants 1 and 3, since I and the lower parts of links 1,... ,i − 1 are empty afterwards. Invariant 2 is also maintained, since the new elements in the buffers on p are the smallest elements in σ, distributed such that each buffer contains exactly the same number of elements as before the Sweep(i) operation. After the operation, an element on this path can only be smaller than the element occupying the same location before the operation, and therefore the merge tree is in heap order.
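The interplay of $I$, the links, and Sweep can be modeled in a few dozen lines. The sketch below collapses each link to a single sorted list and ignores all costs; the class, its attribute names, and the toy capacities are our own assumptions, not part of the structure:

```python
import bisect
import heapq
import random

class MiniFunnelHeap:
    """Toy model of the Funnel Heap's element movement: each link is
    collapsed into one sorted list with growing capacity, and Sweep
    merges I and all lower links into the chosen link. The mergers,
    the A/B buffers, and all cost accounting are elided."""

    S1 = 8  # size of the insertion buffer I (s_1 in the text)

    def __init__(self):
        self.I = []      # insertion buffer, kept sorted (Invariant 3)
        self.links = []  # links[i]: sorted list standing in for link i+1
        self.caps = []   # capacity of links[i] (toy stand-in for s_i)

    def insert(self, key):
        bisect.insort(self.I, key)
        if len(self.I) == self.S1:
            self._sweep()

    def _sweep(self):
        # Find the lowest link with room left (the role of c_i <= k_i).
        i = 0
        while i < len(self.links) and len(self.links[i]) >= self.caps[i]:
            i += 1
        if i == len(self.links):
            self.links.append([])
            self.caps.append(self.S1 * 4 ** (i + 1))
        # Form the merged stream sigma from I and links 1..i and store
        # it in link i; I and the lower links are empty afterwards.
        self.links[i] = list(heapq.merge(self.I, *self.links[: i + 1]))
        self.I = []
        for j in range(i):
            self.links[j] = []

    def delete_min(self):
        # In the real structure the minimum is in I or A_1; in this toy
        # every link is fully sorted, so we compare all front elements.
        best = min((l for l in [self.I] + self.links if l), key=lambda l: l[0])
        return best.pop(0)

keys = random.sample(range(10_000), 500)
pq = MiniFunnelHeap()
for x in keys:
    pq.insert(x)
assert [pq.delete_min() for _ in keys] == sorted(keys)
```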
Analysis
To analyze the amortized cost of an Insert or DeleteMin operation, we first consider the number of memory transfers used to move elements upwards (towards $A_1$) by invocations of binary mergers in the merge tree. For now we assume that all invocations result in full buffers, i.e., that no exhaustions occur. We imagine charging the cost of filling a particular buffer evenly to the elements being brought into the buffer, and will show that this way an element from an input buffer of $K_i$ is charged $O(\frac{1}{B}\log_{M/B} s_i)$ memory transfers during its ascent to $A_1$.
Our proof relies on the optimal replacement strategy keeping as many as possible of the first links of the Funnel Heap in fast memory at all times. To analyze the number of links that fit in fast memory, we define $\Delta_i$ to be the sum of the space used by links 1 to $i$ and define $i_M$ to be the largest $i$ for which $\Delta_i \le M$. By the space bound for $k$-mergers in Theorem 34.2 we see that the space used by link $i$ is dominated by the $\Theta(s_i k_i) = \Theta(k_i^4)$ space use of $S_{i1}, \ldots, S_{ik_i}$. Since $k_{i+1} = \Theta(k_i^{4/3})$, the space used by link $i$ grows double-exponentially with $i$. Hence, $\Delta_i$ is a sum of double-exponentially increasing terms and is therefore dominated by its last term. In other words, $\Delta_i = \Theta(k_i^4) = \Theta(s_i^{4/3})$. By the definition of $i_M$ we have $\Delta_{i_M} \le M < \Delta_{i_M+1}$. Using $s_{i+1} = \Theta(s_i^{4/3})$ we see that $\log_M(s_{i_M}) = \Theta(1)$.
We imagine maintaining the credit invariant that each element in a buffer holds enough credits to pay for its ascent from its current position to $A_1$, at the cost analyzed above. In particular, an element needs $O(\frac{1}{B}\log_{M/B} s_i)$ credits when it is inserted in an input buffer of $K_i$. We attribute the cost of these credits to the Sweep($i$) operation inserting the element, effectively making all invocations of mergers prepaid by Sweep($i$) operations. A Sweep($i$) operation also incurs memory transfers by itself; we now bound these. In the Sweep($i$) operation we first form $\sigma_1$ by traversing the path $p$ from $A_1$ to $S_{ic_i}$. Since the links are laid out sequentially in memory, this traversal at most constitutes a linear scan of the consecutive memory locations containing $A_1$ through $K_i$. Such a scan takes $O((\Delta_{i-1} + |A_i| + |B_i| + |K_i|)/B) = O(k_i^3/B) = O(s_i/B)$ memory transfers. Next we form $\sigma_2$ using DeleteMin operations, the cost of which is paid for by the credits placed on the elements. Finally, we merge $\sigma_1$ and $\sigma_2$ into $\sigma$, and place some of the elements in buffers on $p$ and some of the elements in $S_{ic_i}$. The number of memory transfers needed for this is bounded by the $O(s_i/B)$ memory transfers needed to traverse $p$ and $S_{ic_i}$. Hence, the memory transfers incurred by the Sweep($i$) operation itself are $O(s_i/B)$.
After the Sweep($i$) operation, the credit invariant must be reestablished. Each of the $O(s_i)$ elements inserted into $S_{ic_i}$ must receive $O(\frac{1}{B}\log_{M/B} s_i)$ credits. Additionally, the elements inserted into the part of the path $p$ from $A_1$ through $A_{i-1}$ must receive enough credits to cover their ascent to $A_1$, since the credits that resided with elements in the same positions before the operation were used when forming $\sigma_2$ by DeleteMin operations. This constitutes $O(\Delta_{i-1}) = O(s_i)$ elements, which, by the analysis above, must receive $O(\frac{1}{B}\log_{M/B} s_i)$ credits each. Altogether $O(s_i/B) + O(\frac{s_i}{B}\log_{M/B} s_i) = O(\frac{s_i}{B}\log_{M/B} s_i)$ memory transfers are attributed to a Sweep($i$) operation, again under the assumption that no exhaustions occur during invocations.
To actually account for exhaustions, that is, the memory transfers incurred when filling buffers that become exhausted, we note that filling a buffer partly incurs at most the same number of memory transfers as filling it entirely. This number was analyzed above to be $O(|A_i|/B)$ for $A_i$ and $O(\frac{|B_i|}{B}\log_{M/B} s_i)$ for $B_i$, when $i > i_M$. If $B_i$ becomes exhausted, only a Sweep($i$) can remove that status. If $A_i$ becomes exhausted, only a Sweep($j$) for $j \ge i$ can remove that status. As at most a single Sweep($j$) with $j > i$ can take place between one Sweep($i$) and the next, $B_i$ can only become exhausted once for each Sweep($i$), and $A_i$ can only become exhausted twice for each Sweep($i$). From $|A_i| = |B_i| = k_i^3 = \Theta(s_i)$ it follows that charging Sweep($i$) an additional cost of $O(\frac{s_i}{B}\log_{M/B} s_i)$ memory transfers will cover all costs of filling buffers when exhaustion occurs.
Overall we have shown that we can account for all memory transfers if we attribute $O(\frac{s_i}{B}\log_{M/B} s_i)$ memory transfers to each Sweep($i$). By induction on $i$, we can show that at least $s_i$ insertions have to take place between consecutive Sweep($i$) operations. Thus, if we charge the Sweep($i$) cost to the last $s_i$ insertions preceding the Sweep($i$), each insertion is charged $O(\frac{1}{B}\log_{M/B} s_i)$ memory transfers. Given a sequence of operations on an initially empty priority queue, let $i_{\max}$ be the largest $i$ for which Sweep($i$) takes place. We have $s_{i_{\max}} \le N$, where $N$ is the number of insertions in the sequence. An insertion can be charged by at most one Sweep($i$) for each $i = 1, \ldots, i_{\max}$, so by the double-exponential growth of $s_i$, the number of memory transfers charged to an insertion is
$$\sum_{i=1}^{i_{\max}} O\!\left(\frac{1}{B}\log_{M/B} s_i\right) = O\!\left(\frac{1}{B}\log_{M/B} s_{i_{\max}}\right) = O\!\left(\frac{1}{B}\log_{M/B} N\right),$$
where the last equality follows from the tall cache assumption $M = \Omega(B^2)$.
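The dominance of the charging sum by its largest term is easy to check numerically; a small illustration, where the concrete value of $M/B$ and the starting $s_i$ are arbitrary assumptions:

```python
import math

M_over_B = 1024  # an arbitrary example value of M/B
s, terms = 8, []
while s < 10**12:
    terms.append(math.log(s, M_over_B))  # the log_{M/B} s_i factors
    s = int(s ** (4 / 3)) + 1            # s_{i+1} = Theta(s_i^{4/3})
print(f"sum of terms: {sum(terms):.2f}, largest term: {terms[-1]:.2f}, "
      f"ratio: {sum(terms) / terms[-1]:.2f}")  # ratio is Theta(1)
```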
Finally, we bound the space use of the entire structure. To ensure a space usage linear in $N$, we create link $i$ when it is first used, i.e., when the first Sweep($i$) occurs. At that point in time, $c_i$, $A_i$, $v_i$, $B_i$, $K_i$, and $S_{i1}$ are created. These take up $\Theta(s_i)$ space combined. At each subsequent Sweep($i$) operation, we create the next input buffer $S_{ic_i}$ of size $s_i$. As noted above, each Sweep($i$) is preceded by at least $s_i$ insertions, from which an $O(N)$ space bound follows. To ensure that the entire structure is laid out in consecutive memory locations, the structure is moved to a larger memory area when it has grown by a constant factor. When allocated, the size of the new memory area is chosen such that it will hold the input buffers $S_{ij}$ that will be created before the next move. The amortized cost of this is $O(1/B)$ per insertion.
THEOREM 34.6 Using $\Theta(M)$ fast memory, a sequence of $N$ Insert, DeleteMin, and Delete operations can be performed on an initially empty Funnel Heap using $O(N)$ space in $O(\frac{1}{B}\log_{M/B} N)$ amortized memory transfers each.
Brodal and Fagerberg [16] gave a refined analysis for a variant of the Funnel Heap that shows that the structure adapts to different usage profiles. More precisely, they showed that the $i$th insertion uses amortized $O(\frac{1}{B}\log_{M/B} N_i)$ memory transfers, where $N_i$ can be defined in any of the following three ways: (a) $N_i$ is the number of elements present in the priority queue when the $i$th insertion is performed, (b) if the $i$th inserted element is removed by a DeleteMin operation prior to the $j$th insertion then $N_i = j - i$, or (c) $N_i$ is the maximum rank of the $i$th inserted element during its lifetime in the priority queue, where rank denotes the number of smaller elements in the queue.
Exponential Level Based Priority Queue
While the Funnel Heap is inspired by Mergesort and uses $k$-mergers as the basic building block, the exponential level priority queue of Arge et al. [5] is somewhat inspired by distribution sorting and uses sorting as a basic building block.
Structure
The structure consists of $\Theta(\log \log N)$ levels whose sizes vary from $N$ to some small size $c$ below a constant threshold $c_t$; the size of a level corresponds (asymptotically) to the number of elements that can be stored within it. The $i$th level from above has size $N^{(2/3)^{i-1}}$ and for convenience we refer to the levels by their size. Thus the levels from largest to smallest are level $N$, level $N^{2/3}$, level $N^{4/9}$, ..., level $X^{9/4}$, level $X^{3/2}$, level $X$, level $X^{2/3}$, level $X^{4/9}$, ..., level $c^{9/4}$, level $c^{3/2}$, and level $c$. In general, a level can contain any number of elements less than or equal to its size, except level $N$, which always contains $\Theta(N)$ elements. Intuitively, smaller levels store elements with smaller keys or elements that were more recently inserted. In particular, the minimum key element and the most recently inserted element are always in the smallest (lowest) level $c$. Both insertions and deletions are initially performed on the smallest level and may propagate up through the levels.
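A quick numeric illustration of the level sizes, with $c = 4$ as an arbitrary stand-in for the constant threshold:

```python
def level_sizes(n, c=4):
    """Level sizes N^((2/3)^i), from level N down to the constant-size
    level; c stands in for the constant threshold c_t in the text."""
    sizes, size = [], float(n)
    while size > c:
        sizes.append(round(size))
        size = size ** (2 / 3)
    sizes.append(c)
    return sizes

for n in (10**3, 10**6, 10**12):
    levels = level_sizes(n)
    print(f"N = {n:>14}: {len(levels)} levels: {levels}")
# The number of levels grows like Theta(log log N): squaring N adds
# only a constant number of levels.
```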
Elements are stored in a level in a number of buffers, which are also used to transfer elements between levels. Level $X$ consists of one up buffer $u_X$ that can store up to $X$ elements, and at most $X^{1/3}$ down buffers $d_1^X, \ldots, d_{X^{1/3}}^X$, each containing between $\frac{1}{2}X^{2/3}$ and $2X^{2/3}$ elements. Thus level $X$ can store up to $3X$ elements. We refer to the maximum possible number of elements that can be stored in a buffer as the size of the buffer. Refer to Figure 34.11. Note that the size of a down buffer at one level matches the size (up to a constant factor) of the up buffer one level down.
We maintain three invariants about the relationships between the elements in buffers of various levels:

Invariant 4 At level $X$, elements are sorted among the down buffers, that is, elements in $d_i^X$ have smaller keys than elements in $d_{i+1}^X$, but the elements within a down buffer are unordered. The element with largest key in each down buffer is called a pivot element.

Invariant 5 At level $X$, the elements in the down buffers have smaller keys than the elements in the up buffer.

Invariant 6 The elements in the down buffers at level $X$ have smaller keys than the elements in the down buffers at the next level up, level $X^{3/2}$.
The three invariants ensure that the keys of the elements in the down buffers get larger as we go from smaller to larger levels of the structure. Furthermore, an order exists between the buffers on one level: keys of elements in the up buffer are larger than keys of elements in down buffers. Therefore, down buffers are drawn below up buffers in Figure 34.11. However, the keys of the elements in an up buffer are unordered relative to the keys of the elements in down buffers one level up. Intuitively, up buffers store elements that are "on their way up", that is, they have yet to be resolved as belonging to a particular down buffer in the next (or a higher) level. Analogously, down buffers store elements that are "on their way down"; the down buffers partition these elements into several clusters so that we can quickly find the cluster of smallest-key elements, of size roughly equal to the next level down. In particular, the element with overall smallest key is in the first down buffer at level $c$.
Layout
The priority queue is laid out in memory such that the levels are stored consecutively from smallest to largest, with each level occupying a single region of memory. For level $X$ we reserve space for exactly $3X$ elements: $X$ for the up buffer and $2X^{2/3}$ for each of the $X^{1/3}$ possible down buffers. The up buffer is stored first, followed by the down buffers stored in an arbitrary order but linked together to form an ordered linked list. Thus $O\bigl(\sum_{i=0}^{\log_{3/2}\log_c N} N^{(2/3)^i}\bigr) = O(N)$ is an upper bound on the total memory used by the priority queue.
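The space reservation can also be checked numerically: with $3X$ cells per level $X$, the total stays within a small constant times $N$ (a sketch, reusing the arbitrary threshold $c = 4$):

```python
def total_space(n, c=4):
    """Cells reserved by the layout: 3X for each level X, from level N
    down to the constant-size level c (an arbitrary threshold here)."""
    size, total = float(n), 0
    while size > c:
        total += 3 * round(size)
        size = size ** (2 / 3)
    return total + 3 * c

for n in (10**4, 10**6, 10**9):
    print(n, total_space(n), round(total_space(n) / n, 2))  # ratio stays ~3
```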
Operations
To implement the priority queue operations we use two general operations, push and pull. Push inserts $X$ elements into level $X^{3/2}$, and pull removes the $X$ elements with smallest keys from level $X^{3/2}$ and returns them in sorted order. An Insert or a DeleteMin is performed simply by performing a push or pull on the smallest level $c$.
Push. To push $X$ elements into level $X^{3/2}$, we first sort the $X$ elements cache-obliviously using $O(1 + \frac{X}{B}\log_{M/B}\frac{X}{B})$ memory transfers. Next we distribute the elements in the sorted list into the down buffers of level $X^{3/2}$ by scanning through the list and simultaneously visiting the down buffers in (linked) order. More precisely, we append elements to the end of the current down buffer $d_i$, and advance to the next down buffer $d_{i+1}$ as soon as we encounter an element with a larger key than the pivot of $d_i$. Elements with larger keys than the pivot of the last down buffer are inserted in the up buffer $u_{X^{3/2}}$. Scanning through the $X$ elements takes $O(1 + \frac{X}{B})$ memory transfers. Even though we do not scan through every down buffer, we might perform at least one memory transfer for each of the $X^{1/2}$ possible buffers. Thus the total cost of distributing the $X$ elements is $O(\frac{X}{B} + X^{1/2})$ memory transfers.
During the distribution of elements a down buffer may run full, that is, contain $2X$ elements. In this case, we split the buffer into two down buffers each containing $X$ elements using $O(1 + \frac{X}{B})$ transfers. We place the new buffer in any free down buffer location for the level and update the linked list accordingly. If the level already has the maximum number $X^{1/2}$ of down buffers, we remove the last down buffer by inserting its no more than $2X$ elements into the up buffer using $O(1 + \frac{X}{B})$ memory transfers. Since $X$ elements must have been inserted since the last time the buffer split, the amortized splitting cost per element is $O(\frac{1}{X} + \frac{1}{B})$ transfers. In total, the amortized number of memory transfers used on splitting buffers while distributing the $X$ elements is $O(1 + \frac{X}{B})$.
If the up buffer runs full during the above process, that is, contains more than $X^{3/2}$ elements, we recursively push all of these elements into the next level up. Note that after such a recursive push, $X^{3/2}$ elements have to be inserted (pushed) into the up buffer of level $X^{3/2}$ before another recursive push is needed.
Overall we can perform a push of $X$ elements from level $X$ into level $X^{3/2}$ in $O(X^{1/2} + \frac{X}{B}\log_{M/B}\frac{X}{B})$ memory transfers amortized, not counting the cost of any recursive push operations; it is easy to see that a push maintains all three invariants.
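The distribution and splitting steps of a push can be sketched as follows. The function name and the list-based buffers are our own simplification; the removal of the last down buffer when the level is full is elided, and only elements beyond the last pivot spill into the up buffer:

```python
def distribute(elements, down_buffers, pivots, up_buffer, capacity):
    """Scan the sorted input and the down buffers in tandem: append each
    element to the current down buffer, advancing when its key exceeds
    that buffer's pivot; elements beyond the last pivot go to the up
    buffer. A buffer reaching `capacity` (2X in the text) is split into
    two half-full buffers (here by sorting, which is fine for a sketch)."""
    b = 0
    for x in sorted(elements):
        while b < len(pivots) and x > pivots[b]:
            b += 1  # advance to the next down buffer in linked order
        if b == len(pivots):
            up_buffer.append(x)  # larger than every pivot
            continue
        down_buffers[b].append(x)
        if len(down_buffers[b]) >= capacity:
            full = sorted(down_buffers[b])
            mid = len(full) // 2
            down_buffers[b] = full[:mid]
            down_buffers.insert(b + 1, full[mid:])
            pivots.insert(b, full[mid - 1])  # new pivot of the first half

# Two down buffers with pivots 10 and 20, splitting at capacity 4:
down, piv, up = [[3, 7], [12, 18]], [10, 20], []
distribute([5, 9, 11, 25, 2], down, piv, up, capacity=4)
print(down, piv, up)  # [[2, 3], [5, 7, 9], [12, 18, 11]] [3, 10, 20] [25]
```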
Pull. To pull the $X$ elements with smallest keys out of level $X^{3/2}$, we first check whether the down buffers contain at least $\frac{3}{2}X$ elements. If they do, then by Invariants 4 and 5 the $X$ elements with smallest keys reside in the first few down buffers; we sort the elements in these buffers, remove the $X$ smallest, and leave the remaining elements behind as down buffers. In the case where the down buffers contain fewer than $\frac{3}{2}X$ elements, we first pull the $X^{3/2}$ elements with smallest keys from the next level up. Because these elements do not necessarily have smaller keys than the, say, $U$ elements in the up buffer $u_{X^{3/2}}$, we then sort this up buffer and merge the two sorted lists. Then we insert the $U$ elements with largest keys into the up buffer, and distribute the remaining between $X^{3/2}$ and $X^{3/2} + \frac{3}{2}X$ elements into $X^{1/2}$ down buffers containing between $X$ and $X + \frac{3}{2}X^{1/2}$ elements each (such that the $O(\frac{1}{X} + \frac{1}{B})$ amortized down buffer split bound is maintained). It is easy to see that this maintains the three invariants. Afterwards, we can find the $X$ minimal key elements as above. Note that after a recursive pull, $X^{3/2}$ elements have to be deleted (pulled) from the down buffers of level $X^{3/2}$ before another recursive pull is needed. Note also that a pull on level $X^{3/2}$ does not affect the number of elements in the up buffer $u_{X^{3/2}}$. Since we distribute elements into the down and up buffers after a recursive pull using one sort and one scan of $X^{3/2}$ elements, the cost of doing so is dominated by the cost of the recursive pull operation itself. Thus, ignoring the cost of recursive pulls, we have shown that a pull of $X$ elements from level $X^{3/2}$ down to level $X$ can be performed in $O(1 + \frac{X}{B}\log_{M/B}\frac{X}{B})$ memory transfers amortized, while maintaining Invariants 4-6.
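The even redistribution into $X^{1/2}$ down buffers after a recursive pull can be sketched as follows (the function name is ours; the largest key of each buffer becomes its pivot, per Invariant 4):

```python
def redistribute(sorted_elems, num_buffers):
    """Spread between X^{3/2} and X^{3/2} + (3/2)X sorted elements into
    num_buffers = X^{1/2} down buffers of near-equal size; the largest
    key in each buffer becomes its pivot (Invariant 4)."""
    base, extra = divmod(len(sorted_elems), num_buffers)
    buffers, start = [], 0
    for b in range(num_buffers):
        size = base + (1 if b < extra else 0)
        buffers.append(sorted_elems[start:start + size])
        start += size
    pivots = [buf[-1] for buf in buffers if buf]
    return buffers, pivots

buffers, pivots = redistribute(list(range(20)), 4)
print(buffers)  # four buffers of five elements each
print(pivots)   # [4, 9, 14, 19]
```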
Analysis
To analyze the amortized cost of an Insert or DeleteMin operation, we consider the total number of memory transfers used to perform push and pull operations during $N$ operations. To ensure that the structure always consists of $O(\log \log N)$ levels and uses $O(N)$ space, we rebuild it using $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ memory transfers (or $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ transfers per operation) after every $N$ operations.