Cache-Oblivious Data Structures: Priority Queues
Priority Queues
A priority queue maintains a set of elements, each with a priority (or key), under the operations Insert and DeleteMin, where an Insert operation inserts a new element in the queue, and a DeleteMin operation finds and deletes the element with the minimum key in the queue. Frequently we also consider a Delete operation, which deletes an element with a given key from the priority queue. This operation can easily be supported using Insert and DeleteMin: to perform a Delete we insert a special delete-element in the queue with the relevant key; if a DeleteMin returns an element whose key matches a pending delete-element, the element has already been deleted, and we perform another DeleteMin.
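To make the reduction concrete, here is a minimal sketch in Python (the class and method names are ours, not the chapter's), using the standard library's heapq as the underlying Insert/DeleteMin structure:

```python
import heapq

class LazyDeletePQ:
    """Priority queue supporting Delete via the delete-element trick."""

    def __init__(self):
        self._heap = []     # underlying Insert/DeleteMin structure
        self._deleted = []  # heap of keys marked for deletion

    def insert(self, key):
        heapq.heappush(self._heap, key)

    def delete(self, key):
        # Insert a special "delete-element" with the relevant key.
        heapq.heappush(self._deleted, key)

    def delete_min(self):
        # A minimum that matches a pending delete-element has really been
        # deleted, so discard both and continue with the next minimum.
        while self._deleted and self._heap and self._deleted[0] == self._heap[0]:
            heapq.heappop(self._deleted)
            heapq.heappop(self._heap)
        return heapq.heappop(self._heap)

pq = LazyDeletePQ()
for k in (5, 3, 8):
    pq.insert(k)
pq.delete(3)
print(pq.delete_min())  # 5: the delete-element cancels the 3
```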
A balanced search tree can be used to implement a priority queue. Thus the existence of a dynamic cache-oblivious B-tree immediately implies the existence of a cache-oblivious priority queue where all operations can be performed in $O(\log_B N)$ memory transfers, where $N$ is the total number of elements inserted. However, it turns out that one can design a priority queue where all operations can be performed in $\Theta(\mathrm{Sort}_{M,B}(N)/N) = O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ memory transfers; for most realistic values of $N$, $M$, and $B$, this bound is less than 1 and we can, therefore, only obtain it in an amortized sense. In this section we describe two different structures that obtain these bounds [5, 16].
Merge Based Priority Queue: Funnel Heap
The cache-oblivious priority queue Funnel Heap due to Brodal and Fagerberg [16] is inspired by the sorting algorithm Funnelsort [15, 20]. The structure only uses binary merging; essentially it is a heap-ordered binary tree with mergers in the nodes and buffers on the edges.
Structure
The main part of the Funnel Heap structure is a sequence of k-mergers (Section 34.2.2) with double-exponentially increasing k, linked together in a list using binary mergers; refer to Figure 34.10. This part of the structure constitutes a single binary merge tree. Additionally, there is a single insertion buffer I.
More precisely, let $k_i$ and $s_i$ be values defined inductively by $(k_1, s_1) = (2, 8)$, $s_{i+1} = s_i(k_i + 1)$, and $k_{i+1} = \lceil\lceil s_{i+1}^{1/3} \rceil\rceil$, where $\lceil\lceil x \rceil\rceil$ denotes the smallest power of two greater than or equal to $x$. Link $i$ in the list consists of a binary merger $v_i$, two buffers $A_i$ and $B_i$, and a $k_i$-merger $K_i$ with $k_i$ input buffers $S_{i1}, \ldots, S_{ik_i}$. We refer to $B_i$, $K_i$, and $S_{i1}, \ldots, S_{ik_i}$ as the lower part of the link. The size of both $A_i$ and $B_i$ is $k_i^3$, and the size of each $S_{ij}$ is $s_i$. Link $i$ has an associated counter $c_i$ for which $1 \le c_i \le k_i + 1$. The initial value of $c_i$ is one for all $i$. The structure also has one insertion buffer $I$ of size $s_1$. We maintain the following invariants:
Invariant 1 For link $i$, $S_{ic_i}, \ldots, S_{ik_i}$ are empty.
Invariant 2 On any path in the merge tree from some buffer to the root buffer A1, elements appear in decreasing order.
Invariant 3 Elements in buffer I appear in sorted order.
Invariant 2 can be rephrased as the entire merge tree being in heap order. It implies that in all buffers in the merge tree, the elements appear in sorted order, and that the minimum element in the queue will be in A1 or I, if buffer A1 is non-empty. Note that an invocation (Figure 34.7) of any binary merger in the tree maintains the invariants.
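Before turning to the layout, the double-exponential growth of $s_i$ and $k_i$ is easy to verify numerically. A small sketch, assuming the recurrence given above (hyperceil is our own name for the power-of-two rounding):

```python
def hyperceil(x):
    """Smallest power of two greater than or equal to x."""
    p = 1
    while p < x:
        p *= 2
    return p

def link_parameters(n):
    """Yield (k_i, s_i) for links i = 1..n, following (k_1, s_1) = (2, 8),
    s_{i+1} = s_i (k_i + 1), k_{i+1} = hyperceil(s_{i+1} ** (1/3))."""
    k, s = 2, 8
    for _ in range(n):
        yield k, s
        s = s * (k + 1)
        k = hyperceil(s ** (1 / 3))

for i, (k, s) in enumerate(link_parameters(7), start=1):
    print(f"link {i}: k_i = {k:>4}, s_i = {s}")
```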
Layout
The Funnel Heap is laid out in consecutive memory locations in the order $I$, link 1, link 2, ..., with link $i$ being laid out in the order $c_i$, $A_i$, $v_i$, $B_i$, $K_i$, $S_{i1}$, $S_{i2}$, ..., $S_{ik_i}$.
Operations
To perform a DeleteMin operation we compare the smallest element in I with the smallest element in A1 and remove the smallest of these; if A1 is empty we first perform an invocation of v1. The correctness of this procedure follows immediately from Invariant 2.
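An invocation (cf. Figure 34.7) repeatedly performs merge steps, recursively filling any empty input buffer first, until the output buffer is full or both inputs are exhausted. A minimal sketch of this recursion, with our own class names and deques standing in for the buffers:

```python
from collections import deque

class Node:
    """A node of the binary merge tree: leaves carry a sorted input
    stream; internal nodes have an output buffer filled by merging
    the output buffers of their two children."""
    def __init__(self, left=None, right=None, items=None, capacity=4):
        self.left, self.right = left, right
        self.leaf = items is not None
        self.buf = deque(items or [])  # sorted, smallest element in front
        self.capacity = capacity

def fill(v):
    """Invoke merger v: perform merge steps until its output buffer is
    full or both inputs are exhausted (cf. Figure 34.7)."""
    if v.leaf:
        return  # a leaf's buffer is never refilled; empty means exhausted
    while len(v.buf) < v.capacity:
        for child in (v.left, v.right):
            if not child.buf:
                fill(child)  # recursively refill an empty input buffer
        if not v.left.buf and not v.right.buf:
            break  # both inputs exhausted
        # One merge step: move the smaller front element to the output.
        if not v.right.buf or (v.left.buf and v.left.buf[0] <= v.right.buf[0]):
            v.buf.append(v.left.buf.popleft())
        else:
            v.buf.append(v.right.buf.popleft())

# Filling the root pulls the four smallest elements up while keeping
# every buffer on a root-to-leaf path sorted (Invariant 2).
root = Node(Node(items=[1, 4, 6]), Node(items=[2, 3, 9]), capacity=4)
fill(root)
print(list(root.buf))  # [1, 2, 3, 4]
```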
To perform an Insert operation we insert the new element among the (constant number of) elements in I, maintaining Invariant 3. If the number of elements in I is now s1, we examine the links in order to find the lowest index i for which ci ≤ ki. Then we perform the following Sweep(i) operation.
In Sweep($i$), we first traverse the path $p$ from $A_1$ to $S_{ic_i}$ and record how many elements are contained in each encountered buffer. Then we traverse the part of $p$ going from $A_i$ to $S_{ic_i}$, remove the elements in the encountered buffers, and form a sorted stream $\sigma_1$ of the removed elements. Next we form another sorted stream $\sigma_2$ of all elements in links $1, \ldots, i-1$ and in buffer $I$; we do so by marking $A_i$ temporarily as exhausted and calling DeleteMin repeatedly. We then merge $\sigma_1$ and $\sigma_2$ into a single stream $\sigma$, and traverse $p$ again while inserting the front (smallest) elements of $\sigma$ in the buffers on $p$ such that they contain the same numbers of elements as before we emptied them. Finally, we insert the remaining elements from $\sigma$ into $S_{ic_i}$, reset $c_\ell$ to one for $\ell = 1, 2, \ldots, i-1$, and increment $c_i$.
To see that Sweep($i$) does not insert more than the allowed $s_i$ elements into $S_{ic_i}$, first note that the lower part of link $i$ is emptied each time $c_i$ is reset to one. This implies that the lower part of link $i$ never contains more than the number of elements inserted into $S_{i1}, S_{i2}, \ldots, S_{ik_i}$ by the at most $k_i$ Sweep($i$) operations occurring since the last time $c_i$ was reset. Since $s_i = s_1 + \sum_{j=1}^{i-1} k_j s_j$ for all $i$, it follows by induction on time that no instance of Sweep($i$) inserts more than $s_i$ elements into $S_{ic_i}$.
Clearly, Sweep(i) maintains Invariants 1 and 3, since I and the lower parts of links 1,... ,i − 1 are empty afterwards. Invariant 2 is also maintained, since the new elements in the buffers on p are the smallest elements in σ, distributed such that each buffer contains exactly the same number of elements as before the Sweep(i) operation. After the operation, an element on this path can only be smaller than the element occupying the same location before the operation, and therefore the merge tree is in heap order.
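The interplay of $I$, the links, and Sweep can be modeled in a few dozen lines. The sketch below collapses each link to a single sorted list and ignores all costs; the class, its attribute names, and the toy capacities are our own assumptions, not part of the structure:

```python
import bisect
import heapq
import random

class MiniFunnelHeap:
    """Toy model of the Funnel Heap's element movement: each link is
    collapsed into one sorted list with growing capacity, and Sweep
    merges I and all lower links into the chosen link. The mergers,
    the A/B buffers, and all cost accounting are elided."""

    S1 = 8  # size of the insertion buffer I (s_1 in the text)

    def __init__(self):
        self.I = []      # insertion buffer, kept sorted (Invariant 3)
        self.links = []  # links[i]: sorted list standing in for link i+1
        self.caps = []   # capacity of links[i] (toy stand-in for s_i)

    def insert(self, key):
        bisect.insort(self.I, key)
        if len(self.I) == self.S1:
            self._sweep()

    def _sweep(self):
        # Find the lowest link with room left (the role of c_i <= k_i).
        i = 0
        while i < len(self.links) and len(self.links[i]) >= self.caps[i]:
            i += 1
        if i == len(self.links):
            self.links.append([])
            self.caps.append(self.S1 * 4 ** (i + 1))
        # Form the merged stream sigma from I and links 1..i and store
        # it in link i; I and the lower links are empty afterwards.
        self.links[i] = list(heapq.merge(self.I, *self.links[: i + 1]))
        self.I = []
        for j in range(i):
            self.links[j] = []

    def delete_min(self):
        # In the real structure the minimum is in I or A_1; in this toy
        # every link is fully sorted, so we compare all front elements.
        best = min((l for l in [self.I] + self.links if l), key=lambda l: l[0])
        return best.pop(0)

keys = random.sample(range(10_000), 500)
pq = MiniFunnelHeap()
for x in keys:
    pq.insert(x)
assert [pq.delete_min() for _ in keys] == sorted(keys)
```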
Analysis
To analyze the amortized cost of an Insert or DeleteMin operation, we first consider the number of memory transfers used to move elements upwards (towards $A_1$) by invocations of binary mergers in the merge tree. For now we assume that all invocations result in full buffers, i.e., that no exhaustions occur. We imagine charging the cost of filling a particular buffer evenly to the elements being brought into the buffer, and will show that this way an element from an input buffer of $K_i$ is charged $O(\frac{1}{B}\log_{M/B} s_i)$ memory transfers during its ascent to $A_1$.
Our proof relies on the optimal replacement strategy keeping as many as possible of the first links of the Funnel Heap in fast memory at all times. To analyze the number of links that fit in fast memory, we define $\Delta_i$ to be the sum of the space used by links 1 to $i$ and define $i_M$ to be the largest $i$ for which $\Delta_i \le M$. By the space bound for $k$-mergers in Theorem 34.2 we see that the space used by link $i$ is dominated by the $\Theta(s_i k_i) = \Theta(k_i^4)$ space use of $S_{i1}, \ldots, S_{ik_i}$. Since $k_{i+1} = \Theta(k_i^{4/3})$, the space used by link $i$ grows double-exponentially with $i$. Hence, $\Delta_i$ is a sum of double-exponentially increasing terms and is therefore dominated by its last term. In other words, $\Delta_i = \Theta(k_i^4) = \Theta(s_i^{4/3})$. By the definition of $i_M$ we have $\Delta_{i_M} \le M < \Delta_{i_M+1}$. Using $s_{i+1} = \Theta(s_i^{4/3})$ we see that $\log_M(s_{i_M}) = \Theta(1)$.
We imagine maintaining the credit invariant that each element in a buffer holds enough credits to pay for its ascent from its current position to $A_1$, at the cost analyzed above. In particular, an element needs $O(\frac{1}{B}\log_{M/B} s_i)$ credits when it is inserted in an input buffer of $K_i$. We attribute the cost of these credits to the Sweep($i$) operation inserting the element, effectively making all invocations of mergers prepaid by Sweep($i$) operations. A Sweep($i$) operation also incurs memory transfers by itself; we now bound these. In the Sweep($i$) operation we first form $\sigma_1$ by traversing the path $p$ from $A_1$ to $S_{ic_i}$. Since the links are laid out sequentially in memory, this traversal at most constitutes a linear scan of the consecutive memory locations containing $A_1$ through $K_i$. Such a scan takes $O((\Delta_{i-1} + |A_i| + |B_i| + |K_i|)/B) = O(k_i^3/B) = O(s_i/B)$ memory transfers. Next we form $\sigma_2$ using DeleteMin operations, the cost of which is paid for by the credits placed on the elements. Finally, we merge $\sigma_1$ and $\sigma_2$ into $\sigma$, and place some of the elements in buffers on $p$ and some of the elements in $S_{ic_i}$. The number of memory transfers needed for this is bounded by the $O(s_i/B)$ memory transfers needed to traverse $p$ and $S_{ic_i}$. Hence, the memory transfers incurred by the Sweep($i$) operation itself are $O(s_i/B)$.
After the Sweep($i$) operation, the credit invariant must be reestablished. Each of the $O(s_i)$ elements inserted into $S_{ic_i}$ must receive $O(\frac{1}{B}\log_{M/B} s_i)$ credits. Additionally, the elements inserted into the part of the path $p$ from $A_1$ through $A_{i-1}$ must receive enough credits to cover their ascent to $A_1$, since the credits that resided with elements in the same positions before the operation were used when forming $\sigma_2$ by DeleteMin operations. This constitutes $O(\Delta_{i-1}) = O(s_i)$ elements, which, by the analysis above, must receive $O(\frac{1}{B}\log_{M/B} s_i)$ credits each. Altogether $O(s_i/B) + O(\frac{s_i}{B}\log_{M/B} s_i) = O(\frac{s_i}{B}\log_{M/B} s_i)$ memory transfers are attributed to a Sweep($i$) operation, again under the assumption that no exhaustions occur during invocations.
To actually account for exhaustions, that is, the memory transfers incurred when filling buffers that become exhausted, we note that filling a buffer partly incurs at most the same number of memory transfers as filling it entirely. This number was analyzed above to be $O(|A_i|/B)$ for $A_i$ and $O(\frac{|B_i|}{B}\log_{M/B} s_i)$ for $B_i$, when $i > i_M$. If $B_i$ becomes exhausted, only a Sweep($i$) can remove that status. If $A_i$ becomes exhausted, only a Sweep($j$) for $j \ge i$ can remove that status. As at most a single Sweep($j$) with $j > i$ can take place between one Sweep($i$) and the next, $B_i$ can only become exhausted once for each Sweep($i$), and $A_i$ can only become exhausted twice for each Sweep($i$). From $|A_i| = |B_i| = k_i^3 = \Theta(s_i)$ it follows that charging Sweep($i$) an additional cost of $O(\frac{s_i}{B}\log_{M/B} s_i)$ memory transfers will cover all costs of filling buffers when exhaustion occurs.
Overall we have shown that we can account for all memory transfers if we attribute $O(\frac{s_i}{B}\log_{M/B} s_i)$ memory transfers to each Sweep($i$). By induction on $i$, we can show that at least $s_i$ insertions have to take place between consecutive Sweep($i$) operations. Thus, if we charge the Sweep($i$) cost to the last $s_i$ insertions preceding the Sweep($i$), each insertion is charged $O(\frac{1}{B}\log_{M/B} s_i)$ memory transfers. Given a sequence of operations on an initially empty priority queue, let $i_{\max}$ be the largest $i$ for which Sweep($i$) takes place. We have $s_{i_{\max}} \le N$, where $N$ is the number of insertions in the sequence. An insertion can be charged by at most one Sweep($i$) for each $i = 1, \ldots, i_{\max}$, so by the double-exponential growth of $s_i$, the number of memory transfers charged to an insertion is
$$\sum_{i=1}^{i_{\max}} O\!\left(\frac{1}{B}\log_{M/B} s_i\right) = O\!\left(\frac{1}{B}\log_{M/B} s_{i_{\max}}\right) = O\!\left(\frac{1}{B}\log_{M/B} N\right),$$
where the last equality follows from the tall cache assumption $M = \Omega(B^2)$.
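The dominance of the charging sum by its largest term is easy to check numerically; a small illustration, where the concrete value of $M/B$ and the starting $s_i$ are arbitrary assumptions:

```python
import math

M_over_B = 1024  # an arbitrary example value of M/B
s, terms = 8, []
while s < 10**12:
    terms.append(math.log(s, M_over_B))  # the log_{M/B} s_i factors
    s = int(s ** (4 / 3)) + 1            # s_{i+1} = Theta(s_i^{4/3})
print(f"sum of terms: {sum(terms):.2f}, largest term: {terms[-1]:.2f}, "
      f"ratio: {sum(terms) / terms[-1]:.2f}")  # ratio is Theta(1)
```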
Finally, we bound the space use of the entire structure. To ensure a space usage linear in $N$, we create link $i$ when it is first used, i.e., when the first Sweep($i$) occurs. At that point in time, $c_i$, $A_i$, $v_i$, $B_i$, $K_i$, and $S_{i1}$ are created. These take up $\Theta(s_i)$ space combined. At each subsequent Sweep($i$) operation, we create the next input buffer $S_{ic_i}$ of size $s_i$. As noted above, each Sweep($i$) is preceded by at least $s_i$ insertions, from which an $O(N)$ space bound follows. To ensure that the entire structure is laid out in consecutive memory locations, the structure is moved to a larger memory area when it has grown by a constant factor. When allocated, the size of the new memory area is chosen such that it will hold the input buffers $S_{ij}$ that will be created before the next move. The amortized cost of this is $O(1/B)$ per insertion.
THEOREM 34.6 Using $\Theta(M)$ fast memory, a sequence of $N$ Insert, DeleteMin, and Delete operations can be performed on an initially empty Funnel Heap using $O(N)$ space in $O(\frac{1}{B}\log_{M/B} N)$ amortized memory transfers each.
Brodal and Fagerberg [16] gave a refined analysis for a variant of the Funnel Heap that shows that the structure adapts to different usage profiles. More precisely, they showed that the $i$th insertion uses amortized $O(\frac{1}{B}\log_{M/B} N_i)$ memory transfers, where $N_i$ can be defined in any of the following three ways: (a) $N_i$ is the number of elements present in the priority queue when the $i$th insertion is performed, (b) if the $i$th inserted element is removed by a DeleteMin operation prior to the $j$th insertion then $N_i = j - i$, or (c) $N_i$ is the maximum rank of the $i$th inserted element during its lifetime in the priority queue, where rank denotes the number of smaller elements in the queue.
Exponential Level Based Priority Queue
While the Funnel Heap is inspired by Mergesort and uses $k$-mergers as the basic building block, the exponential level priority queue of Arge et al. [5] is somewhat inspired by distribution sorting and uses sorting as a basic building block.
Structure
The structure consists of $\Theta(\log \log N)$ levels whose sizes vary from $N$ to some small size $c$ below a constant threshold $c_t$; the size of a level corresponds (asymptotically) to the number of elements that can be stored within it. The $i$th level from above has size $N^{(2/3)^{i-1}}$ and for convenience we refer to the levels by their size. Thus the levels from largest to smallest are level $N$, level $N^{2/3}$, level $N^{4/9}$, ..., level $X^{9/4}$, level $X^{3/2}$, level $X$, level $X^{2/3}$, level $X^{4/9}$, ..., level $c^{9/4}$, level $c^{3/2}$, and level $c$. In general, a level can contain any number of elements less than or equal to its size, except level $N$, which always contains $\Theta(N)$ elements. Intuitively, smaller levels store elements with smaller keys or elements that were more recently inserted. In particular, the minimum key element and the most recently inserted element are always in the smallest (lowest) level $c$. Both insertions and deletions are initially performed on the smallest level and may propagate up through the levels.
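A quick numeric illustration of the level sizes, with $c = 4$ as an arbitrary stand-in for the constant threshold:

```python
def level_sizes(n, c=4):
    """Level sizes N^((2/3)^i), from level N down to the constant-size
    level; c stands in for the constant threshold c_t in the text."""
    sizes, size = [], float(n)
    while size > c:
        sizes.append(round(size))
        size = size ** (2 / 3)
    sizes.append(c)
    return sizes

for n in (10**3, 10**6, 10**12):
    levels = level_sizes(n)
    print(f"N = {n:>14}: {len(levels)} levels: {levels}")
# The number of levels grows like Theta(log log N): squaring N adds
# only a constant number of levels.
```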
Elements are stored in a level in a number of buffers, which are also used to transfer elements between levels. Level $X$ consists of one up buffer $u_X$ that can store up to $X$ elements, and at most $X^{1/3}$ down buffers $d_1^X, \ldots, d_{X^{1/3}}^X$, each containing between $\frac{1}{2}X^{2/3}$ and $2X^{2/3}$ elements. Thus level $X$ can store up to $3X$ elements. We refer to the maximum possible number of elements that can be stored in a buffer as the size of the buffer. Refer to Figure 34.11. Note that the size of a down buffer at one level matches the size (up to a constant factor) of the up buffer one level down.
We maintain three invariants about the relationships between the elements in buffers of various levels:

Invariant 4 At level $X$, elements are sorted among the down buffers, that is, elements in $d_i^X$ have smaller keys than elements in $d_{i+1}^X$, but the elements within a down buffer are unordered. The element with largest key in each down buffer is called a pivot element.

Invariant 5 At level $X$, the elements in the down buffers have smaller keys than the elements in the up buffer.

Invariant 6 The elements in the down buffers at level $X$ have smaller keys than the elements in the down buffers at the next level up, level $X^{3/2}$.
The three invariants ensure that the keys of the elements in the down buffers get larger as we go from smaller to larger levels of the structure. Furthermore, an order exists between the buffers on one level: keys of elements in the up buffer are larger than keys of elements in down buffers. Therefore, down buffers are drawn below up buffers in Figure 34.11. However, the keys of the elements in an up buffer are unordered relative to the keys of the elements in down buffers one level up. Intuitively, up buffers store elements that are "on their way up", that is, they have yet to be resolved as belonging to a particular down buffer in the next (or a higher) level. Analogously, down buffers store elements that are "on their way down"; the down buffers partition these elements into several clusters so that we can quickly find the cluster of smallest-key elements, of size roughly equal to the next level down. In particular, the element with overall smallest key is in the first down buffer at level $c$.
Layout
The priority queue is laid out in memory such that the levels are stored consecutively from smallest to largest, with each level occupying a single region of memory. For level $X$ we reserve space for exactly $3X$ elements: $X$ for the up buffer and $2X^{2/3}$ for each of the $X^{1/3}$ possible down buffers. The up buffer is stored first, followed by the down buffers stored in an arbitrary order but linked together to form an ordered linked list. Thus $O\bigl(\sum_{i=0}^{\log_{3/2}\log_c N} N^{(2/3)^i}\bigr) = O(N)$ is an upper bound on the total memory used by the priority queue.
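The space reservation can also be checked numerically: with $3X$ cells per level $X$, the total stays within a small constant times $N$ (a sketch, reusing the arbitrary threshold $c = 4$):

```python
def total_space(n, c=4):
    """Cells reserved by the layout: 3X for each level X, from level N
    down to the constant-size level c (an arbitrary threshold here)."""
    size, total = float(n), 0
    while size > c:
        total += 3 * round(size)
        size = size ** (2 / 3)
    return total + 3 * c

for n in (10**4, 10**6, 10**9):
    print(n, total_space(n), round(total_space(n) / n, 2))  # ratio stays ~3
```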
Operations
To implement the priority queue operations we use two general operations, push and pull. Push inserts $X$ elements into level $X^{3/2}$, and pull removes the $X$ elements with smallest keys from level $X^{3/2}$ and returns them in sorted order. An Insert or a DeleteMin is performed simply by performing a push or pull on the smallest level $c$.
Push. To push $X$ elements into level $X^{3/2}$, we first sort the $X$ elements cache-obliviously using $O(1 + \frac{X}{B}\log_{M/B}\frac{X}{B})$ memory transfers. Next we distribute the elements in the sorted list into the down buffers of level $X^{3/2}$ by scanning through the list and simultaneously visiting the down buffers in (linked) order. More precisely, we append elements to the end of the current down buffer $d_i$, and advance to the next down buffer $d_{i+1}$ as soon as we encounter an element with a larger key than the pivot of $d_i$. Elements with larger keys than the pivot of the last down buffer are inserted in the up buffer $u_{X^{3/2}}$. Scanning through the $X$ elements takes $O(1 + \frac{X}{B})$ memory transfers. Even though we do not scan through every down buffer, we might perform at least one memory transfer for each of the $X^{1/2}$ possible buffers. Thus the total cost of distributing the $X$ elements is $O(\frac{X}{B} + X^{1/2})$ memory transfers.
During the distribution of elements a down buffer may run full, that is, contain $2X$ elements. In this case, we split the buffer into two down buffers each containing $X$ elements using $O(1 + \frac{X}{B})$ transfers. We place the new buffer in any free down buffer location for the level and update the linked list accordingly. If the level already has the maximum number $X^{1/2}$ of down buffers, we remove the last down buffer by inserting its no more than $2X$ elements into the up buffer using $O(1 + \frac{X}{B})$ memory transfers. Since $X$ elements must have been inserted since the last time the buffer split, the amortized splitting cost per element is $O(\frac{1}{X} + \frac{1}{B})$ transfers. In total, the amortized number of memory transfers used on splitting buffers while distributing the $X$ elements is $O(1 + \frac{X}{B})$.
If the up buffer runs full during the above process, that is, contains more than $X^{3/2}$ elements, we recursively push all of these elements into the next level up. Note that after such a recursive push, $X^{3/2}$ elements have to be inserted (pushed) into the up buffer of level $X^{3/2}$ before another recursive push is needed.
Overall we can perform a push of $X$ elements from level $X$ into level $X^{3/2}$ in $O(X^{1/2} + \frac{X}{B}\log_{M/B}\frac{X}{B})$ memory transfers amortized, not counting the cost of any recursive push operations; it is easy to see that a push maintains all three invariants.
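The distribution and splitting steps of a push can be sketched as follows. The function name and the list-based buffers are our own simplification; the removal of the last down buffer when the level is full is elided, and only elements beyond the last pivot spill into the up buffer:

```python
def distribute(elements, down_buffers, pivots, up_buffer, capacity):
    """Scan the sorted input and the down buffers in tandem: append each
    element to the current down buffer, advancing when its key exceeds
    that buffer's pivot; elements beyond the last pivot go to the up
    buffer. A buffer reaching `capacity` (2X in the text) is split into
    two half-full buffers (here by sorting, which is fine for a sketch)."""
    b = 0
    for x in sorted(elements):
        while b < len(pivots) and x > pivots[b]:
            b += 1  # advance to the next down buffer in linked order
        if b == len(pivots):
            up_buffer.append(x)  # larger than every pivot
            continue
        down_buffers[b].append(x)
        if len(down_buffers[b]) >= capacity:
            full = sorted(down_buffers[b])
            mid = len(full) // 2
            down_buffers[b] = full[:mid]
            down_buffers.insert(b + 1, full[mid:])
            pivots.insert(b, full[mid - 1])  # new pivot of the first half

# Two down buffers with pivots 10 and 20, splitting at capacity 4:
down, piv, up = [[3, 7], [12, 18]], [10, 20], []
distribute([5, 9, 11, 25, 2], down, piv, up, capacity=4)
print(down, piv, up)  # [[2, 3], [5, 7, 9], [12, 18, 11]] [3, 10, 20] [25]
```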
Pull. To pull the $X$ elements with smallest keys out of level $X^{3/2}$, we first check whether the down buffers contain at least $\frac{3}{2}X$ elements. If they do, then by Invariants 4 and 5 the $X$ elements with smallest keys reside in the first few down buffers; we sort the elements in these buffers, remove the $X$ smallest, and leave the remaining elements behind as down buffers. In the case where the down buffers contain fewer than $\frac{3}{2}X$ elements, we first pull the $X^{3/2}$ elements with smallest keys from the next level up. Because these elements do not necessarily have smaller keys than the, say, $U$ elements in the up buffer $u_{X^{3/2}}$, we then sort this up buffer and merge the two sorted lists. Then we insert the $U$ elements with largest keys into the up buffer, and distribute the remaining between $X^{3/2}$ and $X^{3/2} + \frac{3}{2}X$ elements into $X^{1/2}$ down buffers containing between $X$ and $X + \frac{3}{2}X^{1/2}$ elements each (such that the $O(\frac{1}{X} + \frac{1}{B})$ amortized down buffer split bound is maintained). It is easy to see that this maintains the three invariants. Afterwards, we can find the $X$ minimal key elements as above. Note that after a recursive pull, $X^{3/2}$ elements have to be deleted (pulled) from the down buffers of level $X^{3/2}$ before another recursive pull is needed. Note also that a pull on level $X^{3/2}$ does not affect the number of elements in the up buffer $u_{X^{3/2}}$. Since we distribute elements into the down and up buffers after a recursive pull using one sort and one scan of $X^{3/2}$ elements, the cost of doing so is dominated by the cost of the recursive pull operation itself. Thus, ignoring the cost of recursive pulls, we have shown that a pull of $X$ elements from level $X^{3/2}$ down to level $X$ can be performed in $O(1 + \frac{X}{B}\log_{M/B}\frac{X}{B})$ memory transfers amortized, while maintaining Invariants 4-6.
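The even redistribution into $X^{1/2}$ down buffers after a recursive pull can be sketched as follows (the function name is ours; the largest key of each buffer becomes its pivot, per Invariant 4):

```python
def redistribute(sorted_elems, num_buffers):
    """Spread between X^{3/2} and X^{3/2} + (3/2)X sorted elements into
    num_buffers = X^{1/2} down buffers of near-equal size; the largest
    key in each buffer becomes its pivot (Invariant 4)."""
    base, extra = divmod(len(sorted_elems), num_buffers)
    buffers, start = [], 0
    for b in range(num_buffers):
        size = base + (1 if b < extra else 0)
        buffers.append(sorted_elems[start:start + size])
        start += size
    pivots = [buf[-1] for buf in buffers if buf]
    return buffers, pivots

buffers, pivots = redistribute(list(range(20)), 4)
print(buffers)  # four buffers of five elements each
print(pivots)   # [4, 9, 14, 19]
```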
Analysis
To analyze the amortized cost of an Insert or DeleteMin operation, we consider the total number of memory transfers used to perform push and pull operations during $N$ operations. To ensure that the structure always consists of $O(\log \log N)$ levels and uses $O(N)$ space, we rebuild it using $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ memory transfers (or $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ transfers per operation) after every $N$ operations.