Persistent Data Structures:Making Specific Data Structures More Efficient

Making Speciﬁc Data Structures More Eﬃcient

The purely functional deques of Kaplan and Tarjan [23], the conﬂuently persistent deques of Kaplan, Okasaki, and Tarjan [21], the purely functional heaps of Brodal and Okasaki [6], and the purely functional ﬁnger search trees of Kaplan and Tarjan [22], are all based on a simple and useful mechanism called redundant counters, which to the best of our knowledge ﬁrst appeared in lecture notes by Clancy and Knuth [9]. In this section we describe what redundant counters are, and demonstrate how they are used in simple persistent deques data structure.

A persistent implementation of deques support the following operations:

ql = push(x, q): Inserts an element x to the beginning of the deque q returning a new deque ql in which x is the ﬁrst element followed by the elements of q.

(x, ql) = pop(q): Returns a pair where x is the ﬁrst element of q and ql is a deque containing all elements of q but x.

ql = Inject(x, q): Inserts an element x to the end of the deque q returning a new deque ql in which x is the last element preceded by the elements of q.

(x, ql) = eject(q): Returns a pair where x is the last element of q and ql is a deque containing all elements of q but x.

A stack supports only push and pop, a queue supports only push and eject. Catenable deques also support the operation q = catenate(q1, q2): Returns a queue q containing all the elements of q1 followed by the elements of q2.

Although queues, and in particular catenable queues, are not trivial to make persistent, stacks are easy. The regular representation of a stack by a singly linked list of nodes, each containing an element, ordered from ﬁrst to last, is in fact purely functional. To push an element onto a stack, we create a new node containing the new element and a pointer to the node containing the previously ﬁrst element on the stack. To pop a stack, we retrieve the ﬁrst element and a pointer to the node containing the previously second element.

Direct ways to make queues persistent simulate queues by stacks. One stack holds ele- ments from the beginning of the queue and the other holds elements from its end. If we are interested in fully persistence this simulation should be real time and its details are not trivial. For a detailed discussion see Kaplan and Tarjan [23] and the references there.

Kaplan and Tarjan [23] described a new way to do a simulation of a deque with stacks. They suggest to represent a deque by a recursive structure that is built from bounded-size deques called buﬀers. Buﬀers are of two kinds: preﬁxes and suﬃxes. A non-empty deque q over a set A is represented by an ordered triple consisting of a preﬁx , pref ix(q), of elements of A, a child deque, child(q), whose elements are ordered pairs of elements of A, and a suﬃx , suff ix(q), of elements of A. The order of elements within q is the one consistent with the orders of all of its component parts. The child deque child(q), if non-empty, is represented in the same way. Thus the structure is recursive and unwinds linearly. We deﬁne the descendants {childi( q)} of deque d in the standard way, namely child0(q) = q and childi+1(q) = child(childi(q)) for i ≥ 0 if childi(q) is non-empty.

Observe that the elements of q are just elements of A, the elements of child(q) are pairs of elements of A, the elements of child(child(q)) are pairs of pairs of elements of A, and so on. One can think of each element of childi(q) as being a complete binary tree of depth i, with elements of A at its 2i leaves. One can also think of the entire structure representing q as a stack (of q and its descendants), each element of which is preﬁx-suﬃx pair. All the elements of q are stored in the preﬁxes and suﬃxes at the various levels of this structure, grouped into binary trees of the appropriate depths: level i contains the preﬁx and suﬃx of childi(q). See Figure 31.3.

FIGURE 31.3: Representation of a deque of elements over A. Each circle denotes a deque and each rectangle denotes a buﬀer. Squares correspond to elements from A which we denote by numbers and letters. Each buﬀer contains 0, 1, or 2 elements. Three versions are shown V1, V2, and V3. Version V2 was obtained from V1 by injecting the element f . Version V3 obtained from version V2 by injecting the element g. The latter inject triggered two recursive injects into the child and grandchild deques of V2. Note that identical binary trees and elements are represented only once but we draw them multiple times to avoid cluttering of the ﬁgure.

Because of the pairing, we can bring two elements up to level i by doing one pop or eject at level i + 1. Similarly, we can move two elements down from level i by doing one push or inject at level i + 1. This two-for-one payoﬀ leads to real-time performance.

Assume that each preﬁx or suﬃx is allowed to hold 0, 1, or 2 elements, from the beginning or end of the queue, respectively. We can implement ql = push(x, q) as follows. If the preﬁx of q contains 0 or 1 elements we allocate a new node to represent ql make its child deque and its suﬃx identical to the child and suﬃx of q, respectively. The preﬁx of ql is a newly allocated preﬁx containing x and the element in the preﬁx of q, if the preﬁx of q contained one element. We return a pointer the new node which represents ql. For an example consider version V2 shown in Figure 31.3 that was obtained from version V1 by a case of inject symmetric to the case of the push just described.

The hard case of the push is when the preﬁx of q already contains two elements. In this case we make a pair containing these two elements and push this pair recursively into child(q). Then we allocate a new node to represent ql, make its suﬃx identical to the suﬃx of q, make the deque returned by the recursive push to child(q) the child of ql, and make the preﬁx of ql be a newly allocated preﬁx containing x. For an example consider version V3 shown in Figure 31.3 that was obtained from version V2 by a recursive case of inject symmetric to the recursive case of the push just described. The implementations of pop and eject is symmetric.

This implementation is clearly purely functional and therefore fully persistent. However the time and space bounds per operation are O(log n). The same bounds as one gets by using search trees to represent the deques with the path copying technique. These logarithmic time bounds are by far oﬀ from the ephemeral O(1) time and space bounds.

Notice that there is a clear correspondence between this data structure and binary coun- ters. If we think of a buﬀer with two elements as the digit 1, and of any other buﬀer as the digit 0, then the implementation of push(q) is similar to adding one to the binary number deﬁned by the preﬁxes of the queues childi(q). It follows that if we are only interested in partially persistent deques then this implementation has O(1) amortized time bound per operation (see the discussion of binary counters in the next section). To make this simu- lation eﬃcient in a fully persistent setting and even in the worst case, Kaplan and Tarjan suggested to use redundant counters.

Redundant Binary Counters

To simplify the presentation we describe redundant binary counters, but the ideas carry over to any basis. Consider ﬁrst the regular binary representation of an integer i. To obtain from this representation the representation of i + 1 we ﬁrst ﬂip the rightmost digit. If we ﬂipped a 1 to 0 then we repeat the process on the next digit to the left. Obviously, this process can be long for some integers. But it is straightforward to show that if we carry out a sequence of such increments starting from zero then on average only a constant number of digits change per increment.∗∗ Redundant binary representations (or counters as we will call them) address the problem of how to represent i so we can obtain a representation of i + 1 while changing only a constant number of digits in the worst case.

A redundant binary representation, d, of a non-negative integer x is a sequence of digits We call d regular if, between any two digits equal to 2, there is a 0, and there is a 0 between the rightmost 2 and the least signiﬁcant digit. Notice that the traditional binary representation of each integer (which does not use the digit 2) is regular . In the sequel when we refer to a regular representation we mean a regular redundant binary representation, unless we explicitly state otherwise.

Suppose we have a regular representation of i. We can obtain a regular representation of i + 1 as follows. First we increment the rightmost digit. Note that since the representation of i is regular, its rightmost digit is either 0 or 1. So after the increment the rightmost digit is either 1 or 2 and we still have a redundant binary representation for i + 1. Our concern is that this representation of i + 1 may not be regular. However, since the representation of i we started out with was regular the only violation to regularity that we may have in the representation of i + 1 is lacking a 0 between the rightmost 2 and the least signiﬁcant digit. It is easy to check that between any two digits equal to 2, there still is a 0, by the regularity of i.

We can change the representation of i + 1 to a representation which is guaranteed to be regular by a simple ﬁx operation. A ﬁx operation on a digit di = 2 increments di+1 by 1 and sets di to 0, producing a new regular representation dl representing the same number as d.†† If after incrementing the rightmost digit we perform a ﬁx on the rightmost 2 then we switch to another representation of i + 1 that must be regular. We omit the proof here which is straightforward.

It is clear that the increment together with the ﬁx that may follow change at most three digits. Therefore redundant binary representations allow to perform an increment while changing constantly many digits. However notice that in any application of this numbering system we will also need a representation that allows to get to the digits which we need to ﬁx eﬃciently. We show one such representation in the next section.

These redundant representations can be extended so we can also decrement it while changing only a constant number of digits, or even more generally so that we can increment or decrement any digit (add or subtract 2i) while changing a constant number of other digits. These additional properties of the counters were exploited by other applications (see e.g. [22, 24]).

Persistent Deques

Kaplan and Tarjan use this redundant binary system to improve the deque implementation we described above as follows. We allow each of the preﬁxes and suﬃxes to contain between 0 and 5 elements. We label each buﬀer, and each deque, by one of the digits 0, 1, and 2. We label a buﬀer 0 if it has two or three elements, we label it 1 if it has one or four elements, and we label it 2 if it has zero or ﬁve elements. Observe that we can add one element to or delete one element from a buﬀer labeled 0 or 1 without violating its size constraint: A buﬀer labeled 0 may change its label to 1, and a buﬀer labeled 1 may change its label to 2. (In fact a 1 can also be changed to 0 but this may not violate regularity.) The label of a deque is the larger among the labels of its buﬀers, unless its child and one of its buﬀers are empty, in which case the label of the deque is identical to the label of its nonempty buﬀer. This coloring of the deques maps each deque to a redundant binary representation. The least signiﬁcant digit of this representation is the digit of q, the next signiﬁcant digit is the digit of child(q), and, in general, the ith signiﬁcant digit is the digit corresponding to childi(q) if the latter is not empty. We impose an additional constraint on the deques and require that the redundant binary representation of any top-level deque is regular.

A regular top-level deque is labeled 0 or 1 which implies that both its preﬁx and its suﬃx are labeled 0 or 1. This means that any deque operation can be performed by operating on the appropriate top-level buﬀer. Suppose that the operation is either a push or a pop, the case of inject and eject is symmetric. We can construct the resulting queue ql by setting child(ql) = child(q) and suff ix(ql) = suff ix(q). The preﬁx of ql is a newly allocated buﬀer that contains the elements in pref ix(q) together with the new element in case of push or without the ﬁrst element in case of pop. Clearly all these manipulations take constant time. The label of ql, however, may be one larger than the label of q. So the redundant binary representation corresponding to ql is either the same as the redundant binary representation of q in which case it is regular, or it is obtained from the redundant binary representation of q by incrementing the least signiﬁcant digit. (The least signiﬁcant digit can also decrease in which case regularity is also preserved.) This corresponds to the ﬁrst step in the increment procedure for redundant regular representations described in the previous section.

To make the redundant binary representation of ql regular we may have to apply a ﬁx operation. Let i be the minimum such that childi(ql) is labeled 2. If for all j < i, childj (ql) is labeled 1 then the ﬁx has to change the label of childi(ql) to 0 and increment the label of childi+1(ql).

Fortunately, we have an appropriate interpretation for such a ﬁx. Assume childi+1(ql) have a non-empty child. (We omit the discussion of the case where childi+1(ql) have an empty child which is similar.) We know that the label of childi+1(ql) is either 0 or 1 so neither of its buﬀers is empty or full. If the preﬁx of childi(ql) has at least four elements we eject 2 of these elements and push them as a single pair to the preﬁx of childi+1(ql). If the preﬁx of childi(ql) has at most one element we pop a pair from the preﬁx of childi+1(ql) and inject the components of the pair into the preﬁx of childi(ql). This makes the preﬁx of childi(ql) containing either two or three elements. Similarly by popping a pair from or pushing a pair to the suﬃx of childi(ql), and injecting a pair to or ejecting a pair from the suﬃx of childi+1(ql) we make the suﬃx of childi(ql) containing two or three elements. As a result the label of childi(ql) and its two buﬀers becomes 0 while possibly increasing the label of one or both buﬀers of childi+1(ql) and thereby the label of childi+1(ql) as well.

There is one missing piece for this simulation to work eﬃciently. The topmost deque labeled 2 may be arbitrarily deep in the recursive structure of ql, since it can be separated from the top level by many deques labeled 1. To implement the ﬁx eﬃciently we have to be able to ﬁnd this deque fast and change it in a purely functional way by copying the deques that change without having to copy all their ancestors deques.

For this reason we do not represent a deque in the obvious way, as a stack of preﬁx-suﬃx pairs. Instead, we break this stack up into substacks. There is one substack for the top- level deque and one for each descendant deque labeled 0 or 2 not at the top level. Each substack consists of a top-level, or a deque labeled 0, or a deque labeled 2 and all consecutive proper descendant deques labeled 1. We represent the entire deque by a stack of substacks of preﬁx-suﬃx pairs using this partition into substacks. This can be realized with four pointers per each node representing a deque at some level. Two of the pointers are to the preﬁx and suﬃx of the deque. One pointer is to the node for the child deque if this deque is non-empty and labeled 1. One pointer is to the node of the nearest proper descendant deque not labeled 1, if such a deque exists and q itself is not labeled 1 or top-level. See Figure 4.2.

A single deque operation will require access to at most the top three substacks, and to at most the top two elements in any such substack. The label changes caused by a deque operation produce only minor changes to the stack partition into substacks, changes that can be made in constant time. In particular, changing the label of the top-level deque does not aﬀect the partition into substacks. Changing the label of the topmost deque which is labeled 2 to 0 and the label of its child from 1 to 2 splits one substack into its ﬁrst element, now a new substack, and the rest. This is just a substack pop operation. Changing the label of the topmost deque which is labeled 2 to 0 and the label of its child from 0 to 1 merges a singleton substack with the substack under it. This is just a substack push operation.

To add catenation, Kaplan and Tarjan had to change the deﬁnition of the data structure and allow deques to be stored as components of elements of recursive deques. The redundant

FIGURE 31.4: Pointer representation of stack of substacks structure. Each circle corre- sponds to a deque and it is marked by its label. Each buﬀer is a rectangle which is marked by its label. Triangles denote complete binary trees of elements whose depths depend on the level. This particular queue is represented by a stack of three substacks.

binary numbering system, however, still plays a key role. To represent a catenable deque, Kaplan and Tarjan use noncatenable deques as the basic building blocks. They deﬁne a triple over a set A recursively as a preﬁx of elements of A, a possibly empty deque of triples over A, and a suﬃx of elements of A, where each preﬁx or suﬃx is a noncatenable deque. Then, they represent a catenable deque of elements from A by either one or two triples over A. The underlying skeleton of this structure is a binary tree (or two binary trees) of triples.

The redundant binary number system is extended so that it can distribute work along these trees by adding an extra digit.

Kaplan, Okasaki, and Tarjan [21] simpliﬁed these data structures at the expense of making the time bounds amortized rather than worst case and using assignment, thus obtaining a conﬂuently persistent data structure which is not purely functional. The key idea underlying their data structure is to relax the rigid constraint of maintaining regularity. Instead, we “improve” the representation of a deque q with full or empty preﬁx when we try to push or pop an element from it. Similarly, with full or empty suﬃx. This improvement in the representation of q is visible to all deques that contain q as a subdeque at some level and prevents from pushing into deques with full preﬁxes or popping from deques with empty preﬁxes from happening too often.

More speciﬁcally, assume that we push into a deque q with full preﬁx. First, we eject two elements from this preﬁx, make a pair containing them, and push the pair recursively into child(q). Let the result of the recursive push be childl(q). Then we change the representation of q so that it has a new preﬁx which contains all the elements in the preﬁx of q but the two which we ejected, and its child deque is childl(q). The suﬃx of q does not change. Finally we perform the push into q by creating a new queue ql that has the same suﬃx and child deque as q, but has a new preﬁx that contains the elements in the preﬁx of q together with the new element. A careful but simple analysis shows that each operation in this implementation takes O(1) amortized time. By extending this idea, Kaplan, Okasaki, and Tarjan managed to construct catenable deques using only constant size buﬀers as the basic building blocks.

Search This Blog

algorithms

Persistent Data Structures:Making Specific Data Structures More Efficient

Making Speciﬁc Data Structures More Eﬃcient

Comments

Post a Comment

Popular posts from this blog

Warshall’s Algorithm -to find TRANSITIVE CLOSURE

0/1 Knapsack Problem Memory function.

Collision Detection:Penetration Depth Computation