Randomized Dictionary Structures:Analysis of Dictionary Operations.

By احمد جاد الله فرحات - April 23, 2015

Analysis of Dictionary Operations

It is easy to see that the cost of Search, Insert and Delete operations are dominated by the cost of Mark procedure. Hence we shall analyze only the Mark procedure. The Mark procedure starts at the ‘r’th level and proceeds like a downward walk on a staircase and ends at level 0. The complexity of Mark procedure is clearly proportional to the number of edges it traversed. It is advantageous to view the same path as if it is built from level 0 to level r. In other words, we analyze the building of the path in a direction opposite to the direction in which it is actually built by the procedure. Such an analysis is known as backward analysis.

Henceforth let P denote the path from level 0 to level r traversed by Mark(x) for the given ﬁxed x. The path P will have several vertical edges and horizontal edges. (Note that at every box either P moves vertically above or moves horizontally to the left). Clearly, P has r vertical edges. To estimate number of horizontal edges, we need the following lemmas.

LEMMA 13.6 Let the box b containing k at level i be marked by Mark(x). Let a box w containing k be present at level i + 1. Then, w is also marked.

Proof Since b is marked, from (13.11), we get that there is no value between k and x in level i. This fact holds good for Li+1 too because Si+1 ⊆ Si. Hence the lemma.

LEMMA 13.7 Let the box b containing k at level i be marked by Mark(x). Let k /∈ Li+1. Let u be the ﬁrst box to the left of b in Li having a “vertical neighbor” w. Then w is marked.

Proof Let w.key = u.key = y. Since b is marked, k satisﬁes condition (13.11). Since u

is the ﬁrst node in the left of b having a vertical neighbor, none of the keys with values in between y and x will be in Li+1. Also, k /∈ Li+1 according to our assumption in the lemma.

Thus y is the element in Li+1 that is just less than x. That is, y satisﬁes the condition (13.11) at level i + 1. Hence the w at level i + 1 will be marked.

Lemmas (13.6) and (13.7) characterize the segment of P between two successive marked boxes. This allows us to give an incremental description of P in the following way.

P starts from the marked box at level 0. It proceeds vertically as far as possible (lemma 13.6) and when it cannot proceed vertically it proceeds horizontally. At the “ﬁrst” opportunity to move vertically, it does so (lemma 13.7), and continues its vertical journey. Since for any box a vertical neighbor exists only with a probability p, we see that P proceeds from the current box vertically with probability p and horizontally with probability (1 − p).

Hence, the number of horizontal edges of P in level i is a geometric random variable, say, Yi, with parameter (1 − p). Since the number of vertical edges in P is exactly r, we conclude,

THEOREM 13.13 The number of edges traversed by Mark(x) for any ﬁxed x is given by | P |= r + (Y0 + Y1 + Y2 + ··· + Yr−1) where Yi is a geometric random variable with parameters 1 − p and r is the random variable denoting the number of levels in the Skip List.

Our next mission is to obtain a high conﬁdence bound for the size of P . As we have already derived high conﬁdence bound for r, let focus on the sum Hr = Y0 + Y1 + ··· + Yr−1

for a while. Since r is a random variable Hr is not a deterministic sum of random variables but a random sum of random variables.

Hence we cannot directly apply theorem (13.6) and the bounds for negative binomial distributions.

Let X be the event of the random variable r taking a value less than or equal to 4 log n.

Note that

From the law of total probability, Boole’s inequality and equation (13.10) we get,

Now P r(H4 log n > 16 log n) can be computed in a manner identical to the one we carried out in the space complexity analysis. Notice that H4 log n is a deterministic sum of geometric random variables. Hence we can apply theorem (13.10) and theorem (13.9) to derive a high conﬁdence bound. Speciﬁcally, by theorem (13.10),

where X is a binomial random variable with parameters 16 log n and p. Choosing p = 1/2 allows us to set µ = 8 log n, k = 4 log n and replace n by 16 log n in the ﬁrst inequality of theorem (13.9). Putting all these together we obtain,

This completes the derivation of a high conﬁdence bound for Hr . From this, we can easily obtain a bound for expected value of Hr . We use theorem ( 13.3) and write the expression for E(Hr ) as

The ﬁrst sum is bounded above by 16 log n as each probability value is less than 1 and by the high conﬁdence bound that we have established just now, we see that the second sum is dominated by ), which is a constant. Thus we obtain,

THEOREM 13.14 The Dictionary operations Insert, Delete, and Search take O(log n) expected time when implemented using Skip Lists. Moreover, the running time for Dictionary operations in a Skip List is O(log n) with high probability.

Search This Blog

algorithms

Randomized Dictionary Structures:Analysis of Dictionary Operations.

Analysis of Dictionary Operations

Comments

Post a Comment

Popular posts from this blog

0/1 Knapsack Problem Memory function.

Binary Space Partitioning Trees:BSP Tree as a Hierarchy of Regions.

Drawing Trees:HV-Layout