Randomized Dictionary Structures:Analysis of Dictionary Operations.

Analysis of Dictionary Operations

It is easy to see that the cost of Search, Insert and Delete operations are dominated by the cost of Mark procedure. Hence we shall analyze only the Mark procedure. The Mark procedure starts at the ‘r’th level and proceeds like a downward walk on a staircase and ends at level 0. The complexity of Mark procedure is clearly proportional to the number of edges it traversed. It is advantageous to view the same path as if it is built from level 0 to level r. In other words, we analyze the building of the path in a direction opposite to the direction in which it is actually built by the procedure. Such an analysis is known as backward analysis.

Henceforth let P denote the path from level 0 to level r traversed by Mark(x) for the given fixed x. The path P will have several vertical edges and horizontal edges. (Note that at every box either P moves vertically above or moves horizontally to the left). Clearly, P has r vertical edges. To estimate number of horizontal edges, we need the following lemmas.

LEMMA 13.6 Let the box b containing k at level i be marked by Mark(x). Let a box w containing k be present at level i + 1. Then, w is also marked.

Proof Since b is marked, from (13.11), we get that there is no value between k and x in level i. This fact holds good for Li+1 too because Si+1 Si. Hence the lemma.

LEMMA 13.7 Let the box b containing k at level i be marked by Mark(x). Let k /∈ Li+1. Let u be the first box to the left of b in Li having a “vertical neighbor” w. Then w is marked.

Proof Let w.key = u.key = y. Since b is marked, k satisfies condition (13.11). Since u

is the first node in the left of b having a vertical neighbor, none of the keys with values in between y and x will be in Li+1. Also, k /∈ Li+1 according to our assumption in the lemma.

Thus y is the element in Li+1 that is just less than x. That is, y satisfies the condition (13.11) at level i + 1. Hence the w at level i + 1 will be marked.

Lemmas (13.6) and (13.7) characterize the segment of P between two successive marked boxes. This allows us to give an incremental description of P in the following way.

P starts from the marked box at level 0. It proceeds vertically as far as possible (lemma 13.6) and when it cannot proceed vertically it proceeds horizontally. At the “first” opportunity to move vertically, it does so (lemma 13.7), and continues its vertical journey. Since for any box a vertical neighbor exists only with a probability p, we see that P proceeds from the current box vertically with probability p and horizontally with probability (1 p).

Hence, the number of horizontal edges of P in level i is a geometric random variable, say, Yi, with parameter (1 p). Since the number of vertical edges in P is exactly r, we conclude,

THEOREM 13.13 The number of edges traversed by Mark(x) for any fixed x is given by | P |= r + (Y0 + Y1 + Y2 + ··· + Yr1) where Yi is a geometric random variable with parameters 1 p and r is the random variable denoting the number of levels in the Skip List.

Our next mission is to obtain a high confidence bound for the size of P . As we have already derived high confidence bound for r, let focus on the sum Hr = Y0 + Y1 + ··· + Yr1

for a while. Since r is a random variable Hr is not a deterministic sum of random variables but a random sum of random variables.

Hence we cannot directly apply theorem (13.6) and the bounds for negative binomial distributions.

Let X be the event of the random variable r taking a value less than or equal to 4 log n.

Note that image

From the law of total probability, Boole’s inequality and equation (13.10) we get,

image

Now P r(H4 log n > 16 log n) can be computed in a manner identical to the one we carried out in the space complexity analysis. Notice that H4 log n is a deterministic sum of geometric random variables. Hence we can apply theorem (13.10) and theorem (13.9) to derive a high confidence bound. Specifically, by theorem (13.10),

image

where X is a binomial random variable with parameters 16 log n and p. Choosing p = 1/2 allows us to set µ = 8 log n, k = 4 log n and replace n by 16 log n in the first inequality of theorem (13.9). Putting all these together we obtain,

image

This completes the derivation of a high confidence bound for Hr . From this, we can easily obtain a bound for expected value of Hr . We use theorem ( 13.3) and write the expression for E(Hr ) as

image

The first sum is bounded above by 16 log n as each probability value is less than 1 and by the high confidence bound that we have established just now, we see that the second sum is dominated by ), which is a constant. Thus we obtain,

image

THEOREM 13.14 The Dictionary operations Insert, Delete, and Search take O(log n) expected time when implemented using Skip Lists. Moreover, the running time for Dictionary operations in a Skip List is O(log n) with high probability.

Comments

Popular posts from this blog

Data Structure Visualization:Introduction and Value of Data Structure Rendering

Binary Space Partitioning Trees:BSP Tree as a Hierarchy of Regions.

Drawing Trees:HV-Layout