Hash Tables: Random Probing
Next we consider hash table implementations under the random probing assumption: each element x stored in the hash table comes with a random sequence x0, x1, x2, ... where each of the xi is independently and uniformly distributed in {0, ..., m − 1}. We begin with a discussion of the two basic paradigms: hashing with chaining and open addressing. Both paradigms attempt to store the key x at array position A[x0]. The difference between the two is their collision resolution strategy, i.e., what the algorithms do when a user inserts the key value x but array position A[x0] already contains some other key.
Hashing with Chaining
In hashing with chaining, a collision is resolved by allowing more than one element to live at each position in the table. Each entry in the array A is a pointer to the head of a linked list. To insert the value x, we simply append it to the list A[x0]. To search for the element x, we perform a linear search in the list A[x0]. To delete the element x, we search for x in the list A[x0] and splice it out.
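These three operations are easy to express in code. The following minimal sketch (ours, not from the chapter) simulates the random value x0 with hash(x) % m; the class and method names are illustrative.

class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.A = [[] for _ in range(m)]    # A[i] is the list (chain) for slot i

    def _x0(self, x):
        # stands in for the random value x0 associated with key x
        return hash(x) % self.m

    def insert(self, x):
        self.A[self._x0(x)].append(x)      # O(1) worst case

    def search(self, x):
        return x in self.A[self._x0(x)]    # linear scan of one chain

    def delete(self, x):
        chain = self.A[self._x0(x)]
        if x in chain:
            chain.remove(x)                # splice x out of its chain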
It is clear that insertions take O(1) time, even in the worst case. For searching and deletion, the running time is proportional to a constant plus the length of the list stored at A[x0]. Notice that each of the at most n elements not equal to x is stored in A[x0] with probability 1/m, so the expected length of A[x0] is either α = n/m (if x is not contained in the table) or 1 + (n − 1)/m (if x is contained in the table). Thus, the expected cost of searching for or deleting an element is O(1 + α).
The above analysis shows us that hashing with chaining supports the three set ADT operations in O(1) expected time per operation, as long as the occupancy, α, is a constant. It is worth noting that this does not require that the value of α be less than 1.
If we would like more detailed information about the cost of searching, we might also ask about the worst-case search time defined as
W = max{length of the list stored at A[i] : 0 ≤ i ≤ m − 1} .
It is very easy to prove something quite strong about W using only the fact that the length of each list A[i] is a binomial(n, 1/m) random variable. Using Chernoff's bounds on the tail of the binomial distribution [13], this immediately implies that

Pr{W > c ln n} ≤ m · Pr{binomial(n, 1/m) > c ln n} ≤ n^(1−Ω(c)) .
Thus, with very high probability, the worst-case search time is logarithmic in n. This also implies that E[W] = O(log n). The distribution of W has been carefully studied and it is known that, with high probability, i.e., with probability 1 − o(1), W = (1 + o(1)) ln n/ ln ln n [33, 38]. Gonnet has proven the more accurate result that W = Γ⁻¹(n) − 3/2 + o(1) with high probability, where Γ is the gamma function. Devroye [18] shows that similar results hold even when the distribution of x0 is not uniform.
Hashing with Open Addressing
Hashing with open addressing differs from hashing with chaining in that each table position A[i] is allowed to store only one value. When a collision occurs at table position i, one of the two elements involved in the collision must move on to the next position in its probe sequence. In order to implement this efficiently and correctly we require a method of marking table positions as deleted. This could be done with an auxiliary array that contains one bit for each position of A, but usually the same result can be achieved by using a special key value del that does not correspond to any valid key.
To search for an element x in the hash table we look for x at positions A[x0], A[x1], A[x2], and so on until we either (1) find x, in which case we are done or (2) find an empty table position A[xi] that is not marked as deleted, in which case we can be sure that x is not stored in the table (otherwise it would be stored at position xi). To delete an element x from the hash table we first search for x. If we find x at table location A[xi] we then simply mark A[xi] as deleted. To insert a value x into the hash table we examine table positions A[x0], A[x1], A[x2], and so on until we find a table position A[xi] that is either empty or marked as deleted and we store the value x in A[xi].
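A minimal sketch of these three operations, assuming probe sequences are simulated by hashing the pair (key, i) and assuming the sentinels EMPTY and DEL collide with no real key (all names are ours):

EMPTY = None
DEL = "<deleted>"   # assumed not to equal any valid key

class OpenAddressingTable:
    def __init__(self, m):
        self.m = m
        self.A = [EMPTY] * m

    def _probe(self, x, i):
        # simulates xi: the i-th pseudo-random probe position for key x
        return hash((x, i)) % self.m

    def insert(self, x):
        i = 0                               # loops forever if the table is full
        while self.A[self._probe(x, i)] not in (EMPTY, DEL):
            i += 1
        self.A[self._probe(x, i)] = x

    def search(self, x):
        i = 0
        while self.A[self._probe(x, i)] is not EMPTY:
            if self.A[self._probe(x, i)] == x:
                return True
            i += 1                          # skip occupied and deleted slots
        return False

    def delete(self, x):
        i = 0
        while self.A[self._probe(x, i)] is not EMPTY:
            if self.A[self._probe(x, i)] == x:
                self.A[self._probe(x, i)] = DEL
                return
            i += 1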
Consider the cost of inserting an element x using open addressing. Let ix denote the smallest value i such that position A[xi] is either empty or marked as deleted when we insert x. Thus, the cost of inserting x is a constant plus ix. The probability that the table position x0 is occupied is at most α so, with probability at least 1 − α, ix = 0. Using the same reasoning, the probability that the insertion reaches position xi is at most

Pr{ix ≥ i} ≤ α^i   (9.3)

since the table locations x0, ..., xi−1 must all be occupied and the xi are independent. Thus, the expected number of steps taken by the insertion algorithm is at most

1 + E[ix] = Σ_{i≥0} Pr{ix ≥ i} ≤ Σ_{i≥0} α^i = 1/(1 − α)
for any constant 0 < α < 1. The cost of searching for x and deleting x are both proportional to the cost of inserting x, so the expected cost of each of these operations is O(1/(1 − α)). (Note that the expected cost of searching for or deleting an element x is proportional to the value of α at the time x was inserted; if many deletions have taken place, this may be quite different from the current value of α.)
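This bound is easy to check empirically. The following sketch (ours; probe sequences simulated with hash((key, i)) as above) fills a table to occupancy α and measures the cost of unsuccessful searches, which has the same distribution as the insertion cost:

def unsuccessful_search_cost(A, key):
    m = len(A)
    i = 0
    while A[hash((key, i)) % m] is not None:   # probe until an empty slot
        i += 1
    return i + 1

def experiment(m=10000, alpha=0.9, queries=500):
    A = [None] * m
    for x in range(int(alpha * m)):            # fill the table to occupancy alpha
        i = 0
        while A[hash((("stored", x), i)) % m] is not None:
            i += 1
        A[hash((("stored", x), i)) % m] = ("stored", x)
    costs = [unsuccessful_search_cost(A, ("query", q)) for q in range(queries)]
    return sum(costs) / len(costs)

print(experiment())   # typically close to 1/(1 - 0.9) = 10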
We should compare this with the cost of hashing with chaining. In hashing with chaining, the occupancy α has very little effect on the cost of operations. Indeed, any constant α, even one greater than 1, results in O(1) expected time per operation. In contrast, open addressing depends critically on the value of α. If we take α > 1 then the expected cost of insertion using open addressing is infinite since the insertion algorithm never finds an empty table position. Of course, the advantage of open addressing is that it does not require a list at each of the A[i]. The overhead of list pointers is saved, and this extra space can be used instead to maintain the invariant that the occupancy α is a constant strictly less than 1.
Next we consider the worst case search time of hashing with open addressing. That is, we study the value W = max{ix : x is stored in the table}. Using (9.3) and Boole's inequality it follows almost immediately that

Pr{W ≥ c log_{1/α} n} ≤ n · α^(c log_{1/α} n) = n^(1−c) ,

where log_{1/α} denotes the logarithm to the base 1/α.
Thus, with very high probability, W, the worst case search time, is O(log n). Tighter bounds on W are known when the probe sequences x0, ..., xm−1 are random permutations of 0, ..., m − 1. In this case, Gonnet [29] shows that

E[W] = log_{1/α} n − log_{1/α} log_{1/α} n + O(1) .
Open addressing under the random probing assumption has many nice theoretical properties and is easy to analyze. Unfortunately, it is often criticized as being an unrealistic model because it requires a long random sequence x0, x1, x2, ... for each element x that is to be stored or searched for. Several variants of open addressing discussed in the next few sections try to overcome this problem by using only a few random values.
Linear Probing
Linear probing is a variant of open addressing that requires less randomness. To obtain the probe sequence x0, x1, x2,... we start with a random element x0 ∈ {0,... ,m − 1}. The element xi, i > 0 is given by xi = (i + x0) mod m. That is, one first tries to find x at location x0 and if that fails then one looks at (x0 + 1) mod m, (x0 + 2) mod m and so on.
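A minimal sketch of linear probing (ours; hash(x) % m stands in for the single random value x0):

def linear_probe(x, i, m):
    x0 = hash(x) % m              # the one random choice per key
    return (x0 + i) % m

def insert_linear(A, x):
    m = len(A)
    i = 0
    while A[linear_probe(x, i, m)] is not None:
        i += 1                    # walk along the run of occupied slots
    A[linear_probe(x, i, m)] = x

A = [None] * 11
for key in ["a", "b", "c", "d"]:
    insert_linear(A, key)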
The performance of linear probing is discussed by Knuth [37], who shows that the expected number of probes performed during an unsuccessful search is at most

(1/2)(1 + (1/(1 − α))^2)

and during a successful search is at most

(1/2)(1 + 1/(1 − α)) .
This is not quite as good as for standard hashing with open addressing, especially in the unsuccessful case.
Linear probing suffers from the problem of primary clustering. If j consecutive array entries are occupied then a newly inserted element will have probability j/m of hashing to one of these entries. This results in j + 1 consecutive array entries being occupied and increases the probability (to (j + 1)/m) of another newly inserted element landing in this cluster. Thus, large clusters of consecutive elements have a tendency to grow larger.
Quadratic Probing
Quadratic probing is similar to linear probing; an element x determines its entire probe sequence based on a single random choice, x0. Quadratic probing uses the probe sequence x0, (x0 + k1 + k2) mod m, (x0 + 2k1 + 2²k2) mod m, .... In general, the ith element in the probe sequence is xi = (x0 + ik1 + i²k2) mod m. Thus, the final location of an element depends quadratically on how many steps were required to insert it. This method seems to work much better in practice than linear probing, but requires a careful choice of m, k1 and k2 so that the probe sequence contains every element of {0, ..., m − 1}.
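A sketch of a quadratic probe sequence (ours; the choices m prime and k1 = k2 = 1 are illustrative assumptions, not a recommendation from the text):

def quadratic_probe(x, i, m, k1=1, k2=1):
    x0 = hash(x) % m
    return (x0 + i * k1 + i * i * k2) % m

# With m prime and k1 = k2 = 1, the first m probes of a key visit only
# about (m + 1)/2 distinct positions, which illustrates why m, k1, k2
# must be chosen carefully if the sequence is to cover the whole table.
m = 13
print(sorted({quadratic_probe("key", i, m) for i in range(m)}))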
The improved performance of quadratic probing is due to the fact that if there are two elements x and y such that xi = yj then it is not necessarily true (as it is with linear probing) that xi+1 = yj+1. However, if x0 = y0 then x and y will have exactly the same probe sequence. This less severe phenomenon is called secondary clustering. Note that this secondary clustering phenomenon implies that neither linear nor quadratic probing can hope to perform any better than hashing with chaining. This is because all the elements that have the same initial hash x0 are contained in an implicit chain. In the case of linear probing, this chain is defined by the sequence x0, x0 + 1, x0 + 2, ... while for quadratic probing it is defined by the sequence x0, x0 + k1 + k2, x0 + 2k1 + 4k2, ...
Double Hashing
Double hashing is another method of open addressing that uses two hash values x0 and x1. Here x0 is in the set {0, ..., m − 1} and x1 is in the subset of {1, ..., m − 1} whose elements are relatively prime to m. With double hashing, the probe sequence for element x becomes x0, (x0 + x1) mod m, (x0 + 2x1) mod m, .... In general, xi = (x0 + ix1) mod m, for i > 0. The expected number of probes required by double hashing seems difficult to determine exactly. Guibas has proven that, for occupancy α ≤ .31, the performance of double hashing is asymptotically equivalent to that of uniform hashing. Empirically, the performance of double hashing matches that of open addressing with random probing regardless of the occupancy α [37].
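A sketch of double hashing (ours), assuming m is prime so that every x1 in {1, ..., m − 1} is relatively prime to m and each probe sequence is a permutation of the table:

def double_hash_probe(x, i, m):
    x0 = hash(("h0", x)) % m            # first hash: start position
    x1 = 1 + hash(("h1", x)) % (m - 1)  # second hash: nonzero step size
    return (x0 + i * x1) % m

def insert_double(A, x):
    m = len(A)
    i = 0
    while A[double_hash_probe(x, i, m)] is not None:
        i += 1
    A[double_hash_probe(x, i, m)] = x

A = [None] * 13        # 13 is prime, so every step size is coprime to m
for key in range(8):
    insert_double(A, key)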
Brent’s Method
Brent’s method [5] is a heuristic that attempts to minimize the average time for a successful search in a hash table with open addressing. Although originally described in the context of double hashing (Section 9.3.5), Brent’s method applies to any open addressing scheme. The age of an element x stored in an open addressing hash table is the minimum value i such that x is stored at A[xi]. In other words, the age is one less than the number of locations we will probe when searching for x.
Brent’s method attempts to minimize the total age of all elements in the hash table. To insert the element x we proceed as follows: We find the smallest value i such that A[xi] is empty; this is where standard open addressing would insert x. Consider the element y stored at location A[xi−2]. This element is stored there because yj = xi−2, for some j ≥ 0. We check if the array location A[yj+1] is empty and, if so, we move y to location A[yj+1] and store x at location A[xi−2]. Note that, compared to standard open addressing, this decreases the total age by 1. In general, Brent’s method checks, for each 2 ≤ k ≤ i, the array entry A[xi−k] to see if the element y stored there can be moved to any of A[yj+1], A[yj+2], ..., A[yj+k−1] to make room for x. If any such move is possible it is performed, since it represents a decrease in the total age of all elements in the table.
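Under the same assumptions as the double-hashing sketch above (prime table size, probe values simulated by hashing; all helper names are ours), Brent’s insertion can be sketched as follows:

def probe(A, x, i):
    # double-hashing probe sequence; len(A) assumed prime
    m = len(A)
    return (hash(("h0", x)) % m + i * (1 + hash(("h1", x)) % (m - 1))) % m

def age_of(A, y):
    j = 0                                 # y's age, recomputed from its key
    while A[probe(A, y, j)] != y:
        j += 1
    return j

def insert_brent(A, x):
    i = 0
    while A[probe(A, x, i)] is not None:  # standard open-addressing landing spot
        i += 1
    for k in range(2, i + 1):             # can moving an earlier occupant help?
        pos = probe(A, x, i - k)
        y = A[pos]
        j = age_of(A, y)
        for step in range(1, k):          # try A[y_{j+1}], ..., A[y_{j+k-1}]
            new_pos = probe(A, y, j + step)
            if A[new_pos] is None:
                A[new_pos] = y            # y ages by step <= k - 1 ...
                A[pos] = x                # ... while x saves k, so total age drops
                return
    A[probe(A, x, i)] = x                 # no improving move was found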
Although Brent’s method seems to work well in practice, it is difficult to analyze theoretically. Some theoretical analysis of Brent’s method applied to double hashing is given by Gonnet and Munro [31]. Lyon [44], Munro and Celis [49] and Poblete [52] describe some variants of Brent’s method.
Multiple-Choice Hashing
It is worth stepping back at this point and revisiting the comparison between hash tables and binary search trees. For balanced binary search trees, the average cost of searching for an element is O(log n). Indeed, it is easy to see that for at least n/2 of the elements, the cost of searching for those elements is Ω(log n). In comparison, for both of the random probing schemes discussed so far, the expected cost of searching for an element is O(1). However, there are a handful of elements whose search cost is Θ(log n/ log log n) or Θ(log n) depending on whether hashing with chaining or open addressing is used, respectively. Thus there is an inversion: most operations on a binary search tree cost Θ(log n) but a few elements (close to the root) can be accessed in O(1) time, while most operations on a hash table take O(1) time but a few elements (in long chains or with long probe sequences) require Θ(log n/ log log n) or Θ(log n) time to access. In the next few sections we consider variations on hashing with chaining and open addressing that attempt to reduce the worst-case search time W.
Multiple-choice hashing is hashing with chaining in which, during insertion, the element x has the choice of d ≥ 2 different lists in which it can be stored. In particular, when we insert x we look at the lengths of the lists pointed to by A[x0], ..., A[xd−1] and append x to the shortest of these lists. When searching for x, we search for x in each of the lists A[x0], ..., A[xd−1] in parallel. That is, we look at the first elements of each list, then the second elements of each list, and so on until we find x. As before, to delete x we first search for it and then delete it from whichever list we find it in.
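A minimal sketch (ours) with the d independent choices per key simulated by hashing the pair (choice index, key); a sequential scan of the d lists replaces the parallel search and has the same O(dW) worst case:

class MultipleChoiceTable:
    def __init__(self, m, d=2):
        self.m, self.d = m, d
        self.table = [[] for _ in range(m)]

    def _choices(self, x):
        return [hash((c, x)) % self.m for c in range(self.d)]

    def insert(self, x):
        # append to the shortest of the d candidate lists
        target = min(self._choices(x), key=lambda b: len(self.table[b]))
        self.table[target].append(x)

    def search(self, x):
        return any(x in self.table[b] for b in self._choices(x))

    def delete(self, x):
        for b in self._choices(x):
            if x in self.table[b]:
                self.table[b].remove(x)
                return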
It is easy to see that the expected cost of searching for an element x is O(d) since the expected length of each of the d lists is O(1). More interestingly, the worst case search time is bounded by O(dW) where W is the length of the longest list. Azar et al. [3] show that

E[W] = ln ln n/ ln d + O(1) .   (9.4)
Thus, the expected worst case search time for multiple-choice hashing is O(log log n) for any constant d ≥ 2.
Asymmetric Hashing
Asymmetric hashing is a variant of multiple-choice hashing in which the hash table is split into d blocks, each of size n/d. (Assume, for simplicity, that n is a multiple of d.)
The probe value xi, 0 ≤ i < d, is drawn uniformly from {in/d, ..., (i + 1)n/d − 1}. As with multiple-choice hashing, to insert x the algorithm examines the lengths of the lists A[x0], A[x1], ..., A[xd−1] and appends x to the shortest of these lists. In the case of ties, it appends x to the list with smallest index. Searching and deletion are done exactly as in multiple-choice hashing.
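A minimal sketch of asymmetric insertion (ours; one simulated choice per block, ties broken toward the lower-numbered block):

class AsymmetricTable:
    def __init__(self, n, d=2):
        assert n % d == 0
        self.d, self.block = d, n // d
        self.table = [[] for _ in range(n)]

    def _choices(self, x):
        # xi is drawn from block i: {i*n/d, ..., (i+1)*n/d - 1}
        return [i * self.block + hash((i, x)) % self.block
                for i in range(self.d)]

    def insert(self, x):
        choices = self._choices(x)
        lengths = [len(self.table[b]) for b in choices]
        # index() scans left to right, so ties go to the smallest block index
        target = choices[lengths.index(min(lengths))]
        self.table[target].append(x)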
Vöcking [64] shows that, with asymmetric hashing, the expected length of the longest list is

E[W] ≤ ln ln n/(d ln φd) + O(1) .
The function φd is a generalization of the golden ratio, so that φ2 = (1 + √5)/2. Note that this improves significantly on standard multiple-choice hashing (9.4) for larger values of d.
LCFS Hashing
LCFS hashing is a form of open addressing that changes the collision resolution strategy. Reviewing the algorithm for hashing with open addressing reveals that when two elements collide, priority is given to the first element inserted into the hash table and subsequent elements must move on. Thus, hashing with open addressing could also be referred to as FCFS (first-come first-served) hashing.
With LCFS (last-come first-served) hashing, collision resolution is done in exactly the opposite way. When we insert an element x, we always place it at location x0. If position x0 is already occupied by some element y because yj = x0 then we place y at location yj+1, possibly displacing some element z, and so on.
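A minimal sketch of LCFS insertion (ours; probe sequences simulated with hash((key, i)) as in the earlier sketches):

def insert_lcfs(A, x):
    m = len(A)
    i = 0                          # x always claims its position x0
    while True:
        pos = hash((x, i)) % m
        y = A[pos]
        A[pos] = x                 # the newcomer takes the slot (LCFS)
        if y is None:
            return
        # the displaced element y was at probe index j of its own
        # sequence; it resumes from probe index j + 1
        j = 0
        while hash((y, j)) % m != pos:
            j += 1
        x, i = y, j + 1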
Poblete and Munro [53] show that, after inserting n elements into an initially empty table, the expected worst case search time is bounded above by

(1 + o(1)) Γ⁻¹(αn)

where Γ is the gamma function and

Γ⁻¹(αn) = (1 + o(1)) ln n/ ln ln n .
Historically, LCFS hashing is the first version of open addressing that was shown to have an expected worst-case search time that is o(log n).
Robin-Hood Hashing
Robin-Hood hashing [9, 10, 61] is a form of open addressing that attempts to equalize the search times of elements by using a fairer collision resolution strategy. During insertion, if we are trying to place element x at position xi and there is already an element y stored at position yj = xi then the "younger" of the two elements must move on. More precisely, if i ≤ j then we will try to insert x at positions xi+1, xi+2 and so on. Otherwise, we will store x at position xi and try to insert y at positions yj+1, yj+2 and so on.
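A minimal sketch of Robin-Hood insertion (ours). Storing (key, probe index) pairs so each slot remembers how far its occupant has traveled is an implementation convenience, not something the text prescribes:

def insert_robin_hood(A, x):
    m = len(A)
    i = 0
    while True:
        pos = hash((x, i)) % m
        if A[pos] is None:
            A[pos] = (x, i)
            return
        y, j = A[pos]
        if i > j:                 # x is "older": it has probed more slots,
            A[pos] = (x, i)       # so it keeps the slot and y must move on
            x, i = y, j
        i += 1                    # the loser tries its next probe position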
Devroye et al. [20] show that, after performing n insertions on an initially empty table of size m = n/α using the Robin-Hood insertion algorithm, the worst case search time has expected value

E[W] = O(log log n)
and this bound is tight. Thus, Robin-Hood hashing is a form of open addressing that has doubly-logarithmic worst-case search time. This makes it competitive with the multiple-choice hashing method of Section 9.3.7.
Cuckoo Hashing
Cuckoo hashing [50] is a form of multiple-choice hashing in which each element x lives in one of two tables A or B, each of size m = n/α. The element x will either be stored at location A[xA] or B[xB]. There are no other options. This makes searching for x an O(1) time operation since we need only check two array locations.
The insertion algorithm for cuckoo hashing proceeds as follows: Store x at location A[xA]. If A[xA] was previously occupied by some element y then store y at location B[yB]. If B[yB] was previously occupied by some element z then store z at location A[zA], and so on. This process ends when we place an element into a previously empty table slot or when it has gone on for more than c log n steps. In the former case, the insertion of x completes successfully. In the latter case the insertion is considered a failure, and the entire hash table is reconstructed from scratch using a new probe sequence for each element in the table. That is, if this reconstruction process has happened i times then the two hash values we use for an element x are xA = x2i and xB = x2i+1.
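A minimal sketch of cuckoo insertion (ours). The salted hash calls simulate the pair of hash values xA and xB, and retrying with salt + 1 after a failure plays the role of choosing the new values x2i, x2i+1; the cutoff constant is an arbitrary stand-in for c log n:

import math

def cuckoo_insert(A, B, x, salt=0):
    m = len(A)
    limit = max(1, int(6 * math.log(max(2, m))))   # stands in for c log n
    table = A
    side = 0                                       # 0 -> hash into A, 1 -> B
    for _ in range(limit):
        pos = hash((salt, side, x)) % m
        x, table[pos] = table[pos], x              # place x, evict occupant
        if x is None:
            return True                            # landed in an empty slot
        table = B if table is A else A             # the evicted element goes
        side ^= 1                                  # to the opposite table
    return False   # failure: caller rebuilds both tables with salt + 1

def cuckoo_search(A, B, x, salt=0):
    m = len(A)
    return A[hash((salt, 0, x)) % m] == x or B[hash((salt, 1, x)) % m] == x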
Pagh and Rodler [50] (see also Devroye and Morin [19]) show that, during the insertion of n elements, the probability of requiring a reconstruction is O(1/n). This, combined with the fact that the expected insertion time is O(1), shows that the expected cost of n insertions in a cuckoo hashing table is O(n). Thus, cuckoo hashing offers a somewhat simpler alternative to the dynamic perfect hashing algorithms of Section 9.2.5.