Hash Tables: Other Developments

Other Developments

The study of hash tables has a long history, and many researchers have proposed methods of implementing them. Because of this, the current chapter is necessarily incomplete. (At the time of writing, the hash.bib bibliography on hashing contains over 800 entries.) We have summarized only a handful of the major results on hash tables in internal memory. In this section we provide a few references to the literature for some of the other results. For more information on hashing, Knuth [37], Vitter and Flajolet [63], Vitter and Chen [62], and Gonnet and Baeza-Yates [30] are useful references.

Brent’s method (Section 9.3.6) is a collision resolution strategy that reduces the expected time of a successful search in a hash table with open addressing. Several other methods exist that reduce either the expected or the worst-case search time. These include binary tree hashing [31, 45], optimal hashing [31, 54, 55], Robin-Hood hashing (Section 9.3.10), and min-max hashing [9, 29]. One interesting method, due to Celis [9], applies to any open addressing scheme. The idea is to study the distribution of the ages of elements in the hash table, i.e., the distribution given by

D(i) = Pr{x is stored at table position x_i}

and start searching for x at the locations at which we are most likely to find it, rather than searching the table positions x_0, x_1, x_2, ... in order.
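
To make the idea concrete, here is a minimal sketch (not Celis's algorithm as published) of age-ordered searching under simplifying assumptions: linear probing, no deletions, and a table that records how many elements were stored at each probe index. The class name AgeOrderedTable and all other details are illustrative.

```python
# A minimal sketch of age-ordered searching in an open-addressing table.
# Assumptions: linear probing, no deletions, and the table records how many
# elements were inserted with each "age" (the probe index at which they landed).

class AgeOrderedTable:
    def __init__(self, m):
        self.m = m                       # table size
        self.slots = [None] * m          # stored keys
        self.age_count = [0] * m         # age_count[i] = #elements stored at their i-th probe

    def _probe(self, x, i):
        # i-th position in the probe sequence x_0, x_1, x_2, ...
        return (hash(x) + i) % self.m

    def insert(self, x):
        for i in range(self.m):
            p = self._probe(x, i)
            if self.slots[p] is None:
                self.slots[p] = x
                self.age_count[i] += 1   # x was stored with age i
                return True
        return False                     # table full

    def search(self, x):
        # Probe ages in decreasing order of how often they occur in the table,
        # rather than in the order 0, 1, 2, ...
        ages = sorted((i for i in range(self.m) if self.age_count[i] > 0),
                      key=lambda i: self.age_count[i], reverse=True)
        for i in ages:
            if self.slots[self._probe(x, i)] == x:
                return True
        return False
```

Note that, unlike probing in sequence order, an unsuccessful search in this sketch cannot stop at the first empty slot it sees, so it may probe every age that occurs in the table.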

Perfect hash functions seem to have been first studied by Sprugnoli [58] who gave some heuristic number theoretic constructions of minimal perfect hash functions for small data sets. Sprugnoli is responsible for the terms “perfect hash function” and “minimal perfect hash function.” A number of other researchers have presented algorithms for discovering minimal and near-minimal perfect hash functions. Examples include Anderson and Anderson [2], Cichelli [14, 15], Chang [11, 12], Gori and Soda [32], and Sager [56]. Berman et al. [4] and Körner and Marton [40] discuss the theoretical limitations of perfect hash functions. A comprehensive, and recent, survey of perfect hashing and minimal perfect hashing is given by Czech et al. [17].
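
As a toy illustration of the concept (and not of any of the constructions cited above), the sketch below searches for a minimal perfect hash function for a small static set by drawing random parameters for a simple multiply-add family until no two keys collide in a table whose size equals the number of keys. The prime, the string-hashing helper, and the function names are all illustrative.

```python
import random

P = 2_147_483_647                            # the Mersenne prime 2^31 - 1

def string_key(s):
    # A simple deterministic string-to-integer map (illustrative only).
    v = 0
    for ch in s:
        v = (v * 131 + ord(ch)) % P
    return v

def find_minimal_perfect_hash(keys, max_tries=1_000_000):
    n = len(keys)
    for _ in range(max_tries):
        a = random.randrange(1, P)
        b = random.randrange(0, P)
        h = lambda s, a=a, b=b: ((a * string_key(s) + b) % P) % n
        if len({h(k) for k in keys}) == n:   # all n keys land in distinct slots
            return h
    return None                              # give up (very unlikely for small n)

words = ["ada", "lisp", "perl", "c", "rust"]
h = find_minimal_perfect_hash(words)
print(sorted(h(w) for w in words))           # [0, 1, 2, 3, 4]
```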

Tarjan and Yao [60] describe a set ADT implementation that gives O(log u / log n) worst-case access time. It is obtained by combining a trie (Chapter 28) of degree n with a compression scheme for arrays of size n² that contain only n non-zero elements. (The trie has O(n) nodes, each of which has n pointers to children, but there are only a total of O(n) children.) Although their result is superseded by the results of Fredman et al. [28] discussed in Section 9.2.4, it was the first theoretical result on worst-case search time for hash tables.
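
The following sketch shows only the trie half of this idea, under the assumption that keys come from a universe {0, ..., u-1}: keys are written in base n, and a lookup follows one child pointer per digit, i.e., O(log u / log n) pointers. The compression scheme that packs the sparse child arrays into O(n) total space, which is the heart of the Tarjan-Yao result, is omitted; every node below simply keeps a dense array of n children.

```python
# A sketch of a degree-n trie over the universe {0, ..., u-1}.  A lookup
# inspects O(log u / log n) nodes.  The Tarjan-Yao compression of the sparse
# child arrays is NOT implemented here; each node stores n child pointers.

class DegreeNTrieNode:
    def __init__(self, n):
        self.children = [None] * n   # uncompressed: n pointers per node
        self.value = None            # payload stored at the final node

class DegreeNTrie:
    def __init__(self, n, u):
        self.n = n
        self.depth = 1
        while n ** self.depth < u:   # number of base-n digits needed
            self.depth += 1
        self.root = DegreeNTrieNode(n)

    def _digits(self, key):
        # Base-n digits of key, most significant first, padded to self.depth.
        d = []
        for _ in range(self.depth):
            d.append(key % self.n)
            key //= self.n
        return reversed(d)

    def insert(self, key, value):
        node = self.root
        for digit in self._digits(key):
            if node.children[digit] is None:
                node.children[digit] = DegreeNTrieNode(self.n)
            node = node.children[digit]
        node.value = value

    def search(self, key):
        node = self.root
        for digit in self._digits(key):   # O(log u / log n) iterations
            node = node.children[digit]
            if node is None:
                return None
        return node.value
```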

Dynamic perfect hashing (Section 9.2.5) and cuckoo hashing (Section 9.3.11) are methods of achieving O(1) worst-case search time in a dynamic setting. Several other methods have been proposed [6, 21, 22].
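
As a reminder of where the worst-case guarantee comes from in the cuckoo case, here is a minimal sketch (illustrative, not the implementation discussed in Section 9.3.11): every key can reside in only one of two positions, so a search probes at most two slots, while insertion kicks resident keys to their alternate positions; a real implementation rehashes with new hash functions when the kick-out loop runs too long.

```python
# A minimal cuckoo hashing sketch.  Keys are assumed non-None.  Search is
# worst-case O(1): it probes exactly two slots.

class CuckooTable:
    def __init__(self, m):
        self.m = m
        self.t1 = [None] * m
        self.t2 = [None] * m

    def _h1(self, x):
        return hash(("h1", x)) % self.m

    def _h2(self, x):
        return hash(("h2", x)) % self.m

    def search(self, x):
        # Worst case: two probes, one in each table.
        return self.t1[self._h1(x)] == x or self.t2[self._h2(x)] == x

    def insert(self, x, max_kicks=32):
        for _ in range(max_kicks):
            p = self._h1(x)
            x, self.t1[p] = self.t1[p], x    # place x, evict the old occupant
            if x is None:
                return True
            p = self._h2(x)
            x, self.t2[p] = self.t2[p], x    # evicted key goes to its other table
            if x is None:
                return True
        raise RuntimeError("insertion cycled; rehash with new hash functions")
```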

Yao [65] studies the membership problem. Given a set S ⊆ U, devise a data structure that can determine for any x ∈ U whether x is contained in S. Yao shows how, under various conditions, this problem can be solved using a very small number of memory accesses per query. However, Yao’s algorithms sometimes derive the fact that an element x is in S without actually finding x. Thus, they do not solve the set ADT problem discussed at the beginning of this chapter, since they cannot recover a pointer to x.

The “power of two random choices,” as used in multiple-choice hashing (Section 9.3.7), has many applications in computer science. Karp, Luby, and Meyer auf der Heide [34, 35] were the first to use this paradigm, for simulating PRAM computers on computers with fewer processors. The book chapter by Mitzenmacher et al. [46] surveys results and applications of this technique.
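
A minimal sketch of two-choice hashing with chained buckets, under illustrative hash functions: each key has two candidate buckets, insertion places it in the less loaded one, and a search scans only those two buckets. With n keys in n buckets this reduces the maximum bucket size from Θ(log n / log log n) for a single choice to Θ(log log n) with high probability.

```python
# A minimal two-choice (multiple-choice) hashing sketch with chained buckets.

class TwoChoiceTable:
    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def _candidates(self, x):
        return hash(("a", x)) % self.m, hash(("b", x)) % self.m

    def insert(self, x):
        b1, b2 = self._candidates(x)
        # "Power of two choices": place x in the shorter of its two buckets.
        target = b1 if len(self.buckets[b1]) <= len(self.buckets[b2]) else b2
        self.buckets[target].append(x)

    def search(self, x):
        b1, b2 = self._candidates(x)
        return x in self.buckets[b1] or x in self.buckets[b2]
```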

A number of table implementations have been proposed that are suitable for managing hash tables in external memory. Here, the goal is to reduce the number of disk blocks that must be accessed during an operation, where a disk block can typically hold a large number of elements. These schemes include linear hashing [43], dynamic hashing [41], virtual hashing [42], extendible hashing [26], cascade hashing [36], and spiral storage [48]. In terms of hashing, the main difference between internal memory and external memory is that, in internal memory, an array is allocated at a specific size that cannot be changed later. In contrast, an external memory file may be appended to or truncated in order to increase or decrease its size. Thus, hash table implementations for external memory can avoid the periodic global rebuilding operations used in internal memory hash table implementations.
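
To illustrate the flavor of these schemes, here is a minimal sketch of linear hashing, which grows the table one bucket at a time instead of rebuilding it globally. In external memory each bucket would be a disk block; here buckets are plain lists, and the split threshold and hash functions are illustrative choices rather than those of any cited scheme.

```python
# A minimal linear hashing sketch.  The table grows by splitting one bucket at
# a time; keys are addressed with either the current round's hash function or
# the next round's, depending on whether their bucket has been split yet.

class LinearHashTable:
    def __init__(self, n0=4, max_load=0.75):
        self.n0 = n0                      # initial number of buckets
        self.level = 0                    # current round of doubling
        self.split = 0                    # next bucket to be split
        self.max_load = max_load
        self.buckets = [[] for _ in range(n0)]
        self.count = 0

    def _address(self, x):
        b = hash(x) % (self.n0 * 2 ** self.level)
        if b < self.split:                # bucket already split this round
            b = hash(x) % (self.n0 * 2 ** (self.level + 1))
        return b

    def insert(self, x):
        self.buckets[self._address(x)].append(x)
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split_one_bucket()

    def _split_one_bucket(self):
        # Split only the bucket at the split pointer, redistributing its keys
        # between itself and one new bucket appended at the end of the table.
        self.buckets.append([])
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        mod = self.n0 * 2 ** (self.level + 1)
        for x in old:
            self.buckets[hash(x) % mod].append(x)
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:   # round finished
            self.level += 1
            self.split = 0

    def search(self, x):
        return x in self.buckets[self._address(x)]
```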
