Succinct Representation of Data Structures:Permutations and Functions
Permutations are fundamental in computer science and have been the focus of extensive study. Here we consider the problem of representing permutations succinctly to support computing πk (i) for any integer k, where π0(i) = i for all i; πk (i) = π(πk−1 (i)) when k > 0 and πk (i) = π−1(πk+1(i)) when k < 0.
The most obvious way of representing an arbitrary permutation, π, of the integers {0, 1,... ,n − 1} is to store the sequence π(0), π(1),..., π(n − 1). This takes n llg nl bits, which is Θ(n) bits more than the information-theoretic lower bound of lg(n!) ≈ n lg n−n lg e bits. This representation can be used to find π(i) in O(1) time, but finding π−1(i) takes O(n) time in the worst case, for 0 ≤ i ≤ n − 1. Using this representation, one can easily compute πk (i) in k steps, for k ≥ 1. To facilitate the computation in constant time, one could store πk (i) for all i and k (|k|≤ n, along with its cycle length), but that would require Θ(n2 lg n) bits. The most natural compromise is to retain πk (i) with |k|≤ n a power of 2.
Unfortunately, this n llg nl2 bit representation leaves us with a logarithmic time evaluation scheme and a factor of lg n from the minimal space representation.
We first show how to augment the standard representation to support π−1 queries efficiently, while avoiding most of the extra storage cost one would expect. In addition to storing the standard representation, we trace the cycle structure of the permutation, and for every cycle whose length is at least t, we store a shortcut pointer with the elements which are at a distance of a multiple of t steps from an arbitrary starting point. The shortcut pointer points to the element which is t steps before it in the cycle of the permutation. This shortcut representation of a permutation can be stored using (1 + 1/t)n lg n + o(n) bits, and it supports π queries in O(1) time and π−1 queries in O(t) time, for any parameter 1 ≤ t ≤ n.
Consider the cycle representation of a permutation π over {0, 1,... ,n − 1}, which is a collection of disjoint cycles of π (where the cycles are ordered arbitrarily). Let σ be this permutation, i.e., the standard representation of σ is a cycle representation of π. Let B be a bit vector of length n that has a 1 corresponding to the starting position of each cycle of π and 0 everywhere else, together with its rank and select directories with respect to both bits. Let S be a representation of σ that supports σ(i) and σ−1(i) queries efficiently. Then to find πk (i), first find the index j of the cycle to which σ−1(i) belongs, using B and S.
Find the length l of the j-th cycle and the number p of elements up to (but not including) the j-th cycle. Then, one can verify that πk (i) = σ(p + (i − p + k mod l)). Combining this with the shortcut representation, one can get a representation taking (1 + E)n lg n + O(1) bits that supports computing arbitrary powers in O(1) time.
Benes network: A Benes network [36] is a communication network composed of a number of switches. Each switch has 2 inputs x0 and x1 and 2 outputs y0 and y1 and can be configured either so that x0 is connected to y0 (i.e. a packet that is input along x0 comes out of y0) and x1 is connected to y1, or the other way around. An r-Benes network has 2r inputs and outputs, and is defined as follows. For r = 1, the Benes network is a single switch with 2 inputs and 2 outputs. An (r + 1)-Benes network is composed of 2r+1 switches and two r-Benes networks, connected as as shown in Fig. 37.6(a).
A particular setting of the switches of a Benes network realizes a permutation π if a packet introduced at input I comes out at output π(i), for all i. See Fig. 37.6(b) for an example.
Clearly, a Benes network may be used to represent a permutation. For example, if n = 2r , a representation of a permutation π on [n] may be obtained by configuring an r-Benes network to realize π and then listing the settings of the switches in some canonical
order (e.g. level-order). This represents π using r2r − 2r−1 = n lg n − n/2 bits. Given i, one can trace the path taken by a packet at input i by inspecting the appropriate bits in this representation, and thereby calculate π(i) in O(lg n) time (indeed, in O(lg n) bit-probes).
In fact, by tracing the path back from output i we can also compute π−1(i) in O(lg n) time.
One can compress the middle levels of a Benes network by storing an implicit representation of the permutation represented by the middle O(lg n/ lg lg n) levels. This reduces the space to lg(n!) + o(n) bits. One can also group the remaining bits of this Benes network into words of size Θ(lg n) bits (by taking O(lg lg n) consecutive levels and O(lg lg n) appropriate rows). This enables us to traverse Θ(lg lg n) levels in a Benes network in O(1) time. Thus, it gives a representation that takes the optimal llg(n!)l + o(n) bits, and supports computing arbitrary powers in O(lg n/ lg lg n) time.
One can obtain a structure with same time and space bounds even when n is not a power of 2. See [43] for details.
Functions
Now consider the problem of representing arbitrary functions f : [n] → [n], so that queries for f k(i), for any integer k can be answered efficiently. Here f 0(i) = i and for any k > 0, f k(i) = f (f k−1(i)) and f −k (i) = {j|f k(j) = i}, for all i. This is a generalization of the problem considered in the previous section. Since there are nn functions from [n] to [n], any representation scheme takes at least ln lg nl bits to store an arbitrary function.
A standard way of representing a function is to store the sequence f (i), for i = 0,... , n−1.
This representation does not support the efficient evaluation of f k(i) for k >> 1. We look at a representation that takes (1 + E)n lg n + O(1) bits of space to store a function f : [n] → [n] and supports computing arbitrary positive powers in constant time and negative powers f −k(i), in O(1 + |f −k(i)|) time.
Given an arbitrary function f : [n] → [n], consider the directed graph, Gf = (V, E), obtained from it, where V = [n] and E = {(i, j) : f (i) = j}. In general this directed graph consists of a disjoint set of subgraphs, each of which is a directed cycle with trees rooted at the nodes on the cycle and edges directed towards the roots.
example.
See Figure 37.7 for an The main idea of the solution is as follows: in each directed cycle, we re-order the nodes
of each tree such that the leftmost path of any subtree is the longest path in that subtree. This enables finding a node at a given depth from any internal node, if it exists, in constant time using the parenthesis representation. We then preprocess each of the trees and store auxiliary structures to support level-ancestor queries on them in constant time (see Section 37.4.2). Observe that finding f k(i), for k > 0, can be translated to finding the ancestor of node i which is k levels above it, if i is at a depth at least k in its tree T . Otherwise, we have to traverse the cycle to which the root of T belongs, to find the required answer. This can be done by storing these cycles as a permutation.
When i belongs to one of the trees in a subgraph, one can answer f k (i) queries for k < 0 in optimal time by finding all the nodes that are at the k-th level in the subtree rooted at i.
Otherwise, if i is part of the cycle in the subgraph, we store an auxiliary structure that, for any given i and k, outputs all the trees in the subgraph containing i that have an answer in time proportional to the number of such nodes. From this, one can easily find the required answer in optimal time. The auxiliary structure takes O(m) bits for a subgraph with m nodes, and hence O(n) bits overall. See [42] for details.
For functions from [n] → [m] one can show the following: If there is a representation of a permutation that takes P (n) space to represent a permutation on [n] and supports forward in t1 time and inverse in t2 time, then there is a representation of a function from [n] to [m], m ≤ n that takes (n − m) lg m + P (m) + O(m) bits, and supports f k (i) in O(t1 + t2) time, for any positive integer k and for any i ∈ [n]. When m > n, larger powers are not defined in general. In this case, we can have a structure that takes n lg m + P (n) + O(n) bits of space and answers queries for positive powers (returns the power if defined or returns −1 otherwise) in O(t1 + t2) time.
Comments
Post a Comment