Randomized Graph Data-Structures for Approximate Shortest Paths: Introduction, and a Randomized Data-Structure for Static APASP (Approximate Distance Oracles)
Introduction
Let G = (V, E) be an undirected weighted graph on n = |V| vertices and m = |E| edges. The length of a path between two vertices is the sum of the weights of all the edges of the path. The shortest path between a pair of vertices is the path of least length among all paths between the two vertices in the graph. The length of the shortest path between two vertices is also called the distance between the two vertices. An α-approximate shortest path between two vertices is a path of length at most α times the length of the shortest path.
Computing all-pairs exact or approximate distances in G is one of the most fundamental graph algorithmic problems. In this chapter, we present two randomized graph data-structures for the all-pairs approximate shortest paths (APASP) problem in static and dynamic environments. Both data-structures are hierarchical, and their construction involves random sampling of vertices or edges of the given graph.
The first data-structure is a randomized data-structure designed for efficiently computing APASP in a given static graph. In order to answer a distance query in constant time, most of the existing algorithms for the APASP problem output a data-structure which is an n × n matrix that stores the exact/approximate distance between each pair of vertices explicitly. Recently a remarkable data-structure of o(n^2) size has been designed for reporting all-pairs approximate distances in an undirected graph. This data-structure is called an approximate distance oracle because of its ability to answer a distance query in constant time in spite of its sub-quadratic size. We present the details of this novel data-structure and an efficient algorithm to build it.
The second data-structure is a dynamic data-structure designed for efficiently maintaining APASP in a graph that is undergoing deletion of edges. For a given graph G = (V, E) and a distance parameter d ≤ n, this data-structure provides the first o(nd) update time algorithm for maintaining α-approximate shortest paths for all pairs of vertices separated by distance ≤ d in the graph.
A Randomized Data-Structure for Static APASP: Approximate Distance Oracles
There exist classical algorithms that require O(mn log n) time for solving the all-pairs shortest paths (APSP) problem. There also exist algorithms based on fast matrix multiplication that achieve sub-cubic time. However, there is still no combinatorial algorithm that achieves O(n^{3−ε}) running time for the APSP problem, for any ε > 0. In the recent past, many simple combinatorial algorithms have been designed that compute all-pairs approximate shortest paths (APASP) for undirected graphs. These algorithms achieve a significant improvement in running time compared to those designed for APSP, but the distance reported has some additive and/or multiplicative error. An algorithm is said to compute all-pairs α-approximate shortest paths if, for each pair of vertices u, v ∈ V, the distance reported is bounded by αδ(u, v), where δ(u, v) denotes the actual distance between u and v.
Among all the data-structures and algorithms designed for computing all-pairs approximate shortest paths, the approximate distance oracles are unique in the sense that they achieve a simultaneous improvement in running time (sub-cubic) as well as space (sub-quadratic), and still answer any approximate distance query in constant time. For any k ≥ 1, it takes O(kmn^{1/k}) time to compute a (2k − 1)-approximate distance oracle of size O(kn^{1+1/k}) that can answer any (2k − 1)-approximate distance query in O(k) time.
3-Approximate Distance Oracle
For a given undirected graph, storing distance information from each vertex to all the vertices requires Θ(n^2) space. To achieve sub-quadratic space, the following simple idea comes to mind.
I : If we store distance information from each vertex to only a small number of vertices, can we still report the distance between any pair of vertices?
The above idea can indeed be realized using a simple random sampling technique, but at the expense of reporting an approximate, instead of exact, distance as the answer to a distance query. We describe the construction of a 3-approximate distance oracle as follows.
1. Let R ⊂ V be a subset of vertices formed by picking each vertex independently at random with probability γ (the value of γ will be fixed later on).
2. For each vertex u ∈ V , store the distances to all the vertices of the sample set R.
3. For each vertex u ∈ V, let p(u) be the vertex nearest to u among all the sampled vertices, and let S_u be the set of all the vertices of the graph G that lie closer to u than the vertex p(u). Store the vertices of the set S_u along with their distances from u.
For each vertex u ∈ V, storing distances to the vertices of S_u helps in answering distance queries for vertices in the locality of u, whereas storing distances from all the vertices of the graph to all the sampled vertices will be required (as shown below) to answer distance queries for vertices that are not present in each other's locality. In order to extract distance information in constant time, for each vertex u ∈ V, we use two hash tables (see [4], Chapter 12) for storing distances from u to the vertices of the sets S_u and R respectively. The size of each hash table is of the order of the size of the corresponding set (S_u or R). A typical hash table would require O(1) expected time to determine whether w ∈ S_u, and if so, report the distance δ(u, w). In order to achieve O(1) worst-case time, a two-level hash table (see Fredman, Komlós, Szemerédi [8]) of optimal size is employed.
The collection of these hash tables (two tables per vertex) constitutes a data-structure that we call an approximate distance oracle. Let u, v ∈ V be any two vertices whose intermediate distance is to be determined approximately. If either u or v belongs to the set R, we can report the exact distance between the two. The exact distance δ(u, v) will also be reported if v lies in S_u or vice versa. The only case left is when neither v ∈ S_u nor u ∈ S_v. In this case, we report δ(u, p(u)) + δ(p(u), v) as the approximate distance between u and v. This distance is bounded by 3δ(u, v): since v ∉ S_u, the vertex p(u) is at least as close to u as v is, i.e., δ(u, p(u)) ≤ δ(u, v); by the triangle inequality, δ(p(u), v) ≤ δ(p(u), u) + δ(u, v) ≤ 2δ(u, v); adding the two inequalities gives the bound.
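To make the construction and the query procedure concrete, here is a minimal Python sketch of the 3-approximate oracle. All names (`dijkstra`, `Oracle3`, `query`) are ours, and for clarity it computes exact distances by repeated Dijkstra runs and uses plain dictionaries in place of the two-level hash tables of the text.

```python
import heapq
import random

def dijkstra(adj, src):
    """Single-source shortest-path distances in an undirected weighted graph.
    adj maps a vertex to a list of (neighbour, weight) pairs."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

class Oracle3:
    def __init__(self, adj, gamma, rng=random.random):
        # 1. Sample R by picking each vertex independently with probability gamma.
        self.R = {u for u in adj if rng() < gamma}
        if not self.R:                       # degenerate case: keep R non-empty
            self.R = {next(iter(adj))}
        full = {u: dijkstra(adj, u) for u in adj}   # exact distances (sketch only)
        # 2. Distances from every vertex to every sampled vertex.
        self.to_R = {u: {r: full[u][r] for r in self.R} for u in adj}
        # 3. p(u): nearest sampled vertex; S_u: vertices strictly closer than p(u).
        self.p = {u: min(self.R, key=lambda r: full[u][r]) for u in adj}
        self.S = {u: {v: d for v, d in full[u].items()
                      if d < full[u][self.p[u]]} for u in adj}

    def query(self, u, v):
        if v in self.R:
            return self.to_R[u][v]           # exact
        if u in self.R:
            return self.to_R[v][u]           # exact
        if v in self.S[u]:
            return self.S[u][v]              # exact
        if u in self.S[v]:
            return self.S[v][u]              # exact
        # Neither vertex lies in the other's locality: route through p(u).
        return self.to_R[u][self.p[u]] + self.to_R[v][self.p[u]]
```

The last branch of `query` is exactly the case analysed above, so every reported distance lies between δ(u, v) and 3δ(u, v).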
Hence the distance reported by the approximate distance oracle described above is no more than three times the actual distance between the two vertices. In other words, the oracle is a 3-approximate distance oracle. Now we shall bound the expected size of the oracle. Using linearity of expectation, the expected size of the sample set R is nγ. Hence storing the distance from each vertex to all the vertices of the sample set will take a total of O(n^2 γ) space. The following lemma bounds the expected size of the sets S_u, u ∈ V.
LEMMA 38.1 Given a graph G = (V, E), let R ⊂ V be a set formed by picking each vertex independently with probability γ. For a vertex u ∈ V, the expected number of vertices in the set S_u is bounded by 1/γ.
Proof Let {v_1, v_2, ..., v_{n−1}} be the sequence of vertices of the set V \ {u} arranged in non-decreasing order of their distance from u. The set S_u consists of all those vertices of the set V \ {u} that lie closer to u than any vertex of the set R. Note that the vertex v_i belongs to S_u only if none of the vertices of the set {v_1, v_2, ..., v_{i−1}} (i.e., the vertices preceding v_i in the sequence above) is picked in the sample R. Since each vertex is picked independently with probability γ, the probability that the vertex v_i belongs to the set S_u is at most (1 − γ)^{i−1}. Using linearity of expectation, the expected number of vertices lying closer to u than any sampled vertex is at most Σ_{i=1}^{n−1} (1 − γ)^{i−1} < 1/γ.
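The sum in the proof is a geometric series, Σ_{i≥1} (1 − γ)^{i−1} = 1/γ. A quick numeric sanity check of the lemma's bound (the values of γ and n below are arbitrary):

```python
# Numeric check of Lemma 38.1: E[|S_u|] <= sum over i of (1 - gamma)^(i-1),
# a geometric series bounded by 1/gamma.
gamma = 0.1
n = 1000
expected_size = sum((1 - gamma) ** (i - 1) for i in range(1, n))
assert expected_size <= 1 / gamma + 1e-9   # the lemma's bound (with float tolerance)
```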
In the previous subsection, a 3-approximate distance oracle was presented based on the idea I. The (2k − 1)-approximate distance oracle is a k-level hierarchical data-structure. An important construct of the data-structure is Ball(·), defined as follows.
DEFINITION 38.1 For a vertex u, and subsets X, Y ⊂ V , the set Ball(u, X, Y ) is the set consisting of all those vertices of the set X that lie closer to u than any vertex from set Y . (see Figure 38.2)
Note that Ball(u, X, ∅) = X and Ball(u, X, X) = ∅. It can also be seen that the 3-approximate distance oracle described in the previous subsection stores Ball(u, V, R) and Ball(u, R, ∅) for each vertex u ∈ V.
If the set Y is formed by picking each vertex of set X independently with probability γ, it follows from Lemma 38.1 that the expected size of Ball(u, X, Y ) is bounded by 1/γ.
LEMMA 38.2 Let G = (V, E) be a weighted graph, and let X ⊆ V be a set of vertices. If Y ⊆ X is formed by selecting each vertex of X independently with probability γ, the expected number of vertices in Ball(u, X, Y) for any vertex u ∈ V is at most 1/γ.
(2k − 1)-Approximate Distance Oracle
In this subsection we give the construction of a (2k − 1)-approximate distance oracle, which is also based on the idea I and can be viewed as a generalization of the 3-approximate distance oracle.
The oracle is built on a hierarchy of sample sets R_1 ⊇ R_2 ⊇ ... ⊇ R_k: let R_1 = V, let R_i (for 1 < i ≤ k) be formed by picking each vertex of R_{i−1} independently with probability n^{−1/k}, and let R_{k+1} = ∅. For each vertex u ∈ V and each i ≤ k, the set Ball_i(u) = Ball(u, R_i, R_{i+1}) is stored in a hash table along with the distance from u to each of its vertices. The collection of the hash tables {Ball_i(u) : u ∈ V, i ≤ k} constitutes the data-structure that will facilitate answering of any approximate distance query in constant time. To provide better insight into the data-structure, Figure 38.3 depicts the set of vertices constituting {Ball_i(u) | i ≤ k}.
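A minimal sketch of sampling the nested sets R_1 ⊇ R_2 ⊇ ... ⊇ R_k ⊇ R_{k+1} = ∅ described above (the function name is ours):

```python
import random

def build_levels(vertices, k, rng=random.random):
    """Nested samples R_1 = V down to R_{k+1} = empty set; each level keeps a
    vertex of the previous level independently with probability n^(-1/k)."""
    n = len(vertices)
    gamma = n ** (-1.0 / k)
    levels = [set(vertices)]                    # R_1 = V
    for _ in range(2, k + 1):                   # R_2, ..., R_k
        levels.append({v for v in levels[-1] if rng() < gamma})
    levels.append(set())                        # R_{k+1} = empty set
    return levels
```

In expectation |R_i| = n^{1−(i−1)/k}, which is what the size analysis below relies on.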
Reporting distance with stretch at most (2k − 1)
Let u, v ∈ V be any two vertices whose intermediate distance has to be determined approximately. We shall now present the procedure to find the approximate distance between the two vertices using the k-level data-structure described above.
Let p_1(u) = u, and let p_i(u), i > 1, be the vertex from the set R_i nearest to u. Since p_i(u) ∈ Ball_i(u) for each u ∈ V, the distance from u to p_i(u) is known for each i ≤ k.
The query answering process performs at most k search steps. In the first step, we search Ball_1(u) for the vertex p_1(v). If p_1(v) is not present in Ball_1(u), we move to the next level, and in the second step we search Ball_2(v) for the vertex p_2(u). We proceed in this way, querying the balls of u and v alternately: in the ith step, we search Ball_i(x) for p_i(y), where (x = u, y = v) if i is odd, and (x = v, y = u) otherwise. The search ends at the ith step if p_i(y) belongs to Ball_i(x), and then we report δ(x, p_i(y)) + δ(y, p_i(y)) as an approximate distance between u and v.
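The alternating search can be sketched as follows, assuming the balls and the vertices p_i(·) have been precomputed and stored as dictionaries (all names here are ours):

```python
def distance_report(u, v, balls, pivot, k):
    """balls[i][x] maps each vertex of Ball_i(x) to its distance from x;
    pivot[i][x] is p_i(x), with pivot[1][x] = x.  The loop alternates between
    u and v, climbing one level whenever p_i(y) is missing from Ball_i(x)."""
    x, y = u, v
    i = 1
    while pivot[i][y] not in balls[i][x]:
        i += 1          # move to the next level of the hierarchy ...
        x, y = y, x     # ... and swap the roles of u and v
    w = pivot[i][y]
    # d(x, w) comes from Ball_i(x); d(y, w) = d(y, p_i(y)) comes from Ball_i(y).
    return balls[i][x][w] + balls[i][y][w]
```

Since Ball_k(x) = Ball(x, R_k, ∅) contains all of R_k, the loop is guaranteed to stop by level k.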
The stretch bound rests on the assertion A_j : at the end of the jth iteration, δ(y, p_{j+1}(y)) ≤ j · δ(u, v). For the base case (j = 0), p_1(y) is the same as y, and y is v, so δ(y, p_1(y)) = 0. Hence A_0 is true. For the rest of the inductive proof, it suffices to show that if A_j is true, then after the (j + 1)th iteration A_{j+1} is also true. The proof is as follows.
We consider the case of even j; the arguments for the case of odd j are similar. For even j, at the end of the jth iteration, x = u and y = v. Thus A_j implies that at the end of the jth iteration, δ(v, p_{j+1}(v)) ≤ j · δ(u, v). Consider the (j + 1)th iteration. For the execution of the (j + 1)th iteration, the condition in the while-loop must have been true, so p_{j+1}(v) does not belong to Ball_{j+1}(u) = Ball(u, R_{j+1}, R_{j+2}). Hence, by Definition 38.1, the vertex p_{j+2}(u) must lie closer to u than the vertex p_{j+1}(v). So at the end of the (j + 1)th iteration, δ(y, p_{j+2}(y)) = δ(u, p_{j+2}(u)) ≤ δ(u, p_{j+1}(v)) ≤ δ(u, v) + δ(v, p_{j+1}(v)) ≤ (j + 1) · δ(u, v), which establishes A_{j+1}.
THEOREM 38.1 The algorithm Distance-Report(u, v) reports a (2k − 1)-approximate distance between u and v.
Proof Suppose the search ends at the lth step. The algorithm Distance-Report(u, v) outputs δ(y, p_l(y)) + δ(x, p_l(y)), which by the triangle inequality is no more than 2δ(y, p_l(y)) + δ(x, y). Now δ(x, y) = δ(u, v), and δ(y, p_l(y)) ≤ (l − 1)δ(u, v) by the assertion A_{l−1}. Therefore, the distance reported is no more than (2l − 1)δ(u, v). Since the while-loop executes at most k − 1 iterations, we have l ≤ k, and therefore the distance reported by the oracle is at most (2k − 1)δ(u, v).
Size of the (2k − 1)-approximate distance oracle
The expected size of the set R_k is O(n^{1/k}), and the expected size of each Ball_i(u) is O(n^{1/k}) by Lemma 38.2 (with γ = n^{−1/k}). So the expected size of the (2k − 1)-approximate distance oracle is O(n · n^{1/k} + (k − 1) · n · n^{1/k}) = O(kn^{1+1/k}).
Computing Approximate Distance Oracles
In this subsection, a sub-cubic running time algorithm is presented for computing (2k − 1)-approximate distance oracles. It follows from the description of the data-structure associated with the approximate distance oracle that, after forming the sampled sets of vertices R_i, i ≤ k, which takes O(m) time, all that is required is the computation of Ball_i(u), along with the distance from u to the vertices belonging to these balls, for each u and i ≤ k.
Since Ball_i(u) is the set of all the vertices of the set R_i that lie closer to u than the vertex p_{i+1}(u), in order to compute Ball_i(u) we first compute p_i(u) for all u ∈ V and i ≤ k.
The above problem can be solved by running a single-source shortest path algorithm (Dijkstra's algorithm) on a modified graph as follows. Modify the original graph G by adding a dummy vertex s to the set V, and joining it to each vertex of the set X by an edge of zero weight. Let G′ be the modified graph. Running Dijkstra's algorithm from the vertex s as the source, it can be seen that the distance from s to a vertex y is indeed the distance from y to the nearest vertex of the set X. Moreover, if e(s, x), x ∈ X, is the first edge on the shortest path from s to y, then x is the vertex from the set X that lies nearest to y. Since the running time of Dijkstra's algorithm is O(m log n), we can state the following lemma.
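A sketch of this dummy-source computation; rather than materializing s, the code seeds the priority queue with every vertex of X at distance 0, which is equivalent to relaxing the zero-weight edges from s (the function name is ours):

```python
import heapq

def nearest_in_set(adj, X):
    """For every vertex, the distance to (and identity of) its nearest vertex
    of X, via a single Dijkstra run seeded with all of X at distance zero."""
    dist = {x: 0 for x in X}        # effect of the zero-weight edges s-x
    nearest = {x: x for x in X}
    pq = [(0, x) for x in X]
    heapq.heapify(pq)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                nearest[v] = nearest[u]   # inherit the nearest sampled vertex
                heapq.heappush(pq, (nd, v))
    return dist, nearest
```

Running this once with X = R_i yields p_i(u) and δ(u, p_i(u)) for every vertex u simultaneously.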
From Lemma 38.5, it follows that the graph induced by the vertices of the cluster C(v, R_{i+1}) is connected (hence the name cluster). Moreover, the entire cluster C(v, R_{i+1}) appears as a sub-tree of the shortest path tree rooted at v in the graph. As follows from the definition, for each vertex x ∈ C(v, R_{i+1}), δ(v, x) < δ(x, p_{i+1}(x)). Based on these two observations, there follows an efficient algorithm that computes the set C(v, R_{i+1}). The algorithm performs a restricted Dijkstra's algorithm from the vertex v, wherein we do not proceed along any vertex that does not belong to the set C(v, R_{i+1}).
A restricted Dijkstra’s algorithm : Note that Dijkstra’s algorithm starts with the singleton tree {v} and performs n − 1 steps to grow the complete shortest path tree. Each vertex x ∈ V \ {v} is assigned a label L(x), which is infinity in the beginning but eventually becomes the distance from v to x. Let V_l denote the set of the l vertices nearest to v. The algorithm maintains the following invariant at the end of the lth step:
I(l) : For every vertex x ∈ V_l, the label L(x) = δ(v, x), and for every other vertex y ∈ V \ V_l, the label L(y) is equal to the length of the shortest path from v to y that passes through vertices of V_l only.
During the (j + 1)th step, we select the vertex, say w, from the set V \ V_j with the least value of L(·). Since all the edge weights are positive, it follows from the invariant I(j) that L(w) = δ(v, w). We add w to the set V_j to get the set V_{j+1}. Now, in order to satisfy the invariant I(j + 1), we relax each edge e(w, y) incident from w to a vertex y ∈ V \ V_{j+1} as follows: L(y) ← min{L(y), L(w) + weight(w, y)}. It is easy to observe that this ensures the validity of the invariant I(j + 1).
In the restricted Dijkstra's algorithm, we put the following restriction on the relaxation of an edge e(w, y): we relax the edge e(w, y) only if L(w) + weight(w, y) is less than δ(y, p_{i+1}(y)).
This ensures that a vertex y ∉ C(v, R_{i+1}) will never be visited during the algorithm.
The fact that the vertices of the cluster C(v, R_{i+1}) form a sub-tree of the shortest path tree rooted at v ensures that the above restricted Dijkstra's algorithm indeed finds all the vertices (along with their distances from v) that constitute the cluster C(v, R_{i+1}). Since the running time of Dijkstra's algorithm is dominated by the number of edges relaxed, and each edge relaxation takes only O(log n) time, the restricted Dijkstra's algorithm runs in time of the order of Σ_{x ∈ C(v, R_{i+1})} degree(x) · log n. Thus the total time for computing all the clusters is of the order of Σ_v Σ_{x ∈ C(v, R_{i+1})} degree(x) · log n.
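A sketch of the restricted relaxation, assuming the distances δ(y, p_{i+1}(y)) have been precomputed into a dictionary (the names are ours):

```python
import heapq

def cluster(adj, v, pivot_dist):
    """Restricted Dijkstra from v.  pivot_dist[y] holds delta(y, p_{i+1}(y)).
    An edge (w, y) is relaxed only if the new label stays strictly below
    pivot_dist[y], so vertices outside C(v, R_{i+1}) are never visited."""
    dist = {v: 0}
    pq = [(0, v)]
    while pq:
        d, w = heapq.heappop(pq)
        if d > dist.get(w, float("inf")):
            continue
        for y, wt in adj[w]:
            nd = d + wt
            # The restriction: only relax while the label beats pivot_dist[y].
            if nd < pivot_dist[y] and nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(pq, (nd, y))
    return dist    # the vertices of the cluster, with their distances from v
```

The work done is proportional to the edges incident on the cluster's vertices, which is what gives the Σ degree(x) · log n bound above.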
Using the above result and Lemma 38.4, we can conclude that for a given weighted graph G = (V, E) and an integer k, it takes a total of O(kmn^{1/k} log n) time to compute {Ball_i(u) | i ≤ k, u ∈ V}. If we use Fibonacci heaps instead of binary heaps in the implementation of the restricted Dijkstra's algorithm, we can get rid of the logarithmic factor in the running time. Hence the total expected running time for building the data-structure is O(kmn^{1/k}). As mentioned before, the expected size of the data-structure is O(kn^{1+1/k}). To get an O(kn^{1+1/k}) bound on the worst-case size of the data-structure, we repeat the preprocessing algorithm until the data-structure is within a constant factor of its expected size; the expected number of repetitions is just a constant. Hence, we can state the following theorem.
THEOREM 38.2 Given a weighted undirected graph G = (V, E) and an integer k, a data-structure of size O(kn^{1+1/k}) can be built in O(kmn^{1/k}) expected time so that, given any pair of vertices, a (2k − 1)-approximate distance between them can be reported in O(k) time.