Minimum Spanning Tree
How to connect a given set of points at the lowest cost is a frequently encountered problem, which can be modeled as the problem of finding a minimum-weight spanning tree T in a weighted, connected, undirected graph G = (V, E). Methods for finding such a spanning tree, called a minimum spanning tree (MST), have been investigated in numerous studies and have a long history [8]. In this section we discuss the bare essentials of the two commonly used MST algorithms—Kruskal’s and Prim’s—and briefly mention a third one.
Kruskal’s MST Algorithm
An algorithm due to J. B. Kruskal, which employs the smallest-edge-first strategy, works as follows. First we sort all the edges in the given network by weight, in nondecreasing order. Then the edges are examined one by one, from smallest to largest. If an edge ei, upon examination, is found to form a cycle when added to the edges already selected, it is discarded; otherwise, ei is selected for inclusion in the minimum spanning tree T. The construction stops when the required n − 1 edges have been selected or when all m edges have been examined. If the given network is disconnected, we obtain a minimum spanning forest instead of a tree. More formally, Kruskal’s method may be stated as in Figure 4.15.
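The formal statement in Figure 4.15 is not reproduced here; as a rough stand-in, the following minimal Python sketch implements the smallest-edge-first strategy just described. All names are illustrative, and the naive component labels used for the cycle test are replaced by the efficient data structures discussed next.

    def kruskal_mst(n, edges):
        """Sketch of the smallest-edge-first strategy.

        n is the number of vertices (numbered 0..n-1) and edges is a
        list of (u, v, w) triples. Returns the selected edges; if the
        network is disconnected, this is a minimum spanning forest.
        """
        comp = list(range(n))              # naive component label per vertex
        mst = []
        for u, v, w in sorted(edges, key=lambda e: e[2]):
            if comp[u] == comp[v]:         # both ends in the same component:
                continue                   # the edge would form a cycle
            mst.append((u, v))             # otherwise select the edge
            old, new = comp[v], comp[u]
            comp = [new if c == old else c for c in comp]  # merge components
            if len(mst) == n - 1:          # n - 1 edges: the tree is complete
                break
        return mst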
Although the algorithm just outlined is simple enough, we do need to work out some implementation details and select appropriate data structures to achieve an efficient execution.
There are two crucial implementation details that we must consider in this algorithm. If we fully sort all m edges in the given network up front, we may be doing a lot of unnecessary work: all we really need is to be able to determine the next smallest edge in the network at each iteration. Therefore, in practice, the edges are only partially sorted and kept in a min-heap, with the smallest edge at the root. In a graph with m edges, the initial construction of the heap requires O(m) computational steps, and the next smallest edge can then be extracted from the heap in O(log m) steps. With this improvement, the sorting cost is O(m + p log m), where p is the number of edges examined before an MST is constructed. Typically, p is much smaller than m.
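In Python this partial sorting could be sketched with the standard heapq module; the helper below is hypothetical, not part of the text's figures.

    import heapq

    def edges_smallest_first(edges):
        """Yield (u, v, w) triples in nondecreasing weight order without
        sorting everything up front: heapify is O(m), and each of the p
        edges actually examined costs O(log m) to extract.
        """
        heap = [(w, u, v) for (u, v, w) in edges]
        heapq.heapify(heap)                # O(m) construction
        while heap:
            w, u, v = heapq.heappop(heap)  # O(log m) per extraction
            yield u, v, w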
The second crucial detail is how to maintain the set of edges selected so far (those to be included in the MST), so that the next edge examined can be efficiently tested for cycle formation.
As edges are examined and included in T, a forest of disconnected trees (i.e., subtrees of the final spanning tree) is produced. The edge e being examined forms a cycle if and only if both its end vertices belong to the same subtree of T. Thus, to ensure that the edge currently being examined does not form a cycle, it suffices to check whether it connects two different subtrees of T. An efficient way to accomplish this is to group the n vertices of the given network into disjoint subsets defined by the subtrees formed by the edges included in T so far. If we maintain the partially constructed MST by means of such subsets of vertices, we can add a new edge by forming the UNION of the two relevant subsets, and we can check for cycle formation by FINDing whether the two end vertices of the edge being examined lie in the same subset. These subsets can themselves be kept as rooted trees, where the root is an element of the subset used as a name to identify it. The FIND subprocedure is called twice—once for each end vertex of edge e—to determine the sets to which the two end vertices belong. If they differ, the UNION subprocedure merges the two subsets; if they are the same, edge e is discarded.
The subsets, kept as rooted trees, are implemented with an array of parent pointers, one for each of the n elements. The parent of a root, of course, is null. (In fact, it is useful to assign parent[root] = −(number of vertices in the tree).) When taking the UNION of two subsets, we merge the smaller subset into the larger one by pointing the parent pointer of the smaller subset's root to the root of the larger subset. Some of these details are shown in Figure 4.15. Note that r1 and r2 are the roots identifying the sets to which vertices u and v belong.
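A minimal Python sketch of these subprocedures, assuming vertices are numbered 0..n-1 and following the suggestion of storing the negated subset size at each root:

    def make_sets(n):
        """One singleton set per vertex; parent[root] = -(tree size)."""
        return [-1] * n

    def find(parent, x):
        """Return the root naming the subset that contains x."""
        while parent[x] >= 0:              # follow parent pointers upward
            x = parent[x]
        return x

    def union(parent, r1, r2):
        """Merge the subsets rooted at r1 and r2 (r1 != r2)."""
        if parent[r1] > parent[r2]:        # sizes are negated, so this
            r1, r2 = r2, r1                # means r1's tree is smaller
        parent[r1] += parent[r2]           # combined size at the new root
        parent[r2] = r1                    # smaller root points to larger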
When the algorithm in Figure 4.15 is applied to the weighted graph in Figure 4.16, the edges are included in the MST in the order (3, 5), (4, 6), (4, 5), (4, 2), (6, 7), (3, 1). After the five smallest edges are included in the MST, the 6th, 7th, and 8th smallest edges are rejected. Then the 9th smallest edge, (1, 3), completes the MST, and the last two edges are ignored.
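Putting the pieces together, an efficient version of the loop might combine the heap generator and the union-find routines sketched above (again, an illustrative assembly rather than the text's Figure 4.15):

    def kruskal_mst_fast(n, edges):
        """Kruskal's strategy with the heap and union-find refinements."""
        parent = make_sets(n)
        mst = []
        for u, v, w in edges_smallest_first(edges):
            r1, r2 = find(parent, u), find(parent, v)
            if r1 != r2:                   # ends lie in different subtrees
                union(parent, r1, r2)
                mst.append((u, v))
                if len(mst) == n - 1:
                    break
        return mst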
Prim’s MST Algorithm
A second algorithm, discovered independently by several people (Jarník in 1930, Prim in 1957, Dijkstra in 1959), employs the “nearest neighbor” strategy and is commonly referred to as Prim’s algorithm. In this method one starts with an arbitrary vertex s and joins it to its nearest neighbor, say y. That is, of all edges incident on vertex s, the edge (s, y) with the smallest weight is made part of the MST. Next, of all the edges incident on s or y, we choose the one with minimum weight that leads to some third vertex, and make this edge part of the MST. We continue this process of “reaching out” from the partially constructed tree and bringing in the “nearest neighbor” until all vertices reachable from s have been incorporated into the tree.
As an example, let us use this method to find the minimum spanning tree of the weighted graph given in Figure 4.16. Suppose that we start at vertex 1. The nearest neighbor of vertex 1 is vertex 3, so edge (1, 3) becomes part of the MST. Next, of all the edges incident on vertices 1 and 3 (and not already included in the MST), we select the smallest, which is edge (3, 5) with weight 14. Now the partially constructed tree consists of the two edges (1, 3) and (3, 5). Among all edges incident on vertices 1, 3, and 5, edge (5, 4) is the smallest, and it is therefore included in the MST. The situation at this point is shown in Figure 4.17. Clearly, (4, 6), with weight 18, is the next edge to be included. Finally, edges (4, 2) and (6, 7) complete the desired MST.
The primary computational task in this algorithm is finding, in each iteration, the next edge to be included in the MST. For an efficient execution of this task we maintain an array near[u] for each vertex u not yet in the tree (i.e., u ∈ V − VT); near[u] is that vertex in VT which is closest to u. (Note that V is the set of all vertices in the network and VT is the subset of V included in the MST thus far.) Initially, we set near[s] ← 0 to indicate that s is in the tree, and for every other vertex v, near[v] ← s.
For convenience, we also maintain an array dist[u] holding the actual distance (i.e., edge weight) from u to that vertex in VT which is closest to u. To determine which vertex is to be added to the set VT next, we compare all nonzero values in the dist array and pick the smallest. Thus n − i comparisons suffice to identify the ith vertex to be added.
Initially, since s is the only vertex in VT, dist[u] is set to the weight of edge (s, u). As the algorithm proceeds, these two arrays are updated in each iteration (see Figure 4.17 for an illustration).
A formal description of the nearest-neighbor algorithm is given in Figure 4.18. It is assumed that the input is given in the form of an n × n weight matrix W (in which nonexistent edges have weight ∞). The set V = {1, 2, ..., n} is the set of vertices of the graph, and VT and ET are the sets of vertices and edges of the partially formed (minimum spanning) tree. Vertex set VT is identified by the zero entries in array near.
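Figure 4.18 is not reproduced here; the following Python sketch follows the near/dist description above under slightly different conventions: vertices are numbered 0..n-1, so in-tree vertices are marked with near[u] = -1 rather than the 0 used with the text's 1-based numbering.

    import math

    def prim_mst(W, s=0):
        """Nearest-neighbor (Prim) sketch using the near/dist arrays.

        W is an n x n weight matrix with math.inf for nonexistent
        edges; s is the start vertex. Returns the list of tree edges.
        """
        n = len(W)
        near = [s] * n                     # closest in-tree vertex for each u
        dist = [W[s][u] for u in range(n)] # weight of edge (near[u], u)
        near[s] = -1                       # s itself is in the tree
        edges = []
        for _ in range(n - 1):
            # pick the out-of-tree vertex closest to the tree
            u = min((v for v in range(n) if near[v] != -1),
                    key=lambda v: dist[v])
            edges.append((near[u], u))
            near[u] = -1                   # u joins the tree
            for v in range(n):             # update near/dist for the rest
                if near[v] != -1 and W[u][v] < dist[v]:
                    dist[v] = W[u][v]
                    near[v] = u
        return edges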
There is yet a third method for computing a minimum spanning tree, which was first proposed by O. Borůvka in 1926 (but rediscovered by G. Choquet in 1938 and G. Sollin in 1961). It works as follows: First, the smallest edge incident on each vertex is found; these edges form part of the minimum spanning tree. There are at least ⌈n/2⌉ such edges. The connected components formed by these edges are collapsed into “supernodes.” (There are no more than ⌊n/2⌋ such vertices at this point.) The process is repeated on the supernodes, and then on the resulting “supersupernodes,” and so on, until only a single vertex remains. This requires at most ⌊log2 n⌋ steps, because each step reduces the number of vertices by at least a factor of 2. Because of its inherent parallelism, the nearest-neighbor-from-each-vertex approach is particularly appealing for parallel implementations.
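A rough sketch of this supernode method, reusing the make_sets/find/union helpers from the Kruskal section to track which original vertices each supernode contains (distinct edge weights are assumed, so each cheapest choice is unambiguous):

    def boruvka_mst(n, edges):
        """Supernode (Borůvka) sketch; edges is a list of (u, v, w)."""
        parent = make_sets(n)
        mst = []
        components = n
        while components > 1:
            cheapest = {}                  # component root -> smallest edge
            for u, v, w in edges:
                ru, rv = find(parent, u), find(parent, v)
                if ru == rv:
                    continue               # edge lies inside one supernode
                for r in (ru, rv):
                    if r not in cheapest or w < cheapest[r][2]:
                        cheapest[r] = (u, v, w)
            if not cheapest:               # disconnected: forest is done
                break
            for u, v, w in cheapest.values():
                ru, rv = find(parent, u), find(parent, v)
                if ru != rv:               # collapse the two components
                    union(parent, ru, rv)
                    mst.append((u, v))
                    components -= 1
        return mst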
These three “greedy” algorithms and their variations have been implemented with different data structures, and their relative performance—both theoretical and empirical—has been studied widely. The results of some of these studies can be found in [2, 13, 14, 16].
In many applications, the minimum spanning tree is required to satisfy an additional constraint, such as: (i) the degree of each vertex in the MST should be at most a specified value; (ii) the diameter of the MST should not exceed a specified value; or (iii) the MST must have at least a specified number of leaves (vertices of degree 1 in a tree). The problem of computing such a constrained minimum spanning tree is usually NP-complete. For a discussion of various constrained MST problems and some heuristics for solving them, see [6].