Suffix Trees and Suffix Arrays:Linear Time Construction Algorithm

Linear Time Construction Algorithms

In this section, we explore linear time construction algorithms for suﬃx trees and suﬃx arrays. We also show how suﬃx trees and suﬃx arrays can be derived from each other in linear time. In suﬃx tree and suﬃx array construction algorithms, three diﬀerent types of alphabets are considered: a constant or ﬁxed size alphabet (|Σ| = O(1)), integer alphabet (Σ = {1, 2,... , n}), and arbitrary alphabet. Suﬃx trees and suﬃx arrays can be constructed in linear time for both constant size and integer alphabets. The constant alphabet size case covers many interesting application areas, such as English text, or DNA or protein sequences in molecular biology. The integer alphabet case is interesting because a string of length n can have at most n distinct characters. Furthermore, some algorithms use a recursive technique that would generate and require operating on strings over integer alphabet, even when applied to strings over a ﬁxed alphabet.

Suﬃx Trees vs. Suﬃx Arrays

We ﬁrst show that the suﬃx array and Lcp array of a string can be obtained from its suﬃx tree in linear time. Deﬁne lexicographic ordering of the children of a node to be the order based on the ﬁrst character of the edge labels connecting the node to its children. Deﬁne lexicographic depth ﬁrst search to be a depth ﬁrst search of the tree where the children of each node are visited in lexicographic order. The order in which the leaves of a suﬃx tree are visited in a lexicographic depth ﬁrst search gives the suﬃx array of the corresponding string. In order to obtain lcp information, the string-depth of the current node during the search is remembered. This can be easily updated in O(1) time per edge as the search progresses. The length of the lcp between two consecutive suﬃxes is given by the smallest string-depth of a node visited between the two suﬃxes.

Given the suﬃx array and the Lcp array of a string s (|s$| = n), its suﬃx tree can be constructed in O(n) time. This is done by starting with a partial suﬃx tree for the lexicographically smallest suﬃx, and repeatedly inserting subsequent suﬃxes in the suﬃx array into the tree until the suﬃx tree is complete. Let Ti denote the compacted trie of the ﬁrst i suﬃxes in lexicographic order. The ﬁrst tree T1 consists of a single leaf labeled SA[1] connected to the root with an edge labeled suf fSA[1] = $.

To insert SA[i + 1] into Ti, start with the most recently inserted leaf SA[i] and walk up (|suf fSA[i]|− |lcp(suf fSA[i], suf fSA[i+1])|) = ((n − SA[i]+ 1) − Lcp[i]) characters along the path to the root. This walk can be done in O(1) time per edge by calculating the lengths of the respective edge labels. If the walk does not end at an internal node, create an internal node. Create a new leaf labeled SA[i + 1] and connect it to this internal node with an edge. Set the label on this edge to s[SA[i + 1] + Lcp[i]..n]. This creates the tree Ti+1. The procedure works because suf fSA[i+1] shares a longer preﬁx with suf fSA[i] than any other suﬃx inserted so far. To see that the entire algorithm runs in O(n) time, note that inserting a new suﬃx into Ti requires walking up the rightmost path in Ti. Each edge that is traversed ceases to be on the rightmost path in Ti+1, and thus is never traversed again. An edge in an intermediate tree Ti corresponds to a path in the suﬃx tree ST . When a new internal node is created along an edge in an intermediate tree, the edge is split into two edges, and the edge below the newly created internal node corresponds to an edge in the suﬃx tree. Once again, this edge ceases to be on the rightmost path and is never traversed again. The cost of creating an edge in an intermediate tree can be charged to the lowest edge on the corresponding path in the suﬃx tree. As each edge is charged once for creating and once for traversing, the total run-time of this procedure is O(n).

Finally, the Lcp array itself can be constructed from the suﬃx array and the string in linear time [14]. Let R be an array of size n such that R[i] contains the position in SA of suf fi. R can be constructed by a linear scan of SA in O(n) time. The Lcp array is computed in n iterations. In iteration i of the algorithm, the longest common preﬁx between suf fi and its respective right neighbor in the suﬃx array is computed. The array R facilitates locating an arbitrary suﬃx suf fi and ﬁnding its right neighbor in the suﬃx array in constant time. Initially, the length of the longest common preﬁx between suf f1 and its suﬃx array neighbor is computed directly and recorded. Let suf fj be the right neighbor of suf fi in SA. Let l be the length of the longest common preﬁx between them.

Suppose l ≥ 1. As suf fj is lexicographically greater than suf fi and s[i] = s[j], suf fj+1 is lexicographically greater than suf fi+1. The length of the longest common preﬁx between them is l − 1. It follows that the length of the longest common preﬁx between suf fi+1 and its right neighbor in the suﬃx array is ≥ l − 1. To determine its correct length, the comparisons need only start from the lth characters of the suﬃxes.

To prove that the run time of the above algorithm is linear, charge a comparison between the rth character in suﬃx suf fi and the corresponding character in its right neighbor suﬃx in SA to the position in the string of the rth character of suf fi, i.e., i + r − 1.

A comparison made in an iteration is termed successful if the characters compared are identical, contributing to the longest common preﬁx being computed. Because there is one failed comparison in each iteration, the total number of failed comparisons is O(n). As for successful comparisons, each position in the string is charged only once for a successful comparison. Thus, the total number of comparisons over all iterations is linear in n.

In light of the above discussion, a suﬃx tree and a suﬃx array can be constructed from each other in linear time. Thus, a linear time construction algorithm for one can be used to construct the other in linear time. In the following subsections, we explore such algorithms. Each algorithm is interesting in its own right, and exploits interesting properties that could be useful in designing algorithms using suﬃx trees and suﬃx arrays.

Linear Time Construction of Suﬃx Trees

Let s be a string of length n including the termination character $. Suﬃx tree construction algorithms start with an empty tree and iteratively insert suﬃxes while maintaining the property that each intermediate tree represents a compacted trie of the suﬃxes inserted so far. When all the suﬃxes are inserted, the resulting tree will be the suﬃx tree. Suﬃx links are typically used to speedup the insertion of suﬃxes. While the algorithms are identiﬁed by the names of their respective inventors, the exposition presented does not necessarily follow the original algorithms and we take the liberty to comprehensively present the material in a way we feel contributes to ease of understanding.

Mc Creight’s Algorithm

McCreight’s algorithm inserts suﬃxes in the order suf f1, suf f2,..., suf fn. Let Ti denote the compacted trie after suf fi is inserted. T1 is the tree consisting of a single leaf labeled 1 that is connected to the root by an edge with label s[1..n]. In iteration i of the algorithm, suf fi is inserted into tree Ti−1 to form tree Ti. An easy way to do this is by starting from the root and following the unique path matching characters in suf fi one by one until no more matches are possible. If the traversal does not end at an internal node, create an internal node there. Then, attach a leaf labeled i to this internal node and use the unmatched portion of suf fi for the edge label. The run-time for inserting suf fi is proportional to |suf fi| = n − i + 1. The total run-time of the algorithm is Σn (n − i + 1) = O(n2).

In order to achieve an O(n) run-time, suﬃx links are used to signiﬁcantly speedup the insertion of a new suﬃx. Suﬃx links are useful in the following way − Suppose we are inserting suf fi in Ti−1 and let v be an internal node in Ti−1 on the path from root to leaf labeled (i − 1). Then, path-label(v) = cα is a preﬁx of suf fi−1. Since v is an internal node, there must be another suﬃx suf fj (j < i − 1) that also has cα as preﬁx. Because suf fj+1 is previously inserted, there is already a path from the root in Ti−1 labeled α. To insert suf fi faster, if the end of path labeled α is quickly found, comparison of characters in suf fi can start beyond the preﬁx α. This is where suﬃx links will be useful. The algorithm must also construct suﬃx links prior to using them.

LEMMA 29.1 Let v be an internal node in ST (s) that is created in iteration i − 1. Let path-label(v) = cα, where c is a character and α is a (possibly empty) string. Then, either there exists an internal node u with path-label(u) = α or it will be created in iteration i.

Proof As v is created when inserting suf fi−1 in Ti−2, there exists another suﬃx suf fj (j < i − 1) such that lcp(suf fi−1, suf fj ) = cα. It follows that lcp(suf fi, suf fj+1) = α.

The tree Ti already contains suf fj+1. When suf fi is inserted during iteration i, internal node u with path-label α is created if it does not already exist.

The above lemma establishes that the suﬃx link of a newly created internal node can be established in the next iteration.

The following procedure is used when inserting suf fi in Ti−1. Let v be the internal node to which suf fi−1 is attached as a leaf. If v is the root, insert suf fi using character comparisons starting with the ﬁrst character of suf fi. Otherwise, let path-label(v) = cα. If v has a suﬃx link from it, follow it to internal node u with path-label α. This allows skipping the comparison of the ﬁrst |α| characters of suf fi. If v is newly created in iteration i−1, it would not have a suﬃx link yet. In that case, walk up to parent vf of v. Let β denote the label of the edge connecting vf and v. Let uf = SL(vf) unless vf is the root, in which case let uf be the root itself. It follows that path-label(uf) is a preﬁx of suf fi. Furthermore, it is guaranteed that there is a path below uf that matches the next |β| characters of suf fi. Traverse |β| characters along this path and either ﬁnd an internal node u or insert an internal node u if one does not already exist. In either case, set SL(v) = u. Continue by starting character comparisons skipping the ﬁrst |α| characters of suf fi.

The above procedure requires two diﬀerent types of traversals − one in which it is known that there exists a path below that matches the next |β| characters of suf fi (type I), and the other in which it is unknown how many subsequent characters of suf fi match a path below (type II). In the latter case, the comparison must proceed character by character until a mismatch occurs. In the former case, however, the traversal can be done by spending only O(1) time per edge irrespective of the length of the edge label. At an internal node during such a traversal, the decision of which edge to follow next is made by comparing the next character of suf fi with the ﬁrst characters of the edge labels connecting the node to its children. However, once the edge is selected, the entire label or the remaining length of β must match, whichever is shorter. Thus, the traversal can be done in constant time per edge, and if the traversal stops within an edge label, the stopping position can also be determined in constant time.

The insertion procedure during iteration i can now be described as follows: Start with the internal node v to which suf fi−1 is attached as a leaf. If v has a suﬃx link, follow it and perform a type II traversal to insert suf fi. Otherwise, walk up to v’s parent, take the suﬃx link from it unless it is the root, and perform a type I traversal to either ﬁnd or create the node u which will be linked from v by a suﬃx link. Continue with a type II traversal below u to insert suf fi.

LEMMA 29.2 The total time spent in type I traversals over all iterations is O(n).

Proof A type I traversal is performed by walking down along a path from root to a leaf in O(1) time per edge. Each iteration consists of walking up at most one edge, following a suﬃx link, and then performing downward traversals (either type II or both type I and type II). Recall that if SL(v) = u, then tree-depth(u) ≥ tree-depth(v) − 1. Thus, following a suﬃx link may reduce the depth in the tree by at most one. It follows that the operations that may cause moving to a higher level in the tree cause a decrease in depth of at most 2 per iteration. As both type I and type II traversals increase the depth in the tree and there are at most n levels in ST , the total number of edges traversed by type I traversals over all the iterations is bounded by 3n.

LEMMA 29.3 The total time spent in type II traversals over all iterations is O(n).

Proof In a type II traversal, a suﬃx of the string suf fi is matched along a path in Ti−1until there is a mismatch. When a mismatch occurs, an internal node is created if there does not exist one already. Then, the remaining part of suf fi becomes the edge label connecting leaf labeled i to the internal node. Charge each successful comparison of a character in suf fi to the corresponding character in the original string s. Note that a character that is charged with a successful comparison is never charged again as part of a type II traversal. Thus, the total time spent in type II traversals is O(n).

The above lemmas prove that the total run-time of McCreight’s algorithm is O(n). McCreight’s algorithm is suitable for constant sized alphabets. The dependence of the run-time and space for storing suﬃx trees on the size of the alphabet |Σ| is as follows: A simple way to allocate space for internal nodes in a suﬃx tree is to allocate |Σ| + 1 pointers for children,one for each distinct character with which an edge label may begin. With this approach, the edge label beginning with a given character, or whether an edge label exists with a given character, can be determined in O(log |Σ|) time. However, as all |Σ| + 1 pointers are kept irrespective of how many children actually exist, the total space is O(|Σ|n). If the tree is stored such that each internal node points only to its leftmost child and each node also points to its next sibling, if any, the space can be reduced to O(n), irrespective of |Σ|. With this, searching for a child connected by an edge label with the appropriate character takes O(|Σ|) time. Thus, McCreight’s algorithm can be run in O(n log |Σ|) time using O(n|Σ|) space, or in O(n|Σ|) time using O(n) space.

Generalized Suﬃx Trees

McCreight’s algorithm can be easily adapted to build the generalized suﬃx tree for a set S = {s1, s2,... , sk } of strings of total length N in O(N ) time. A simple way to do this is to construct the string S = s1$1s2$2 ... sk $k, where each $i is a unique string termination character that does not occur in any string in S. Using McCreight’s algorithm, ST (S) can be computed in O(N ) time. This diﬀers from GST (S) in the following way: Consider a suﬃx suf fj of string si in GST (S). The corresponding suﬃx in ST (S) is si[j..|si |]$isi+1$i+1 ... sk $k. Let v be the last internal node on the path from root to leaf representing this suﬃx in ST (S). As each $i is unique and path-label(v) must be a common preﬁx of at least two suﬃxes in S, path-label(v) must be a preﬁx of si[j..|si|]. Thus, by simply shortening the edge label below v to terminate at the end of the string si and attaching a common termination character $ to it, the corresponding suﬃx in GST (S) can be generated in O(1) time. Additionally, all suﬃxes in ST (S) that start with some $i should be removed and replaced by a single suﬃx $ in GST (S). Note that the suﬃxes to be removed are all directly connected to the root in ST (S), allowing easy O(1) time removal per suﬃx. Thus, GST (S) can be derived from ST (S) in O(N ) time.

Instead of ﬁrst constructing ST (S) and shortening edge labels of edges connecting to leaves to construct GST (S), the process can be integrated into the tree construction itself to directly compute GST (S). When inserting the suﬃx of a string, directly set the edge label connecting to the newly created leaf to terminate at the end of the string, appended by $. As each suﬃx that begins with $i in ST (S) is directly attached to the root, execution of McCreight’s algorithm on S will always result in a downward traversal starting from the root when a suﬃx starting from the ﬁrst character of a string is being inserted. Thus, we can simply start with an empty tree, insert all the suﬃxes of one string using McCreight’s algorithm, insert all the suﬃxes of the next string, and continue this procedure until all strings are inserted. To insert the ﬁrst suﬃx of a string, start by matching the unique path in the current tree that matches with a preﬁx of the string until no more matches are possible, and insert the suﬃx by branching at this point. To insert the remaining suﬃxes, continue as described in constructing the tree for one string.

This procedure immediately gives an algorithm to maintain the generalized suﬃx tree of a set of strings in the presence of insertions and deletions of strings. Insertion of a string is the same as executing McCreight’s algorithm on the current tree, and takes time proportional to the length of the string being inserted. To delete a string, we must locate the leaves corresponding to all the suﬃxes of the string. By mimicking the process of inserting the string in GST using McCreight’s algorithm, all the corresponding leaf nodes can be reached in time linear in the size of the string to be deleted. To delete a suﬃx, examine the corresponding leaf. If it is multiply labeled, it is enough to remove the label corresponding to the suﬃx. It it has only one label, the leaf and edge leading to it must be deleted. If the parent of the leaf is left with only one child after deletion, the parent and its two incident edges are deleted by connecting the surviving child directly to its grandparent with an edge labeled with the concatenation of the labels of the two edges deleted. As the adjustment at each leaf takes O(1) time, the string can be deleted in time proportional to its length.

Suﬃx trees were invented by Weiner [23], who also presented the ﬁrst linear time algorithm to construct them for a constant sized alphabet. McCreight’s algorithm is a more space-economical linear time construction algorithm [19]. A linear time on-line construction algorithm for suﬃx trees is invented by Ukkonen [22]. In fact, our presentation of Mc- Creight’s algorithm also draws from ideas developed by Ukkonen. A uniﬁed view of these three suﬃx tree construction algorithms is studied by Giegerich and Kurtz [10]. Farach [7] presented the ﬁrst linear time algorithm for strings over integer alphabets. The algorithm recursively constructs suﬃx trees for all odd and all even suﬃxes, respectively, and uses a clever strategy for merging them. The complexity of suﬃx tree construction algorithms for various types of alphabets is explored in [8].

Linear Time Construction of Suﬃx Arrays

Suﬃx arrays were proposed by Manber and Myers [18] as a space-eﬃcient alternative to suﬃx trees. While suﬃx arrays can be deduced from suﬃx trees, which immediately implies any of the linear time suﬃx tree construction algorithms can be used for suﬃx arrays, it would not achieve the purpose of economy of space. Until recently, the fastest known direct construction algorithms for suﬃx arrays all required O(n log n) time, leaving a frustrating gap between asymptotically faster construction algorithms for suﬃx trees, and asymptot- ically slower construction algorithms for suﬃx arrays, despite the fact that suﬃx trees contain all the information in suﬃx arrays. This gap is successfully closed by a number of researchers in 2003, including K¨ar¨akkanen and Sanders [13], Kim et al. [15], and Ko and Aluru [16]. All three algorithms work for the case of integer alphabet. Given the simplicity and/or space eﬃciency of some of these algorithms, it is now preferable to construct suﬃx trees via the construction of suﬃx arrays.

K¨ara¨kkanen and Sanders’ Algorithm

K¨ar¨akkanen and Sanders’ algorithm is the simplest and most elegant algorithm to date to construct suﬃx arrays, and by implication suﬃx trees, in linear time. The algorithm also works for the case of an integer alphabet. Let s be a string of length n over the alphabet

Σ = {1, 2,... , n}. For convenience, assume n is a multiple of three and s[n+1] = s[n+2] = 0.

The algorithm has the following steps:

To execute step (1), ﬁrst perform a radix sort of the 2 n triples (s[i], s[i + 1], s[i + 2]) for each i mod 3 j= 0 and associate with each distinct triple its rank ∈ {1, 2,... , 2 n} in sorted order.

If all triples are distinct, the suﬃxes are already sorted. Otherwise, let suf fi denote the string obtained by taking suf fi and replacing each consecutive triplet with its corresponding rank. Create a new string sf by concatenating suf f f with suf f f . Note that all suf f f with

i mod 3 = 1 (i mod 3 = 2, respectively) are suﬃxes of suf f f (suf f f , respectively). A lexicographic comparison of two suﬃxes in sf never crosses the boundary between suf f f and suf f f because the corresponding suﬃxes in the original string can be lexicographically distinguished. Thus, sorting sf recursively gives the sorted order of suf fi with i mod 3 j= 0.

Step (2) can be accomplished by performing a radix sort on tuples (s[i], rank(suf fi+1)) for all i mod 3 = 0, where rank(suf fi+1) denotes the rank of suf fi+1 in sorted order obtained in step (1).

Merging of the sorted arrays created in steps (1) and (2) is done in linear time, aided by the fact that the lexicographic order of a pair of suﬃxes, one from each array, can be determined in constant time. To compare suf fi (i mod 3 = 1) with suf fj (i mod 3 = 0), compare s[i] with s[j]. If they are unequal, the answer is clear. If they are identical, the ranks of suf fi+1 and suf fj+1 in the sorted order obtained in step (1) determines the answer. To compare suf fi (i mod 3 = 2) with suf fj (i mod 3 = 0), compare the ﬁrst two characters of the two suﬃxes. If they are both identical, the ranks of suf fi+2 and suf fj+2 in the sorted order obtained in step (1) determines the answer.

The run-time of this algorithm is given by the recurrence T (n) = T (

l) + O(n), which results in O(n) run-time. Note that the 2 : 1 split is designed to make the merging step easy. A 1 : 1 split does not allow easy merging because when comparing two suﬃxes for merging, no matter how many characters are compared, the remaining suﬃxes will not fall in the same sorted array, where ranking determines the result without need for further comparisons. Kim et al.’s linear time suﬃx array construction algorithm is based on a 1 : 1 split, and the merging phase is handled in a clever way so as to run in linear time. This is much like Farach’s algorithm for constructing suﬃx trees [7] by constructing suﬃx trees for even and odd positions separately and merging them. Both the above linear time suﬃx array construction algorithms partition the suﬃxes based on their starting positions in the string.

A completely diﬀerent way of partitioning suﬃxes based on the lexicographic ordering of a suﬃx with its right neighboring suﬃx in the string is used by Ko and Aluru to derive a linear time algorithm [16]. This reduces solving a problem of size n to that of solving a problem of size no more than I n l, while eliminating the complex merging step. The algorithm can be made to run in only 2n words plus 1.25n bits for strings over constant alphabet. Algorithmically, Ka¨r¨akkanen and Sanders’ algorithm is akin to mergesort and Ko and Aluru’s algorithm is akin to quicksort. Algorithms for constructing suﬃx arrays in external memory are investigated by Crauser and Ferragina [5].

It may be more space eﬃcient to construct a suﬃx tree by ﬁrst constructing the corresponding suﬃx array, deriving the Lcp array from it, and using both to construct the suﬃx tree. For example, while all direct linear time suﬃx tree construction algorithms depend on constructing and using suﬃx links, these are completely avoided in the indirect approach. Furthermore, the resulting algorithms have an alphabet independent run-time of O(n) while using only the O(n) space representation of suﬃx trees. This should be contrasted with the O(|Σ|n) run-time of either McCreight’s or Ukkonen’s algorithms.

Space Issues

Suﬃx trees and suﬃx arrays are space eﬃcient in an asymptotic sense because the memory required grows linearly with input size. However, the actual space usage is of signiﬁcant concern, especially for very large strings. For example, the human genome can be represented as a large string over the alphabet Σ = {A, C, G, T } of length over 3 × 109. Because of linear dependence of space on the length of the string, the exact space requirement is easily characterized by specifying it in terms of the number of bytes per character. Depend- ing on the number of bytes per character required, a data structure for the human genome may ﬁt in main memory, may need a moderate sized disk, or might need a large amount of secondary storage. This has signiﬁcant inﬂuence on the run-time of an application as access to secondary storage is considerably slower. It may also become impossible to run an application for large data sizes unless careful attention is paid to space eﬃciency.

Consider a naive implementation of suﬃx trees. For a string of length n, the tree has n leaves, at most n − 1 internal nodes, and at most 2n − 2 edges. For simplicity, count the space required for each integer or a pointer to be one word, equal to 4 bytes on most current computers. For each leaf node, we may store a pointer to its parent, and store the starting index of the suﬃx represented by the leaf, for 2n words of storage. Storage for each internal node may consist of 4 pointers, one each for parent, leftmost child, right sibling and suﬃx link, respectively. This will require approximately 4n words of storage. Each edge label consists of a pair of integers, for a total of at most 4n words of storage. Putting this all together, a naive implementation of suﬃx trees takes 10n words or 40n bytes of storage.

Several techniques can be used to considerably reduce the naive space requirement of 40 bytes per character. Many applications of interest do not need to use suﬃx links. Similarly, a pointer to the parent may not be required for applications that use traversals down from the root. Even otherwise, note that a depth ﬁrst search traversal of the suﬃx tree starting from the root can be conducted even in the absence of parent links, and this can be utilized in applications where a bottom-up traversal is needed. Another technique is to store the internal nodes of the tree in an array in the order of their ﬁrst occurrence in a depth ﬁrst search traversal. With this, the leftmost child of an internal node is found right next to it in the array, which removes the need to store a child pointer. Instead of storing the starting and ending positions of a substring corresponding to an edge label, an edge label can be stored with the starting position and length of the substring. The advantage of doing so is that the length of the edge label is likely to be small. Hence, one byte can be used to store edge labels with lengths < 255 and the number 255 can be used to denote edge labels with length at least 255. The actual values of such labels can be stored in an exceptions list, which is expected to be fairly small. Using several such techniques, the space required per character can be roughly cut in half to about 20 bytes [17].

A suﬃx array can be stored in just one word per character, or 4 bytes. Most applications using suﬃx arrays also need the Lcp array. Similar to the technique employed in storing edge labels on suﬃx trees, the entries in Lcp array can also be stored using one byte, with exceptions handled using an ordered exceptions list. Provided most of the lcp values ﬁt in a byte, we only need 5 bytes per character, signiﬁcantly smaller than what is required for suﬃx trees. Further space reduction can be achieved by the use of compressed suﬃx trees and suﬃx arrays and other data structures [9, 11]. However, this often comes at the expense of increased run-time complexity.

Search This Blog

algorithms

Suffix Trees and Suffix Arrays:Linear Time Construction Algorithm

Comments

Post a Comment

Popular posts from this blog

0/1 Knapsack Problem Memory function.

Binary Space Partitioning Trees:BSP Tree as a Hierarchy of Regions.

Warshall’s Algorithm -to find TRANSITIVE CLOSURE