Tries:Compressed Tries.

Compressed Tries

Take a close look at the trie of Figure 28.1. This trie has a few branch nodes (nodes B, D, and F ) that do not partition the elements in their subtrie into two or more nonempty groups. We often can improve both the time and space performance metrics of a trie by eliminating all branch nodes that have only one child. The resulting trie is called a compressed trie.

When branch nodes with a single child are removed from a trie, we need to keep additional information so that dictionary operations may be performed correctly. The additional information stored in three compressed trie structures is described below.

Compressed Tries with Digit Numbers

In a compressed trie with digit numbers, each branch node has an additional field digit N umber that tells us which digit of the key is used to branch at this node.

Figure 11 shows the compressed trie with digit numbers that corresponds to the trie of Figure 28.1. The leftmost field of each branch node of Figure 28.10 is the digit N umber field.

Searching a Compressed Trie with Digit Numbers

A compressed trie with digit numbers may be searched by following a path from the root. At each branch node, the digit, of the search key, given in the branch node’s digitN umber field is used to determine which subtrie to move to. For example, when searching the trie of Figure 28.10 for an element with key 951-23-7625, we start at the root of the trie. Since the root node is a branch node with digitN umber = 1, we use the first digit 9 of the search key to determine which subtrie to move to. A move to node A.child[9] = I is made. Since I.digitN umber = 4, the fourth digit, 2, of the search key tells us which subtrie to move to. A move is now made to node I.child[2] = J . We are now

image

at an element node, and the search key is compared with the key of the element in node J . Since the keys match, we have found the desired element.

Notice that a search for an element with key 913-23-7625 also terminates at node J . However, the search key and the element key at node J do not match and we conclude that the trie contains no element with key 913-23-7625.

Inserting into a Compressed Trie with Digit Numbers

To insert an element with key 987-26-1615 into the trie of Figure 28.10, we first search for an element with this key. The search ends at node J . Since the search key and the key, 951-23-7625, of the element in this node do not match, we conclude that the trie has no element whose key matches the search key. To insert the new element, we find the first digit where the search key differs from the key in node J and create a branch node for this digit. Since the first digit where the search key 987-26-1615 and the element key 951-23- 7625 differ is the second digit, we create a branch node with digitN umber = 2. Since digit values increase as we go down the trie, the proper place to insert the new branch node can be determined by retracing the path from the root to node J and stopping as soon as either a node with digit value greater than 2 or the node J is reached. In the trie of Figure 28.10, this path retracing stops at node I. The new branch node is made the parent of node I, and we get the trie of Figure 28.11.

Consider inserting an element with key 958-36-4194 into the compressed trie of Fig- ure 28.10. The search for an element with this key terminates when we fall of the trie by following the pointer I.child[3] = null. To complete the insertion, we must first find an element in the subtrie rooted at node I. This element is found by following a downward path from node I using (say) the first non null link in each branch node encountered. Doing

image

this on the compressed trie of Figure 28.10, leads us to node J . Having reached an element node, we find the first digit where the element key and the search key differ and complete the insertion as in the previous example. Figure 28.12 shows the resulting compressed trie. Because of the possible need to search for the first non null child pointer in each branch node, the time required to insert an element into a compressed tries with digit numbers is O(rd), where r is the trie radix and d is the maximum number of digits in any key.

Removing an Element from a Compressed Trie with Digit Numbers

To remove an element whose key is theKey, we do the following:

1. Find the element node X that contains the element whose key is theKey.

2. Discard node X.

3. If the parent of X is left with only one child, discard the parent node also. When the parent of X is discarded, the sole remaining child of the parent of X becomes a child of the grandparent (if any) of X.

To remove the element with key 951-94-1654 from the compressed trie of Figure 28.12, we first locate the node K that contains the element that is to be removed. When this node is discarded, the parent I of K is left with only one child. Consequently, node I is also discarded, and the only remaining child J of node I is the made a child of the grandparent of K. Figure 28.13 shows the resulting compressed trie.

Because of the need to determine whether a branch node is left with two or more children, removing a d digit element from a radix r trie takes O(d + r) time.

image

Compressed Tries with Skip Fields

In a compressed trie with skip fields, each branch node has an additional field skip which tells us the number of branch nodes that were originally between the current branch node and its parent. Figure 15 shows the compressed trie with skip fields that corresponds to the trie of Figure 28.1. The leftmost field of each branch node of Figure 28.14 is the skip field. The algorithms to search, insert, and remove are very similar to those used for a compressed trie with digit numbers.

Compressed Tries with Edge Information

In a compressed trie with edge information, each branch node has the following additional information associated with it: a pointer/reference element to an element (or element node) in the subtrie, and an integer skip which equals the number of branch nodes eliminated between this branch node and its parent. Figure 28.15 shows the compressed trie with edge information that corresponds to the trie of Figure 28.1. The first field of each branch node is its element field, and the second field is the skip field.

Even though we store the “edge information” with branch nodes, it is convenient to think of this information as being associated with the edge that comes into the branch node from its parent (when the branch node is not the root). When moving down a trie, we follow edges, and when an edge is followed. we skip over the number of digits given by the skip field of the edge information. The value of the digits that are skipped over may be determined by using the element field.

When moving from node A to node I of the compressed trie of Figure 28.15, we use digit 1 of the key to determine which child field of A is to be used. Also, we skip over the next 2 digits, that is, digits 2 and 3, of the keys of the elements in the subtrie rooted at

image

I. Since all elements in the subtrie I have the same value for the digits that are skipped over, we can determine the value of these skipped over digits from any of the elements in the subtrie. Using the element field of the edge information, we access the element node J , and determine that the digits that are skipped over are 5 and 1.

Searching a Compressed Trie with Edge Information

When searching a compressed trie with edge information, we can use the edge information to terminate unsuccessful searches (possibly) before we reach an element node or fall off the trie. As in the other compressed trie variants, the search is done by following a path from the root. Suppose we are searching the compressed trie of Figure 28.15 for an element with key 921-23-1234. Since the skip value for the root node is 0, we use the first digit 9 of the search key to determine which subtrie to move to. A move to node A.child[9] = I is made. By examining the edge information (stored in node I), we determine that, in making the move from node A to node I, the digits 5 and 1 are skipped. Since these digits do not agree with the next two digits of the search key, the search terminates with the conclusion that the trie contains no element whose key equals the search key.

Inserting into a Compressed Trie with Edge Information

To insert an element with key 987-26-1615 into the compressed trie of Figure 28.15, we first search for an element with this key. The search terminates unsuccessfully when we move from node A to node I because of a mismatch between the skipped over digits and the corresponding digits of the search key. The first mismatch is at the first skipped over digit. Therefore, we insert a branch node L between nodes A and I. The skip value for this branch node is 0, and its element field is set to reference the element node for the newly inserted element. We must also change the skip value of I to 1. Figure 28.16 shows the resulting compressed trie.

Suppose we are to insert an element with key 958-36-4194 into the compressed trie of Figure 16. The search for an element with this key terminates when we move to node I because of a mismatch between the digits that are skipped over and the corresponding digits of the search key. A new branch node is inserted between nodes A and I and we get the compressed trie that is shown in Figure 28.17.

The time required to insert a d digit element into a radix r compressed trie with edge information is O(r + d).

image

Removing an Element from a Compressed Trie with Edge Information

This is similar to removal from a compressed trie with digit numbers except for the need to update the element fields of branch nodes whose element field references the re- moved element.

Space Required by a Compressed Trie

Since each branch node partitions the elements in its subtrie into two or more nonempty groups, an n element compressed trie has at most n 1 branch nodes. Therefore, the space required by each of the compressed trie variants described by us is O(nr), where r is the trie radix.

When compressed tries are represented as hash tables, we need an additional data structure to store the nonpointer fields of branch nodes. We may use an array (much like we use the array inf ormation) for this purpose.

Comments

Popular posts from this blog

Binary Space Partitioning Trees:BSP Tree as a Hierarchy of Regions.

Data Structure Visualization:Introduction and Value of Data Structure Rendering

0/1 Knapsack Problem Memory function.