Tries:Patricia
Patricia
The data structure Patricia (Practical Algorithm To Retrieve Information Coded In Alphanumeric) is a compressed binary trie in which the branch and element nodes have been melded into a single node type. Consider the compressed binary trie of Figure 28.18. Circular nodes are branch nodes and rectangular nodes are element nodes. The number inside a branch node is its bit number field; the left child of a branch node corresponds to the case when the appropriate key bit is 0 and the right child to the case when this bit is 1. The melding of branch and element nodes is done by moving each element from its element node to an ancestor branch node. Since the number of branch nodes is one less than the number of element nodes, we introduce a header node and make the compressed binary trie the left subtree of the header. Pointers that originally went from a branch node to an element node now go from that branch node to the branch node into which the correspond- ing element has been melded. Figure 28.19 shows a possible result of melding the nodes of Figure 28.18. The number outside a node is its bit number values. The thick pointers are backward pointers that replace branch-node to element-node pointers in Figure 28.18. A backward pointer has the property that the bit number value at the start of the pointer is ≥ the bit number value at its end. For original branch-node to branch-node pointers (also called downward pointers), the bit number value at the pointer end is always greater than the bit number value at the pointer start.
Searching
To search for an element with key the Key, we use the bits of the Key from left to right to move down the Patricia instance just as we would search a compressed binary trie. When a backward pointer is followed, we compare the Key with the key in the reached Patricia node. For example, to search for the Key = 1101, we start by moving into the left subtree of the header node. The pointer that is followed is a downward pointer (start bit number is 0 and end bit number is 1). We branch using bit 1 of the Key. Since this bit is 1, we follow the right child pointer. The start bit number for this pointer is 1 and the end bit number is 2. So, again, a downward pointer has been followed. The bit number for the reached node is 2. So, we use bit 2 of the Key to move further. Bit 2 is a 1. So, we use the right child pointer to get to the node that contains 1100. Again, a downward pointer was used. From this node, a move is made using bit 4 of the Key. This gets us to the node that contains 1101. This time, a backward pointer was followed (start bit of pointer is 4 and end bit is 1). When a backward pointer is followed, we compare theKey with the key in the reached node. In this case, the keys match and we have found the desired element. Notice that the same search path is taken when theKey = 1111. In this case, the final compare fails and we conclude that we have no element whose key is 1111.
Inserting an Element
We use an example to illustrate the insert algorithm. We start with an empty instance. Such an instance has no node; not even the header. For our example, we will consider keys that have 7 bits. For the first insert, we use the key 0000101. When inserting into an empty instance, we create a header node whose left child pointer points to the header node; the new element is inserted into the header; and the bit number field of the header set to 0.
The configuration of Figure 28.20(a) results. Note that the right child field of the header node is not used.
The key for the next insert is 0000000. The search for this key terminates at the header. We compare the insert key and the header key and determine that the first bit at which they differ is bit 5. So, we create a new node with bit number field 5 and insert the new element into this node. Since bit 5 of the insert key is 0, the left child pointer of the new node points to the new node and the right child pointer to the header node (Figure 28.20(b)).
For the next insertion, assume that the key is 0000010. The search for this key terminates at the node with 0000000. Comparing the two keys, we determine that the first bit at which the two keys differ is bit 6. So, we create a new node with bit number field 6 and put the new element into this new node. The new node is inserted into the search path in such a way that bit number fields increase along this path. For our example, the new node is inserted as the left child of the node with 0000000. Since bit 6 of the insert key is 1, the right child pointer of the new node is a self pointer and the left child pointer points to the node with 0000000. Figure 28.20(c) shows the result.
The general strategy to insert an element other than the first one is given below. The key of the element to be inserted is the Key.
1. Search for the Key. Let reached Key be the key in the node end N ode where the search terminates.
2. Determine the leftmost bit position lBitP os at which the Key and reachedKey differ. lBitP os is well defined so long as one of the two keys isn’t a prefix of the other.
3. Create a new node with bit number field lBitP os. Insert this node into the search path from the header to endN ode so that bit numbers increase along this path. This insertion breaks a pointer from node p to node q. The pointer now goes from p to the new node.
4. If bit lBitP os of theKey is 1, the right child pointer of the new node becomes a self pointer (i.e., it points to the new node); otherwise, the left child pointer of the new node becomes a self pointer. The remaining child pointer of the new node points to q.
For our next insert, the insert key is 0001000. The search for this key terminates at the node with 0000000. We determine that the first bit at which the insert key and reached Key = 0000000 differ is bit 4. We create a new node with bit number 4 and put the new element into this node. The new node is inserted on the search path so as to ensure that bit number fields increase along this path. So, the new node is inserted as the left child of the header node (Figure 28.18(d)). This breaks the pointer from the header to the node q with 0000000. Since bit 4 of the insert key is 1, the right child pointer of the new node is a self pointer and the left child pointer goes to node q.
We consider two more inserts. Consider inserting an element whose key is 0000100. The reached key is 0000101 (in the header). We see that the first bit at which the insert and reached keys differ is bit 7. So, we create a new node with bit number 7; the new element is put into the new node; the new node is inserted into the search path so as to ensure that bit numbers increase along this path (this requires the new node to be made a right child of the node with 0000000, breaking the child pointer from 0000000 to the header); for the broken pointer, p is the node with 0000000 and q is the header; the left child pointer of the new node is a self pointer (because bit 7 of the insert key is 0); and remaining child pointer (in this case the right child) of the new node points to q (see Figure 28.21(a)).
For our final insert, the insert key is 0001010. A search for this key terminates at the node with 0000100. The first bit at which the insert and reached keys differ is bit 6. So, we create a new node with bit number 6; the new element is put into the new node; the new node is inserted into the search path so as to ensure that bit numbers increase along this path (this requires the new node to be made a right child of the node with 0001000; so, p = q = node with 001000); the right child pointer of the new node is a self pointer (because
Removing an Element
Let p be the node that contains the element that is to be removed. We consider two cases for p—(a) p has a self pointer and (b) p has no self pointer. When p has a self pointer and p is the header, the Patricia instance becomes empty following the element removal (Figure 28.20(a)). In this case, we simply dispose of the header. When p has a self pointer and p is not the header, we set the pointer from the parent of p to the value of p’s non-self pointer. Following this pointer change, the node p is disposed. For example, to remove the element with key 0000010 from Figure 28.21(a), we change the left child pointer in the node with 0000000 to point to the node pointed at by p’s non-self pointer (i.e., the node with 0000000). This causes the left child pointer of 0000000 to become a self pointer. The node with 0000010 is then disposed.
For the second case, we first find the node q that has a backward pointer to node p. For example, when we are to remove 0001000 from Figure 28.21(b), the node q is the node with 0001010. The element q Element in q (in our example, 0001010) is moved to node p and we proceed to delete node q instead of node p. Notice that node q is the node from which we reached node p in the search for the element that is to be removed. To delete node q, we first find the node r that has a back pointer to q (for our example r = q). The node r is found by using the key of q Element. Once r is found, the back pointer to q that is in r is changed from q to p to properly account for the fact that we have moved q Element to p.
Finally, the downward pointer from the parent of q to q is changed to q’s child pointer that was used to locate r. In our example p is the parent of q and the right child pointer of p is changed from q to the right child of q, which itself was just changed to p.
We see that the time for each of the Patricia operations search, insert, and delete is O(h), where h is the height of the Patricia instance. For more information on Patricia and general tries, see [2, 3].
Comments
Post a Comment