Trees with Minimum Weighted Path Length:Optimal Alphabetic Tree Problem.

Optimal Alphabetic Tree Problem

The alphabetic tree problem looks very similar to the Huffman problem, except that the leaves of the alphabetic tree (read left-to-right) should be in the same order as in the original input sequence. Similarly as in Huffman coding the binary tree must be full , i.e., each internal node must have exactly two sons.

The main difficulty is that we cannot localize easily two items which are to be combined.

Assume we have a sequence of n weighted items, where wi is the non-negative weight of the ith item. We write α = w1 ... wn . The sequence will be changing in the course of the algorithm.

An alphabetic tree over α is an ordered binary tree T with n leaves, where the ith leaf (in left-to-right order) corresponds to the ith item of The optimal alphabetic tree problem (OAT problem) is to find an alphabetic tree of minimum cost.

The Garsia-Wachs algorithm solves the alphabetic tree problem, it is a version of an earlier algorithm by Hu and Tucker, see [18]. The strangest part of the algorithm is that it permutes α, though the final tree should have the order of leaves the same as the order of items in the original sequence. We adopt the convention that the items of α have unique names, and that these names are preserved when items are moved. When convenient to do so, we will assume that those names are the positions of items in the list, namely integers in [1 ... n].

Computing the Cost of Optimal Alphabetic Tree

First we show how to compute only the cost of the whole tree, however this computation does not give automatically an optimal alphabetic tree, since we will be permuting the sequence of items. Each time we combine two adjacent items in the current permutation, however these items are not necessarily adjacent in the original sequence, so in any legal alphabetic tree they cannot be sons of a same father.

The alphabetic tree is constructed by reducing the initial sequence of items to a shorter sequence in a manner similar to that of the Huffman algorithm, with one important differ- ence. In the Huffman algorithm, the minimum pair of items are combined, because it can be shown that they are siblings in the optimal tree. If we could identify two adjacent items that are siblings in the optimal alphabetic tree, we could combine them and then proceed recursively. Unfortunately, there is no known way to identify such a pair. Even a minimal pair may not be siblings. Consider the weight sequence (877 8). The second and the third items are not siblings in any optimal alphabetic tree.

Instead, the HT and GW algorithms, as well as the algorithms of [20, 22, 23, 46], operate by identifying a pair of items that have the same level in the optimal tree. These items are then combined into a single “package,” reducing the number of items by one. The details on how this process proceeds differ in the different algorithms. Define, for 1 i < n, the ith two-sum:

image

The Operator Move. If w is any item in a list π of weighted items, define RightPos(w) to be the predecessor of the nearest right larger or equal neighbor of w. If w has no right larger or equal neighbor, define RightPos(w) to be |π| + 1.

Let Move(w, π) be the operator that changes π by moving w w is inserted between positions Right Pos (w) 1 and Right Pos (w). For example

image

Correctness of the algorithm is a complicated issue. There are two simplified proofs, see [19, 30] and we refer to these papers for detailed proof. In [19] only the rightmost minimal pair can be processed each time, while [30] gives correctness of general algorithm when any minimal pair is processed, this is important in parallel computation, when we process simultaneously many such pairs. The proof in [30] shows that correctness is due to well- shaped bottom segments of optimal alphabetic trees, this is expressed as a structural theorem in [30] which gives more insight into the global structure of optimal alphabetic trees.

For j > i + 1 denote by πi,j the sequence π in which elements i, i + 1 are moved just before left of j.

THEOREM 14.7 [Correctness of the GW algorithm]

Let (i, i + 1) be a locally minimal pair and RightPos (i, i + 1) = j, and let T t be a tree over the sequence πi,j , optimal among all trees over πi,j in which i, i +1 are siblings. Then there is an optimal alphabetic tree T over the original sequence π = (1,... n) such that T = T t.

Correctness can be also expressed as equivalence between some classes of trees.

Two binary trees T1 and T2 are said to be level equivalent (we write T1 = T2) if T1, and T2 have the same set of leaves (possibly in a different order) and level T1 = level T2 .

Denote by OPT(i) the set of all alphabetic trees over the leaf-sequence (1,... n) which are optimal among trees in which i and i + 1 are at the same level. Assume the pair (i, i + 1) is locally minimal. Let OPTmoved (i) be the set of all alphabetic trees over the leaf-sequence πi,j which are optimal among all trees in which leaves i and i + 1 are at the same level, where j = RightPos(i, i + 1).

Two sets of trees OPT and OPTt are said to be level equivalent , written OPT = OPTt, if, for each tree T OPT, there is a tree T t OPTt such that T t = T , and vice versa.

image

Construction of Optimal Alphabetic Tree

The full Garsia-Wachs algorithm first computes the level tree. This tree can be easily constructed in the function GW (π) when computing the cost of alphabetic tree. Each time we sum weights of two items (original or newly created) then we create new item which is their father with the weight being the sum of weights of sons.

Once we have a level tree, the optimal alphabetic tree can be constructed easily in linear time.

Figure 14.6, Figure 14.7, and Figure 14.8 show the process of construction the level tree and construction an optimal alphabetic tree knowing the levels of original items.

LEMMA 14.8 Assume we know level of each leaf in an optimal alphabetic tree. Then the tree can be constructed in linear time.

Proof The levels give the “shape” of the tree, see Figure 14.8.

Assume l1, l2, l3,..., ln is the sequence of levels. We scan this sequence from left-to-right until we find the first two levels li, li+1 which are the same. Then we know that the leaves I and i + 1 are sons of a same father, hence we link these leaves to a newly created father and we remove these leaves, in the level sequence the pair li, li+1 is replaced by a single level li 1. Next we check if li1 = li 1, if not we search to the right. We keep the scanned and newly created levels on the stack. The total time is linear.

There are possible many different optimal alphabetic trees for the same sequence, Fig- ure 14.9 shows an alternative optimal alphabetic tree for the same example sequence.

THEOREM 14.9 Optimal alphabetic tree can be constructed in O(n log n) time.

Proof We keep the array of levels of items. The array level is global of size (2n 1). Its indices are the names of the nodes, i.e., the original n items and the (n 1) nodes (“packages”) created during execution of the algorithm. The algorithm works in quadratic time, if implemented in a naive way. Using priority queues, it works in O(n log n) time. Correctness follows directly from Theorem 14.7.

Optimal Alphabetic Trees for Presorted Items

We have seen that Huffman trees can be constructed in linear time if the weights are presorted.

Larmore and Przytycka, see [22] have shown that slightly weaker similar result holds for alphabetic trees as well:

assume that weights of items are sortable in linear time, then the alphabetic tree problem can be solved in O(n log log n) time.

Open problem Is it possible to construct alphabetic trees in linear time in the case when the weights are sortable in linear time?

image

image

image

Comments

Popular posts from this blog

Binary Space Partitioning Trees:BSP Tree as a Hierarchy of Regions.

Data Structure Visualization:Introduction and Value of Data Structure Rendering

0/1 Knapsack Problem Memory function.