Trees with Minimum Weighted Path Length: Optimal Alphabetic Tree Problem
Optimal Alphabetic Tree Problem
The alphabetic tree problem looks very similar to the Huffman problem, except that the leaves of the alphabetic tree (read left-to-right) should be in the same order as in the original input sequence. As in Huffman coding, the binary tree must be full, i.e., each internal node must have exactly two sons.
The main difficulty is that we cannot easily localize the two items that are to be combined.
Assume we have a sequence of n weighted items, where wi is the non-negative weight of the ith item. We write α = w1 ... wn. The sequence will change in the course of the algorithm.
An alphabetic tree over α is an ordered binary tree T with n leaves, where the ith leaf (in left-to-right order) corresponds to the ith item of α. The cost of T is its weighted path length: the sum, over all items, of the weight of the item multiplied by the level (depth) of its leaf. The optimal alphabetic tree problem (OAT problem) is to find an alphabetic tree of minimum cost.
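To make the cost function concrete, here is a minimal sketch in Python; the nested-tuple encoding of trees, with leaves represented by item indices, is a convention chosen here, not notation from the text:

```python
def cost(tree, weights, depth=0):
    """Weighted path length: sum over leaves of weight * level of the leaf.

    A tree is either a leaf (an index into weights) or a pair (left, right).
    """
    if isinstance(tree, int):          # a leaf: contributes weight * depth
        return weights[tree] * depth
    left, right = tree                 # an internal node with two sons
    return cost(left, weights, depth + 1) + cost(right, weights, depth + 1)

# The two full alphabetic trees over three items:
print(cost((0, (1, 2)), [1, 2, 3]))   # → 11: leaf 0 at level 1, leaves 1, 2 at level 2
print(cost(((0, 1), 2), [1, 2, 3]))   # → 9: leaves 0, 1 at level 2, leaf 2 at level 1
```

For the weights (1, 2, 3) the second shape is cheaper, even though it puts the heaviest item closest to the root on the right, illustrating that the leaf order is fixed while the shape is free.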
The Garsia-Wachs algorithm solves the alphabetic tree problem; it is a version of an earlier algorithm by Hu and Tucker, see [18]. The strangest part of the algorithm is that it permutes α, though the final tree should have its leaves in the same order as the items in the original sequence. We adopt the convention that the items of α have unique names, and that these names are preserved when items are moved. When convenient, we will assume that these names are the positions of the items in the list, namely integers in [1 ... n].
Computing the Cost of Optimal Alphabetic Tree
First we show how to compute only the cost of the whole tree. This computation does not automatically give an optimal alphabetic tree, since we will be permuting the sequence of items: each time we combine two adjacent items of the current permutation, these items are not necessarily adjacent in the original sequence, so in a legal alphabetic tree they cannot be sons of the same father.
The alphabetic tree is constructed by reducing the initial sequence of items to a shorter sequence in a manner similar to that of the Huffman algorithm, with one important difference. In the Huffman algorithm, the minimum pair of items are combined, because it can be shown that they are siblings in the optimal tree. If we could identify two adjacent items that are siblings in the optimal alphabetic tree, we could combine them and then proceed recursively. Unfortunately, there is no known way to identify such a pair. Even a minimal pair may not be siblings. Consider the weight sequence (8 7 7 8). The second and third items are not siblings in any optimal alphabetic tree.
Instead, the HT and GW algorithms, as well as the algorithms of [20, 22, 23, 46], operate by identifying a pair of items that have the same level in the optimal tree. These items are then combined into a single “package,” reducing the number of items by one. The details of how this process proceeds differ in the different algorithms. Define, for 1 ≤ i < n, the ith two-sum:

TwoSum(i) = wi + wi+1.

A pair of adjacent items (i, i + 1) is a locally minimal pair if TwoSum(i − 1) ≥ TwoSum(i) (when i > 1) and TwoSum(i) < TwoSum(i + 1) (when i < n − 1).
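The definitions above can be sketched directly in Python. The boundary conventions for a locally minimal pair follow the inequalities as stated (non-strict on the left, strict on the right); the rightmost position achieving the minimum two-sum always satisfies them, so such a pair always exists. Indices here are 0-based, unlike the 1-based text:

```python
def two_sum(w, i):
    # TwoSum(i) = w[i] + w[i+1] (0-based here; the text uses 1-based indices)
    return w[i] + w[i + 1]

def locally_minimal_pair(w):
    """Index i of a pair (i, i+1) with TwoSum(i-1) >= TwoSum(i) (if i > 0)
    and TwoSum(i) < TwoSum(i+1) (if i+1 < len(w) - 1)."""
    n = len(w)
    for i in range(n - 1):
        left_ok = i == 0 or two_sum(w, i - 1) >= two_sum(w, i)
        right_ok = i == n - 2 or two_sum(w, i) < two_sum(w, i + 1)
        if left_ok and right_ok:
            return i

print(locally_minimal_pair([8, 7, 7, 8]))  # → 1: the middle pair (7, 7)
```

On the example sequence (8 7 7 8) from the text, the pair of 7s is locally minimal (two-sums 15, 14, 15), even though those two items are not siblings in any optimal alphabetic tree.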
The Operator Move. If w is any item in a list π of weighted items, define RightPos(w) to be the position of the nearest right neighbor of w that is larger than or equal to w. If w has no right larger or equal neighbor, define RightPos(w) to be |π| + 1.
Let Move(w, π) be the operator that changes π by moving w: w is removed from its current position and inserted between positions RightPos(w) − 1 and RightPos(w), i.e., just before its nearest right larger-or-equal neighbor.
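The text's worked example of Move did not survive, so here is a small sketch under one consistent reading of the definition: the item is removed and reinserted just before its nearest right larger-or-equal neighbor, or at the end of the list if there is none. Items are identified by their current index, and the helper names are chosen here:

```python
def right_pos(weights, i):
    """Index of the nearest right neighbor of item i with weight >= weights[i],
    or len(weights) if there is none (so the item moves to the end)."""
    for j in range(i + 1, len(weights)):
        if weights[j] >= weights[i]:
            return j
    return len(weights)

def move(weights, i):
    """Remove item i and reinsert it just before its nearest right
    larger-or-equal neighbor."""
    j = right_pos(weights, i)
    w = weights[i]
    return weights[:i] + weights[i + 1:j] + [w] + weights[j:]

print(move([3, 5, 2, 7, 4], 1))  # → [3, 2, 5, 7, 4]: 5 jumps over 2, landing before 7
```

In the Garsia-Wachs algorithm this operator is applied to the newly created package after each combination step.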
Correctness of the algorithm is a complicated issue. There are two simplified proofs, see [19, 30], and we refer to these papers for the details. In [19] only the rightmost minimal pair can be processed at each step, while [30] proves correctness of the general algorithm, in which any minimal pair may be processed; this is important in parallel computation, where many such pairs are processed simultaneously. The proof in [30] shows that correctness is due to well-shaped bottom segments of optimal alphabetic trees; this is expressed as a structural theorem in [30], which gives more insight into the global structure of optimal alphabetic trees.
For j > i + 1, denote by πi,j the sequence π in which the elements i, i + 1 are moved to just before position j.
THEOREM 14.7 [Correctness of the GW algorithm]
Let (i, i + 1) be a locally minimal pair and RightPos(i, i + 1) = j, and let T′ be a tree over the sequence πi,j, optimal among all trees over πi,j in which i, i + 1 are siblings. Then there is an optimal alphabetic tree T over the original sequence π = (1, ..., n) such that T ∼= T′.
Correctness can also be expressed as an equivalence between certain classes of trees.
Two binary trees T1 and T2 are said to be level equivalent (we write T1 ∼= T2) if T1 and T2 have the same set of leaves (possibly in a different order) and levelT1 = levelT2.
Denote by OPT(i) the set of all alphabetic trees over the leaf-sequence (1,... n) which are optimal among trees in which i and i + 1 are at the same level. Assume the pair (i, i + 1) is locally minimal. Let OPTmoved (i) be the set of all alphabetic trees over the leaf-sequence πi,j which are optimal among all trees in which leaves i and i + 1 are at the same level, where j = RightPos(i, i + 1).
Two sets of trees OPT and OPT′ are said to be level equivalent, written OPT ∼= OPT′, if for each tree T ∈ OPT there is a tree T′ ∈ OPT′ such that T′ ∼= T, and vice versa.
Construction of Optimal Alphabetic Tree
The full Garsia-Wachs algorithm first computes the level tree. This tree can easily be constructed in the function GW(π) while computing the cost of the alphabetic tree: each time we sum the weights of two items (original or newly created), we create a new item which is their father, with weight equal to the sum of the weights of its sons.
Once we have a level tree, the optimal alphabetic tree can be constructed easily in linear time.
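Putting the pieces together (locally minimal pairs, packages, and the Move operator), a quadratic-time sketch of the level computation might look as follows. The tuple representation and helper names are choices made here, not the text's implementation; the level of each original item is its depth in the level tree, recovered by following father links:

```python
def gw_levels(weights):
    """Level (depth) of each original item in an optimal alphabetic tree,
    via the Garsia-Wachs scheme: repeatedly combine a locally minimal
    pair into a package, move the package right, record father links."""
    n = len(weights)
    items = [(w, i) for i, w in enumerate(weights)]  # (weight, node name)
    parent = {}
    next_name = n                                    # names for packages
    while len(items) > 1:
        w = [x[0] for x in items]
        m = len(w)
        # find a locally minimal pair (i, i+1) in the current sequence
        for i in range(m - 1):
            left_ok = i == 0 or w[i - 1] + w[i] >= w[i] + w[i + 1]
            right_ok = i == m - 2 or w[i] + w[i + 1] < w[i + 1] + w[i + 2]
            if left_ok and right_ok:
                break
        # combine the pair into a package; record father links
        pw = w[i] + w[i + 1]
        parent[items[i][1]] = next_name
        parent[items[i + 1][1]] = next_name
        package = (pw, next_name)
        next_name += 1
        rest = items[:i] + [package] + items[i + 2:]
        # Move: reinsert the package just before its nearest right >= neighbor
        j = next((k for k in range(i + 1, len(rest)) if rest[k][0] >= pw),
                 len(rest))
        items = rest[:i] + rest[i + 1:j] + [package] + rest[j:]
    # level of an item = number of father links up to the root
    levels = []
    for leaf in range(n):
        d, v = 0, leaf
        while v in parent:
            v, d = parent[v], d + 1
        levels.append(d)
    return levels

print(gw_levels([1, 2, 3, 4]))  # → [3, 3, 2, 1]
print(gw_levels([8, 7, 7, 8]))  # → [2, 2, 2, 2]
```

Note that on (8 7 7 8) the two middle items are combined and the package is moved past the final 8, yet the resulting levels [2, 2, 2, 2] describe the balanced tree, in which those two items are not siblings.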
Figure 14.6, Figure 14.7, and Figure 14.8 show the process of constructing the level tree and of constructing an optimal alphabetic tree from the levels of the original items.
LEMMA 14.8 Assume we know the level of each leaf in an optimal alphabetic tree. Then the tree can be constructed in linear time.
Proof The levels give the “shape” of the tree, see Figure 14.8.
Assume l1, l2, l3, ..., ln is the sequence of levels. We scan this sequence from left to right until we find the first two levels li, li+1 which are equal. Then we know that the leaves i and i + 1 are sons of the same father, so we link these leaves to a newly created father and remove them; in the level sequence, the pair li, li+1 is replaced by the single level li − 1. Next we check whether li−1 = li − 1; if not, we continue the search to the right. We keep the scanned and newly created levels on a stack. The total time is linear.
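The stack-based scan from the proof can be sketched as follows; nested tuples with leaf indices are the tree encoding chosen here:

```python
def tree_from_levels(levels):
    """Rebuild an alphabetic tree from the sequence of leaf levels.

    Scan left to right; whenever the two topmost nodes on the stack are at
    the same level, link them under a new father one level up."""
    stack = []  # entries: (level, subtree); a subtree is a leaf index or a pair
    for leaf, level in enumerate(levels):
        stack.append((level, leaf))
        while len(stack) >= 2 and stack[-1][0] == stack[-2][0]:
            lvl, right = stack.pop()
            _, left = stack.pop()
            stack.append((lvl - 1, (left, right)))
    (root_level, root), = stack  # a valid level sequence leaves one node
    assert root_level == 0       # ... and it is the root, at level 0
    return root

print(tree_from_levels([3, 3, 2, 1]))  # → (((0, 1), 2), 3)
print(tree_from_levels([2, 2, 2, 2]))  # → ((0, 1), (2, 3))
```

Each leaf is pushed once and each merge removes one stack entry, so the total work is linear, as the lemma claims.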
There may be many different optimal alphabetic trees for the same sequence; Figure 14.9 shows an alternative optimal alphabetic tree for the same example sequence.
THEOREM 14.9 An optimal alphabetic tree can be constructed in O(n log n) time.
Proof We keep an array of levels of the items. The global array level has size 2n − 1; its indices are the names of the nodes, i.e., the original n items and the n − 1 nodes (“packages”) created during execution of the algorithm. Implemented naively, the algorithm works in quadratic time; using priority queues, it works in O(n log n) time. Correctness follows directly from Theorem 14.7.
Optimal Alphabetic Trees for Presorted Items
We have seen that Huffman trees can be constructed in linear time if the weights are presorted.
Larmore and Przytycka [22] have shown that a slightly weaker result of the same kind holds for alphabetic trees as well: if the weights of the items can be sorted in linear time, then the alphabetic tree problem can be solved in O(n log log n) time.
Open problem: Is it possible to construct alphabetic trees in linear time when the weights are sortable in linear time?