Interval, Segment, Range, and Priority Search Trees:Segment Trees
Segment Trees
The segment tree structure, originally introduced by Bentley[5, 22], is a data structure for intervals whose endpoints are fixed or known a priori. The set S = {I1, I2,..., In} of n intervals, each of which represented by Ii = [fi, ri], fi, ri ∈ 3i, fi ≤ ri, is represented by data array, Data Array(S), whose entries correspond to the endpoints, fi or ri , and are sorted in non-decreasing order. This sorted array is denoted SA[1..N ], N = 2n. That is, SA[1] ≤ SA[2] ≤ ... ≤ SA[N ], N = 2n. We will in this section use the indexes in the range [1,N ] to refer to the entries in the sorted array SA[1..N ]. For convenience we will be working in the transformed domain using indexes, and a comparison involving a point q ∈ 3 i and an index i ∈ ℵ, unless otherwise specified, is performed in the original domain in 3i. For instance, q < i is interpreted as q <SA[i].
The segment tree structure, as will be demonstrated later, can be useful in finding the measure of a set of intervals. That is, the length of the union of a set of intervals. It can also be used to find the maximum clique of a set of intervals. This structure can be generalized to higher dimensions.
Construction of Segment Trees
The segment tree, as the interval tree discussed in Section 18.2 is a rooted augment binary search tree, in which each node v is associated with a range of integers v.range = [v.B, v.E], v.B, v.E ∈ ℵ, v.B < v.E, representing a range of indexes from v.B to v.E, a key, v.key that split v.range into two subranges, each of which is associated with each child of v, two tree pointers v.left and v.right to the left and right subtrees, respectively, and an auxiliary pointer, v.aux to an augmented data structure. Given integers s and t, with 1 ≤ s < t ≤ N , the segment tree, denoted Segment Tree(s, t), is recursively described as follows.
• The root node v, denoted Segment Tree root(s, t), is associated with the range [s, t], and v.B = s and v.E = t.
• If s + 1 = t then we have a leaf node v with v.B = s, v.E = t and v.key = nil.
• Otherwise (i.e., s +1 < t), let m be the mid-point of s and t, or m = l (v.B+v.E) J.
Set v.key = m.
• Tree pointer v.left points to the left subtree rooted at Segment Tree root(s, m), and tree pointer v.right points to the right subtree rooted at Segment Tree root(m, t).
• Auxiliary pointer v.aux points to an augmented data structure, associated with the range [s, t], whose content depends on the usage of the segment tree.
The following is a pseudo code for the construction of a segment tree for a range [s, t] s< t, s, t ∈ ℵ, and the construction of a set of n intervals whose endpoints are indexed by an array of integers in the range [1,N ],N = 2n can be done by a call to Segment Tree(1,N ).
See Fig. 18.2(b) for an illustration.
function Segment Tree(s, t)
/* It returns a pointer v to the root, Segment Tree root(s, t), of the segment tree for the range [s, t].*/
Input: A set N of integers, {s, s + 1,... , t} representing the indexes of the endpoints of a subset of intervals.
Output: A segment tree, rooted at Segment Tree root(s, t).
Method:
The parameters v.B and v.E associated with node v define a range [v.B, v.E], called a standard range associated with v. The standard range associated with a leaf node is also called an elementary range. It is straightforward to see that Segment Tree(s, t) constructed in function Segment Tree(s, t) described above is balanced, and has height, denoted Segment Tree.height, equal to plog2(t − s)l.
We now introduce the notion of canonical covering of a range [s, t], where s, t ∈ ℵ and 1 ≤ s < t ≤ N . A node v in Segment Tree(1,N ) is said to be in the canonical covering of [s, t] if its associated standard range satisfies this property [v.B, v.E] ⊆ [s, t], while that of its parent node does not. It is obvious that if a node v is in the canonical covering, then its sibling node, i.e., the node with the same parent node as the present one, is not, for otherwise the common parent node would have been in the canonical covering. Thus at each level there are at most two nodes that belong to the canonical covering of [s, t].
Thus, for each range [s, t] the number of nodes in its canonical covering is at most plog2(t− s)l + llog2(t − s)J− 2. In other words, a range [s, t] (or respectively an interval [s, t]) can be decomposed into at most plog2(t − s)l + llog2(t − s)J− 2 standard ranges (or respectively subintervals)[5, 22].
To identify the nodes in a segment tree T that are in the canonical covering of an interval I = [b, e], representing a range [b, e], we perform a call to Interval Insertion(v, b, e, Q), where v is Segment Tree root(S). The procedure Interval Insertion(v, b, e, Q) is defined below.
procedure Interval Insertion(v, b, e, Q) /* It returns a queue Q of nodes q ∈ T such that [v.B, v.E] ⊆ [b, e] and its parent node u whose [u.B, u.E] j⊆ [b, e].*/
Input: A segment tree T pointed to by its root node, v = Segment Tree root(1,N ), for a set S of intervals.
Output: A queue Q of nodes in T that are in the canonical covering of [b, e]
Method:
To append [b, e] to a node v means to insert interval I = [b, e] into the auxiliary structure associated with node v to indicate that node v whose standard range is totally contained in I is in the canonical covering of I. If the auxiliary structure v.aux associated with node v is an array, the operation append [b, e] to a node v can be implemented as v.aux[j++] = I, procedure Interval Insertion(v, b, e, Q) described above can be used to represent a set S of n intervals in a segment tree by performing the insertion operation n times, one for each interval. As each interval I can have at most O(log n) nodes in its canonical covering, and hence we perform at most O(log n) append operations for each insertion, the total amount of space required in the auxiliary data structures reflecting all the nodes in the canonical covering is O(n log n).
Deletion of an interval represented by a range [b, e] can be done similarly, except that the append operation will be replaced by its corresponding inverse operation remove that removes the node from the list of canonical covering nodes.
THEOREM 18.2 The segment tree for a set S of n intervals can be constructed in O(n log n) time, and if the auxiliary structure for each node v contains a list of intervals containing v in the canonical covering, then the space required is O(n log n).
Examples and Its Applications
Fig. 18.2(b) illustrates an example of a segment tree of the set of intervals, as shown in Fig. 18.2(a). The integers, if any, under each node v represent the indexes of intervals that contain the node in its canonical covering. For example, Interval I2 contains nodes labeled by standard ranges [2, 4] and [4, 7].
We now describe how segment trees can be used to solve the Enclosing Interval Searching Problem defined before and the Maximum Clique Problem of a set of intervals, which is defined below.
Maximum Density or Maximum Clique of a set of Intervals [12, 16, 23] Given a set S of n intervals, find a maximum subset C ⊆ S such that the common intersection of intervals in is non-empty. That is, maximized. |C| is called the density of the set.
The following pseudo code solves the Enclosing Interval Searching Problem in O(log n)+|F | time, where F is the output set. It is invoked by a call Point in Interval Search(v, q, F ), where v is Segment Tree root(S), and F initially set to be ∅, will contain the set of intervals containing a query point q.
procedure Point in Interval Search(v, q, F )
/* It returns in F the list of intervals stored in the segment tree pointed to by v and containing query point q */
Input: A segment tree representing a set S of n intervals, S = {Ii|i = 1, 2,... , n} and a query point q ∈ 3i. The auxiliary structure v.aux associated with each node v is a list of intervals I ∈ S that contain v in their canonical covering.
Output: A subset F = {Ii|fi ≤ q ≤ ri}.
Method:
We now address the problem of finding the maximum clique of a set S of intervals, S = {I1, I2,... , In}, where each interval Ii = [fi, ri], and fi ≤ ri, fi, ri ∈ 3i,i = 1, 2,... , n.
There are other approaches, such as plane-sweep [12, 16, 22, 23] that solve this problem within the same complexity.
For this problem we introduce an auxiliary data structure to be stored at each node v.
v. aux will contain two pieces of information: one is the number of intervals containing v
in the canonical covering, denoted v.�, and the other is the clique size, denoted v.clq. The clique size of a node v is the size of the maximum clique whose common intersection is contained in the standard range associated with v. It is defined to be equal to the larger of the two numbers: v.left .� + v.left .clq and v.right .� + v.right .clq. For a leaf node v, v.clq = 0. The size of the maximum clique for the set of intervals will then be stored at the root node Segment Tree root(S) and is equal to the sum of v.� and v.clq, where v = Segment Tree root(S). It is obvious that the space needed for this segment tree is linear.
As this data structure supports insertion of intervals incrementally, it can be used to answer the maximum clique of the current set of intervals as the intervals are inserted into (or deleted from) the segment tree T . The following pseudo code finds the size of the maximum clique of a set of intervals.
function Maximum Clique(S)
/* It returns the size of the maximum clique of a set S of intervals. */
Input: A set S of n intervals and the segment tree T rooted at Segment Tree root(S).
Output: An integer, which is the size of the maximum clique of S.
Method: Assume that S = {I1, I2,..., In} and that the endpoints of the intervals are represented by the indexes of a sorted array containing these endpoints.
Note that in procedure Maximum Clique(S) it takes O(log n) time to process each interval. We conclude with the following theorem.
THEOREM 18.3 Given a set S = {I1, I2,... , In} of n intervals, the maximum clique of Si = {I1, I2,..., Ii} can be found in O(i log i) time and linear space, for each i = 1, 2,... , n, by using a segment tree.
We note that the above procedure can be adapted to find the maximum clique of a set of hyper rectangles in k-dimensions for k > 2 in time O(nk ).[16]
Comments
Post a Comment