Quad trees and Octrees:Spatial Queries with Region Quad trees
Spatial Queries with Region Quad trees
In this section, we consider a number of spatial queries involving point data, and algorithms for them using compressed region quad trees.
Range queries are commonly used in database applications. Database records with d keys can be represented as points in d-dimensional space. In a range query, ranges of values are specified for all or a subset of keys with the objective of retrieving all the records that satisfy the range criteria. Under the mapping of records to points in multidimensional space, the ranges define a (possibly open-ended) hyperrectangular region. The objective is to retrieve all the points that lie in this query region.
As region quad trees organize points using a hierarchy of cells, the range query can be answered by finding a collection C of cells that are both fully contained in the query region and completely encompass the points in it. This can be achieved by a top-down traversal of the compressed region quad tree starting from the root. To begin with, C is empty. Consider a node v and its small cell S(v). If S(v) is outside the query region, the subtree underneath it can be safely discarded. If S(v) is completely inside the query region, it is added to C.
All points in the sub tree of S(v) are within the query region and reported as part of the output. If S(v) overlaps with the query region but is not contained in it, each child of v is examined in turn.
If the query region is small compared to the size of the root cell, it is likely that a path from the root is traversed where each cell on the path completely contains the query region. To avoid this problem, first compute the smallest cell encompassing the query region. This cell can be searched in O(d log n) time using the cell search algorithm described before, or perhaps in O(d) time if the hashing/indexing technique is applicable. The top-down traversal can start from the identified cell. When the cell is subdivided, at least two of its children represent cells that overlap with the query region. Consider a child cell and the part of the query region that is contained in it. The same idea can be recursively applied by finding the smallest cell that encloses this part of the query region and directly finding this cell rather than walking down a path in the tree to reach there. This ensures that the number of cells examined during the algorithm is O(|C|). To see why, consider the cells examined as organized into a tree based on subcell-supercell relationships. The leaves of the tree are the collection of cells C. Each cell in the tree is the smallest cell that encloses a subregion of the query region. Therefore, each internal node has at least two children. Consequently, the size of the tree, or the number of cells examined in answering the range query is O(|C|). For further study of range queries and related topics, see [5, 23, 25].
Next, we turn our attention to a number of spatial queries which we categorize as group queries. In group queries, a query to retrieve points that bear a certain relation to a query point is specified. The objective of the group query is to simultaneously answer n queries with each input point treated as a query point. For example, given n points, finding the nearest neighbor of each point is a group query. While the run time of performing the query on an individual point may be large, group queries can be answered more efficiently by intelligently combining the work required in answering queries for the individual points. Instead of presenting a different algorithm for each query, we show that the same generic framework can be used to solve a number of such queries. The central idea behind the group query algorithm is to realize run time savings by processing queries together for nearby points. Consider a cell C in the compressed quad tree and the (as yet uncomputed) set of points that result from answering the query for each point in C. The algorithm
keeps track of a collection of cells of size as close to C as possible that is guaranteed to contain these points. The algorithm proceeds in a hierarchical manner by computing this information for cells of decreasing sizes (see Figure 19.7).
A node u is said to be resolved with respect to node v if, either all points in S(u) are in the result of the query for all points in S(v) or none of the points in S(u) is in the result of the query for any point in S(v). Define the active set of a node v to be the set of nodes
u that cannot be resolved with respect to v such that S(u) ⊆ S(v) ⊆ L(u). The algorithm uses a depth first search traversal of the compressed quad tree. The active set of the root node contains itself. The active set of a node v is calculated by traversing portions of the sub trees rooted at the nodes in the active set of its parent. The functions Select, Status and Process used in the algorithm are designed based on the specific group query. When considering the status of a node u with respect to the node v, the function Status(v,u) returns one of the following three values:
• PROCESS – If S(u) is in the result of the query for all points in S(v)
• DISCARD – If S(u) is not in the result of the query for all points in S(v)
• UNKNOWN – If neither of the above is true
If the result is either PROCESS or DISCARD, the children of u are not explored. Otherwise, the size of S(u) is compared with the size of S(v). If S(u) ⊆ S(v), then u is added to the set of active nodes at v for consideration by v’s children. If S(u) is larger, then u’s children are considered with respect to v. It follows that the active set of a leaf node is empty, as the length of its small cell is zero. Therefore, entire subtrees rooted under the active set of the parent of a leaf node are explored and the operation is completed for the point inhabiting that leaf node. The function Process(v,u) reports all points in S(u) as part of the result of query for each point in S(v).
The order in which nodes are considered is important for proving the run time of some operations. The function Select is used to accommodate this.
Given a query point and a distance r > 0, the spherical region query is to find all points that lie within a distance of r from the query point. The group version of the spherical region query is to take n points and a distance r > 0 as input, and answer spherical region queries with respect to each of the input points.
A cell D may contain points in the spherical region corresponding to some points in cell C only if the smallest distance between D and C is less than r. If the largest distance between D and C is less than r, then all points in D are in the spherical region of every point in C.
Thus, the function Status(v,u) is defined as follows: If the largest distance between S(u) and S(v) is less than r, return PROCESS. If the smallest distance between S(u) and S(v) is greater than r, return DISCARD. Otherwise, return UNKNOWN. Processing u means including all the points in u in the query result for each point in v. For this query, no special selection strategy is needed.
For computing the k-nearest neighbors of each point, some modifications to the algorithm presented in Figure 19.7 are necessary. For each node w in P , the algorithm keeps track of the largest distance between S(v) and S(w). Let dk be the kth smallest of these distances.
If the number of nodes in P is less than k, then dk is set to ∞. The function Status(v,u) returns DISCARD if the smallest distance between S(v) and S(u) is greater than dk . The option PROCESS is never used. Instead, for a leaf node, all the points in the nodes in its active set are examined to select the k nearest neighbors. The function Select picks the largest cell in P , breaking ties arbitrarily.
Computing k-nearest neighbors is a well-studied problem [4, 35]. The algorithm presented here is equivalent to Vaidya’s algorithm [35, 36], even though the algorithms appear to be very different on the surface. Though Vaidya does not consider compressed quad trees, the computations performed by his algorithm can be related to traversal on compressed quad trees and a proof of the run time of the presented algorithm can be established by a correspondence with Vaidya’s algorithm. The algorithm runs in O(kn) time. The proof is quite elaborate, and omitted for lack of space. For details, see [35, 36]. The special case of n = 1 is called the all nearest neighbor query, which can be computed in O(n) time.
Comments
Post a Comment