Approximate Queries
Before elaborating on the structures and search algorithms used to answer certain geometric queries, let us first introduce the basic definitions of approximate nearest-neighbor, farthest-neighbor, and range queries; see [1–3, 11, 13].
DEFINITION 26.2 Given a set S of points in IR^d, a query point q ∈ IR^d, a (connected) query region Q ⊂ IR^d, and ε > 0, we define the following queries (see Figure 26.1):
• A point p ∈ S is a (1 + ε)-approximate nearest neighbor of q if, for all p′ ∈ S, δ(p, q) ≤ (1 + ε)δ(p′, q).
• A point p ∈ S is a (1 − ε)-approximate farthest neighbor of q if, for all p′ ∈ S, δ(p, q) ≥ δ(p′, q) − εD_S, where D_S is the diameter of S.
• An ε-approximate range query returns (or counts) a subset S′ ⊆ S such that every point of S inside Q belongs to S′ and every point p ∈ S′ satisfies δ(p, Q) ≤ εD_Q, where D_Q is the diameter of Q.
To clarify further, a point p is a (1 + ε)-approximate nearest neighbor if its distance to q is within a factor of (1 + ε) of the true nearest distance. Although we do not discuss it here, the definitions extend naturally to reporting a sequence of k (1 + ε)-nearest (or (1 − ε)-farthest) neighbors. One may also wonder why the approximate farthest neighbor is defined in absolute terms instead of relative terms as with the nearest version. Observe that the distance from any query point to its farthest neighbor is always at least half the diameter D_S of the point set (its radius); hence an absolute error of εD_S corresponds to a relative error of at most 2ε, and the approximation is essentially as good as a relative one. Moreover, if the query point is extremely far from the point set, then a relative approximation could return any point in S, whereas an absolute approximation still requires a much more accurate distance.
FIGURE 26.1: Examples of (a) an approximate nearest neighbor p to q with the exact neighbor p∗, (b) an approximate farthest neighbor p to q with the exact neighbor p∗, and (c) an approximate range query; here the dark points are reported or counted and the lighter points are not.
Although we do not modify our definition here, this notion and our later theorems can be extended to handle query points that are distant from the point set in nearest-neighbor queries as well. In other words, when the query point is relatively close to the data set, we can use the relative error bound, and when it is relatively far away, we can use the absolute error bound.
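To make the definitions concrete, the following brute-force sketch checks the two neighbor conditions directly. It is meant only as an illustration of Definition 26.2; the function names, the tuple representation of points, and the use of Euclidean distance are our own assumptions, not part of the chapter's data structures.

```python
import math

def dist(p, q):
    """Euclidean distance delta(p, q) between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def diameter(S):
    """D_S: the largest pairwise distance in S (brute force, O(n^2))."""
    return max(dist(a, b) for a in S for b in S)

def is_approx_nearest(p, q, S, eps):
    """True if p is a (1 + eps)-approximate nearest neighbor of q in S:
    dist(p, q) <= (1 + eps) * dist(p2, q) for every p2 in S."""
    return all(dist(p, q) <= (1 + eps) * dist(p2, q) for p2 in S)

def is_approx_farthest(p, q, S, eps):
    """True if p is a (1 - eps)-approximate farthest neighbor of q in S:
    dist(p, q) >= dist(p2, q) - eps * D_S for every p2 in S (absolute slack)."""
    slack = eps * diameter(S)
    return all(dist(p, q) >= dist(p2, q) - slack for p2 in S)
```

Of course, these O(n) and O(n^2) checks are useful only for validating answers; the partition-tree searches discussed below avoid examining every point.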
The approximate range searching problem described above has one-sided false-positive errors: we do not miss any valid points, but we may include erroneous points near the range boundary. A simple modification of the query regions yields false-negative errors instead. That is, we could require that every reported point lie inside Q but allow some points inside Q to be missed if they are near its boundary. In fact, for any region Q one could define two error parameters, ε_i and ε_o, bounding the interior and exterior errors respectively, and then treat the overall approximation factor as ε = ε_i + ε_o.
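As an illustration of this two-sided buffer idea, here is a small sketch under the assumption that the query region Q is an axis-aligned box and that D_Q is its diagonal length; the helper names are hypothetical. A point deep inside Q must be reported, a point far outside Q must not be, and anything in the remaining buffer zone of total width (ε_i + ε_o)D_Q may go either way.

```python
import math

def box_diameter(lo, hi):
    """D_Q for an axis-aligned box Q = [lo, hi] (its diagonal length)."""
    return math.sqrt(sum((h - l) ** 2 for l, h in zip(lo, hi)))

def dist_to_box(p, lo, hi):
    """delta(p, Q): Euclidean distance from p to the box (0 if p lies inside)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(p, lo, hi)))

def depth_in_box(p, lo, hi):
    """Distance from p to the box boundary when p lies inside the box, else 0."""
    if any(x < l or x > h for x, l, h in zip(p, lo, hi)):
        return 0.0
    return min(min(x - l, h - x) for x, l, h in zip(p, lo, hi))

def classify(p, lo, hi, eps_i, eps_o):
    """Classify p for a two-sided (eps_i, eps_o) approximate range query on Q."""
    d = box_diameter(lo, hi)
    if depth_in_box(p, lo, hi) > eps_i * d:
        return "must report"        # well inside Q
    if dist_to_box(p, lo, hi) > eps_o * d:
        return "must not report"    # well outside the outer region
    return "either"                 # inside the tolerated buffer zone
```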
There are numerous approaches one may take to solving these problems. Arya et al. [1] introduced a priority search algorithm for visiting nodes in a partition tree to solve nearest-neighbor queries. Using their BBD tree structure, they were able to prove efficient query times. Duncan et al. [13] show how this priority search could also solve farthest-neighbor queries.
The nearest- and farthest-neighbor priority searching algorithms shown in Figures 26.2 and 26.3 come from [11]. In the approximate nearest-neighbor (respectively, farthest-neighbor) search, nodes are visited in order of increasing (respectively, decreasing) distance from the query point. Nodes are extracted via an efficient priority queue, such as a Fibonacci heap [10, 15].
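The following sketch conveys the flavor of this priority search for the nearest-neighbor case. It is a simplified illustration rather than the algorithm of Figure 26.2: we assume a generic partition-tree node with `children` and `points` attributes and a user-supplied `region_dist(node, q)` lower bound on the distance from q to the node's region, and we use Python's binary heap in place of a Fibonacci heap.

```python
import heapq, itertools, math

def point_dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def approx_nearest(root, q, eps, region_dist):
    """Priority search for a (1 + eps)-approximate nearest neighbor of q."""
    best, best_d = None, float("inf")
    tie = itertools.count()          # tie-breaker so heapq never compares node objects
    heap = [(region_dist(root, q), next(tie), root)]
    while heap:
        d, _, node = heapq.heappop(heap)
        # Every unvisited node has region distance >= d, so once
        # (1 + eps) * d >= best_d the current answer is good enough.
        if (1.0 + eps) * d >= best_d:
            break
        if not node.children:        # leaf: scan its few stored points
            for p in node.points:
                dp = point_dist(p, q)
                if dp < best_d:
                    best, best_d = p, dp
        else:                        # internal node: enqueue children by region distance
            for child in node.children:
                heapq.heappush(heap, (region_dist(child, q), next(tie), child))
    return best
```

The farthest-neighbor search is symmetric: nodes are prioritized by an upper bound on the distance from q to their region, largest first, and the termination test uses the absolute εD_S slack from the definition.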
Introduced in [3], the search technique used for the approximate range query is a modification of the standard range searching algorithm for regular partition trees. We present the algorithm from [11] in Figure 26.4. In this algorithm, we have two query regions: the inner region Q and the outer region Q′ ⊇ Q. The goal is to return all points of S that lie inside Q, allowing some returned points to lie inside Q′ but none outside Q′. That is, Q′ − Q defines a buffer zone that is the only place where erroneous points are allowed. Whenever a node u is visited, if u is a leaf node, we simply check all of u's associated data points. Otherwise, if the region R_u does not intersect Q, none of u's points can lie in Q, so we ignore u and its subtree. If R_u lies completely inside Q′, then all of the data points in u's subtree must lie inside Q′, and we report them all. Otherwise, we repeat the process on u's two child nodes. For an ε-approximate range search, we define Q′ = {p ∈ IR^d | δ(p, Q) ≤ εD_Q}. We note that this search algorithm can also be modified to return the count, or the sum of the weights, of the points inside the approximate range rather than explicitly reporting the points.
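A sketch of that recursion, under the same assumed node interface (each node carries a region R_u, a `children` list, leaf `points`, and, purely for illustration, a `subtree_points()` helper) and with user-supplied geometric predicates, might look as follows; it mirrors the description above rather than reproducing Figure 26.4.

```python
def approx_range_query(root, inside_Q, intersects_Q, inside_Qprime, report):
    """Report every point of S inside Q, possibly plus points inside Q', none outside Q'.

    inside_Q(p)         -- is the point p inside the inner region Q?
    intersects_Q(node)  -- does the node's region R_u intersect Q?
    inside_Qprime(node) -- is R_u entirely contained in the outer region Q'?
    report(p)           -- callback invoked once per reported point
    """
    stack = [root]
    while stack:
        u = stack.pop()
        if not u.children:               # leaf: check its associated data points
            for p in u.points:
                if inside_Q(p):
                    report(p)
        elif not intersects_Q(u):        # R_u misses Q: no point of u can lie in Q
            continue
        elif inside_Qprime(u):           # R_u inside Q': every point in the subtree is acceptable
            for p in u.subtree_points():
                report(p)
        else:                            # boundary node: recurse on the children
            stack.extend(u.children)
```

For the counting variant, one would store subtree point counts (or weight sums) at each node and add them in the Q′ case instead of enumerating the points.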
In all of these search algorithms, the essential criterion behind the running time is the observation that a non-terminating node in the search (one that requires expansion of its child nodes) is a node whose region must cross certain size boundaries. For example, in the approximate range searching algorithm, the only nodes expanded are those whose region lies partially inside Q (otherwise the node would be discarded) and partially outside Q′ (otherwise its contribution would be counted entirely toward the output size). A slightly more complex but similar argument applies to the nearest- and farthest-neighbor algorithms. In the next section, we discuss a general theorem providing provable running time bounds for partition trees satisfying a fundamental packing argument.