Approximate Geometric Query Structures: Introduction and General Terminology

Introduction

Specialized data structures are useful for answering specific kinds of geometric queries. Such structures are tailor-made for the kinds of queries that are anticipated, and even then there are cases when producing an exact answer is only slightly better than an exhaustive search. For example, Chazelle and Welzl [7] showed that triangle range queries can be solved in $O(\sqrt{n} \log n)$ time using linear space, but this holds only in the plane. In higher dimensions, the running times go up dramatically, so that, in general, the time needed to perform an exact simplex range query using only linear space is roughly $\Omega(n^{1-1/d})$, ignoring logarithmic factors [6]. For orthogonal range queries, efficient query processing is possible if superlinear space is allowed. For example, range trees (Chapter 18) can answer orthogonal range queries in $O(\log^{d-1} n)$ time but use $O(n \log^{d-1} n)$ space [17].

In this chapter, we focus instead on general-purpose data structures that can answer nearest-neighbor queries and range queries using linear space. Since the lower bound of Chazelle [6] applies in this context, we must compromise somewhat on the exactness of our answers if we want query bounds that are significantly faster than exhaustive search. That is, we will answer all queries approximately, giving responses that are within an arbitrarily small constant factor of the exact solution. As we will see, such responses can typically be produced in logarithmic or polylogarithmic time, using linear space. Moreover, in many practical situations a good approximate solution is sufficient.
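For nearest-neighbor queries, "approximate" is usually made precise as a $(1+\epsilon)$ guarantee. The reported point may be at most $(1+\epsilon)$ times farther from the query than the true nearest neighbor. The Python sketch below only checks that condition against an exhaustive-search baseline. It is not one of the chapter's structures, and the function names are our own.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def exact_nearest(q: Point, data: Sequence[Point]) -> Point:
    """Exhaustive search: the O(n)-per-query baseline the structures aim to beat."""
    return min(data, key=lambda p: math.dist(q, p))


def is_eps_approx_nearest(q: Point, candidate: Point,
                          data: Sequence[Point], eps: float) -> bool:
    """True if candidate is at most (1 + eps) times farther from q
    than the true nearest data point."""
    best = math.dist(q, exact_nearest(q, data))
    return math.dist(q, candidate) <= (1.0 + eps) * best


data = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
q = (1.0, 0.0)
print(exact_nearest(q, data))                            # (0.0, 0.0), at distance 1
print(is_eps_approx_nearest(q, (3.0, 0.0), data, 1.0))   # True: 2 <= 2 * 1
print(is_eps_approx_nearest(q, (3.0, 0.0), data, 0.5))   # False: 2 > 1.5 * 1
```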

In recent years, several interesting data structures have emerged that efficiently solve several general kinds of geometric queries approximately. We review three major classes of such structures in this chapter. The first is a structure introduced by Arya et al. [1] for efficiently approximating nearest-neighbor queries in low-dimensional space. Their work developed a new structure known as the balanced box decomposition (BBD) tree. The BBD tree is a variant of the quadtree and octree [14] but is most closely related to the fair-split tree of Callahan and Kosaraju [5]. In [3], Arya and Mount extend the structure to show that it can also answer approximate range queries. Their structure is based on the decomposition of space into “boxes” that may have a smaller box “cut out”; hence, the boxes may not be convex. The second general-purpose data structure we discuss is the balanced aspect ratio (BAR) tree of Duncan et al. [11–13], which has performance similar to that of the BBD tree but decomposes space into convex regions. Finally, we discuss an analysis of a type of k-d tree [16] that helps to explain why k-d trees have long been known to perform well in practice on general geometric queries. In particular, we review a result of Dickerson et al. [9, 11], which shows that one of the more common variants, the maximum-spread k-d tree, exhibits properties similar to those of BBD trees and BAR trees; we present efficient bounds on approximate geometric queries for this variant. Unfortunately, these bounds are not as tight as those for the BBD tree or BAR tree, but they are comparable.
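The splitting rule of the maximum-spread k-d tree can be sketched in a few lines of Python. The sketch assumes the common variant in which each node cuts its points at the median of the coordinate along which they have the largest extent. The analysis cited above may fix the cut position differently. `KDNode`, `build_max_spread`, and `leaf_size` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    points: List[Point] = field(default_factory=list)  # data points, stored at leaves only
    axis: int = -1                    # splitting coordinate; -1 marks a leaf
    cut: float = 0.0                  # splitting value along `axis`
    left: Optional["KDNode"] = None   # lower half of the points in sorted order
    right: Optional["KDNode"] = None  # upper half (ties at `cut` may land on either side)


def spread(points: Sequence[Point], axis: int) -> float:
    """Extent (max - min) of the points along one coordinate."""
    coords = [p[axis] for p in points]
    return max(coords) - min(coords)


def build_max_spread(points: Sequence[Point], leaf_size: int = 1) -> KDNode:
    """Recursively split on the coordinate of maximum spread, at the median point."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    if len(points) <= leaf_size:
        return KDNode(points=list(points))
    d = len(points[0])
    axis = max(range(d), key=lambda i: spread(points, i))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # both halves are non-empty because len(points) >= 2
    return KDNode(axis=axis, cut=ordered[mid - 1][axis],
                  left=build_max_spread(ordered[:mid], leaf_size),
                  right=build_max_spread(ordered[mid:], leaf_size))


def collect(node: KDNode) -> List[Point]:
    """Return every data point stored in the subtree rooted at node."""
    if node.axis == -1:
        return list(node.points)
    return collect(node.left) + collect(node.right)


pts = [(0.0, 0.0), (10.0, 1.0), (2.0, 2.0), (8.0, 3.0)]
root = build_max_spread(pts)
print(root.axis, root.cut)  # 0 2.0: x has spread 10, y only 3
```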

General Terminology

In order to discuss approximate geometric queries and the efficient structures for answering them without confusion, we must first fix a few fundamental terms. We distinguish between general points in $\mathbb{R}^d$ and points given as input to the structures.

For a point $p$ in the metric space $\mathbb{R}^d$, we write its coordinates as $(p_1, p_2, \ldots, p_d)$.

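When necessary to avoid confusion, we refer to points given as input in a set $S$ as data points and to general points in $\mathbb{R}^d$ as real points. For two points $p, q \in \mathbb{R}^d$, the $L_m$ metric distance between $p$ and $q$ is

$$\|p - q\|_m = \left( \sum_{i=1}^{d} |p_i - q_i|^m \right)^{1/m}.$$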

Although our analysis will concentrate on the Euclidean ($L_2$) metric, the data structures described in this chapter work in every $L_m$ metric space.
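Below is a minimal Python sketch of the $L_m$ distance, $\left(\sum_i |p_i - q_i|^m\right)^{1/m}$. The name `lm_distance` is our own. Accepting `m = math.inf` for the limiting max-coordinate ($L_\infty$) metric is a convenience we added.

```python
import math
from typing import Sequence


def lm_distance(p: Sequence[float], q: Sequence[float], m: float = 2.0) -> float:
    """(sum_i |p_i - q_i|^m)^(1/m); m = 2 is Euclidean, m = 1 is Manhattan.
    As a convenience, m = math.inf returns the limiting max-coordinate metric."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    if m < 1:
        raise ValueError("L_m is a metric only for m >= 1")
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if math.isinf(m):
        return max(diffs, default=0.0)
    return sum(x ** m for x in diffs) ** (1.0 / m)


print(lm_distance((0.0, 0.0), (3.0, 4.0)))            # 5.0 (Euclidean)
print(lm_distance((0.0, 0.0), (3.0, 4.0), 1))         # 7.0 (Manhattan)
print(lm_distance((0.0, 0.0), (3.0, 4.0), math.inf))  # 4.0 (max coordinate)
```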

In addition, we use the standard notions of (convex) regions $R$, rectangular boxes, hyperplanes $H$, and hyperspheres $B$. For each of these objects we define two distance values. Let $P$ and $Q$ be any two regions in $\mathbb{R}^d$; the minimum and maximum metric distances between $P$ and $Q$ are

$$\delta(P, Q) = \min_{p \in P,\, q \in Q} \|p - q\|_m \quad\text{and}\quad \Delta(P, Q) = \max_{p \in P,\, q \in Q} \|p - q\|_m.$$

Notice that this definition holds even if one or both regions are simply points.
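For axis-aligned boxes, both distances can be computed one coordinate at a time. A box is a product of intervals, and the $L_m$ distance grows with each $|p_i - q_i|$. The minimum (or maximum) is therefore reached by minimizing (or maximizing) every coordinate gap independently. The Python sketch below assumes boxes given as `(lo, hi)` corner pairs and a finite $m \ge 1$. It treats a point `p` as the degenerate box `(p, p)`. General convex regions would need a different method, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# An axis-aligned box is a (lo, hi) pair of corner points; a point p is the box (p, p).
Box = Tuple[Sequence[float], Sequence[float]]


def _lm_norm(terms, m: float) -> float:
    """Combine per-coordinate distances into an L_m distance (finite m >= 1)."""
    return sum(t ** m for t in terms) ** (1.0 / m)


def min_box_distance(P: Box, Q: Box, m: float = 2.0) -> float:
    """Smallest L_m distance between a point of P and a point of Q."""
    (plo, phi), (qlo, qhi) = P, Q
    # Per coordinate: the gap between the two intervals, or 0 if they overlap.
    gaps = [max(0.0, ql - ph, pl - qh)
            for pl, ph, ql, qh in zip(plo, phi, qlo, qhi)]
    return _lm_norm(gaps, m)


def max_box_distance(P: Box, Q: Box, m: float = 2.0) -> float:
    """Largest L_m distance between a point of P and a point of Q."""
    (plo, phi), (qlo, qhi) = P, Q
    # Per coordinate: the farthest pair uses opposite extreme endpoints.
    spans = [max(qh - pl, ph - ql)
             for pl, ph, ql, qh in zip(plo, phi, qlo, qhi)]
    return _lm_norm(spans, m)


P = ((0.0, 0.0), (1.0, 1.0))
Q = ((4.0, 5.0), (5.0, 6.0))
print(min_box_distance(P, Q))  # 5.0: coordinate gaps 3 and 4
print(max_box_distance(P, Q))  # sqrt(5**2 + 6**2) = sqrt(61)
```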

Let $S \subset \mathbb{R}^d$ be a finite data set. For a subset $S_1 \subseteq S$, the size of $S_1$, written $|S_1|$, is the number of distinct data points in $S_1$. More importantly, for any region $R \subset \mathbb{R}^d$, the size is $|R| = |R \cap S|$; that is, the size of a region is the number of data points it contains. To refer to the physical size of a region, we define the outer radius as $O_R = \min_{R \subseteq B_r} r$, where $B_r$ denotes a hypersphere (ball) of radius $r$, and the inner radius as $I_R = \max_{B_r \subseteq R} r$.

The outer radius is therefore the radius of the smallest ball that contains $R$, whereas the inner radius is the radius of the largest ball contained in $R$.
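For an axis-aligned box under the Euclidean metric, both radii have closed forms. The smallest enclosing ball is centered at the box center and has radius half the main diagonal. The largest inscribed ball has radius half the shortest side. The Python sketch below computes these radii together with the size $|R|$. Representing regions as closed `(lo, hi)` boxes is an assumption made for illustration, since the definitions apply to arbitrary regions. The function names are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, ...]


def region_size(lo: Sequence[float], hi: Sequence[float], S: Iterable[Point]) -> int:
    """|R| = |R ∩ S| for the closed box R = [lo, hi]: distinct data points inside."""
    return sum(1 for p in set(S)
               if all(l <= x <= h for l, x, h in zip(lo, p, hi)))


def outer_radius(lo: Sequence[float], hi: Sequence[float]) -> float:
    """O_R under L2: half the length of the box's main diagonal."""
    return 0.5 * math.sqrt(sum((h - l) ** 2 for l, h in zip(lo, hi)))


def inner_radius(lo: Sequence[float], hi: Sequence[float]) -> float:
    """I_R under L2: the largest inscribed ball is limited by the shortest side."""
    return 0.5 * min(h - l for l, h in zip(lo, hi))


S = [(0.5, 0.5), (0.5, 0.5), (1.0, 0.0), (2.0, 2.0)]
print(region_size((0.0, 0.0), (1.0, 1.0), S))  # 2: the duplicate counts once, (2, 2) is outside
print(outer_radius((0.0, 0.0), (3.0, 4.0)))    # 2.5
print(inner_radius((0.0, 0.0), (3.0, 4.0)))    # 1.5
```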

In order to discuss balanced aspect ratio, we need to define the term.

DEFINITION 26.1 A convex region $R$ in $\mathbb{R}^d$ has aspect ratio $\mathrm{asp}(R) = O_R / I_R$ with respect to some underlying metric. For a given balancing factor $\alpha$, if $\mathrm{asp}(R) \le \alpha$, then $R$ has balanced aspect ratio and is called an $\alpha$-balanced region. Similarly, a collection of regions $\mathcal{R}$ has balanced aspect ratio for a given factor $\alpha$ if each region $R \in \mathcal{R}$ is an $\alpha$-balanced region.

For simplicity, when referring to rectangular boxes, we take the aspect ratio to be simply the ratio of the longest side to the shortest side. The two definitions agree to within a constant factor. Under the $L_2$ metric, a box with side lengths $s_1, \ldots, s_d$ has $O_R / I_R = \sqrt{\sum_i s_i^2} / \min_i s_i$. This value lies between $\max_i s_i / \min_i s_i$ and $\sqrt{d}$ times that ratio. Following common usage, we call a region fat if its aspect ratio is balanced and skinny otherwise.
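The Python sketch below evaluates both aspect-ratio measures for a box under $L_2$. It then applies Definition 26.1 to a single box and to a collection of boxes. It assumes full-dimensional boxes given as `(lo, hi)` corners, and the function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def _sides(lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Side lengths of the box; every side must be positive."""
    sides = [h - l for l, h in zip(lo, hi)]
    if min(sides) <= 0:
        raise ValueError("box must have positive extent in every coordinate")
    return sides


def asp_radii(lo: Sequence[float], hi: Sequence[float]) -> float:
    """asp(R) = O_R / I_R under L2 for a box: main diagonal / shortest side."""
    s = _sides(lo, hi)
    return math.sqrt(sum(x * x for x in s)) / min(s)


def asp_sides(lo: Sequence[float], hi: Sequence[float]) -> float:
    """Simplified box aspect ratio: longest side / shortest side."""
    s = _sides(lo, hi)
    return max(s) / min(s)


def is_alpha_balanced(lo: Sequence[float], hi: Sequence[float], alpha: float) -> bool:
    """Definition 26.1 applied to a box, using the O_R / I_R form."""
    return asp_radii(lo, hi) <= alpha


def collection_is_balanced(
        boxes: Iterable[Tuple[Sequence[float], Sequence[float]]], alpha: float) -> bool:
    """A collection is alpha-balanced when every member is."""
    return all(is_alpha_balanced(lo, hi, alpha) for lo, hi in boxes)


fat = ((0.0, 0.0), (3.0, 4.0))
skinny = ((0.0, 0.0), (100.0, 1.0))
print(asp_sides(*fat), asp_radii(*fat))            # 4/3 and 5/3
print(collection_is_balanced([fat], 2.0))          # True
print(collection_is_balanced([fat, skinny], 2.0))  # False: the skinny box has ratio about 100
```

For the fat box the two measures are $4/3$ and $5/3$. They differ by less than the factor $\sqrt{2}$ that bounds their gap in the plane.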

The structures that we discuss in this chapter are all derivatives of binary space partition (BSP) trees; see, for example, [18].
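The following sketch shows what these structures have in common. In a BSP tree, each internal node cuts its region with a hyperplane $a \cdot x = b$ and sends each half-space to one child. Locating a point is therefore a walk from the root to a leaf. The minimal Python sketch below builds a tiny tree by hand. It mixes an axis-aligned cut (the kind a k-d tree uses) with a diagonal one purely for illustration. `BSPNode`, `locate`, and the leaf labels are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class BSPNode:
    normal: Optional[Tuple[float, ...]] = None  # a in the cutting hyperplane a·x = b; None at a leaf
    offset: float = 0.0                         # b in a·x = b
    below: Optional["BSPNode"] = None           # child for the half-space a·x <= b
    above: Optional["BSPNode"] = None           # child for the half-space a·x > b
    label: str = ""                             # hypothetical leaf name, for illustration only

    def is_leaf(self) -> bool:
        return self.normal is None


def locate(node: BSPNode, x: Sequence[float]) -> BSPNode:
    """Walk from the root to the leaf cell containing point x."""
    while not node.is_leaf():
        side = sum(a * xi for a, xi in zip(node.normal, x))
        node = node.below if side <= node.offset else node.above
    return node


# Root cut along the diagonal x + y = 1; its lower side is cut again by the line x = 0.25.
root = BSPNode(normal=(1.0, 1.0), offset=1.0,
               below=BSPNode(normal=(1.0, 0.0), offset=0.25,
                             below=BSPNode(label="A"), above=BSPNode(label="B")),
               above=BSPNode(label="C"))
print(locate(root, (0.1, 0.1)).label)  # A
print(locate(root, (0.5, 0.2)).label)  # B
print(locate(root, (2.0, 2.0)).label)  # C
```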
