Approximate Geometric Query Structures:Introduction and General Terminology

Introduction

Specialized data structures are useful for answering specific kinds of geometric queries. Such structures are tailor-made for the kinds of queries that are anticipated and even then there are cases when producing an exact answer is only slightly better than an exhaustive search. For example, Chazelle and Welzl [7] showed that triangle range queries can be solved in O(n log n) time using linear space but this holds only in the plane. In higher dimensions, the running times go up dramatically, so that, in general, the time needed to perform an exact simplex range query and still use small linear space is roughly Ω(n11/d), ignoring logarithmic factors [6]. For orthogonal range queries, efficient query processing is possible if superlinear space is allowed. For example, range trees (Chapter 18) can answer orthogonal range queries in O(logd1 n) time but use O(n logd1 n) space [17].

In this chapter, we focus instead on general-purpose data structures that can answer nearest-neighbor queries and range queries using linear space. Since the lower-bound of Chazelle [6] applies in this context, in order to get query bounds that are significantly faster than exhaustive search, we need to compromise somewhat on the exactness of our answers. That is, we will answer all queries approximately, giving responses that are within an arbitrarily small constant factor of the exact solution. As we discuss, such responses can typically be produced in logarithmic or polylogarithmic time, using linear space. Moreover, in many practical situations, a good approximate solution is often sufficient.

In recent years several interesting data structures have emerged that efficiently solve several general kinds of geometric queries approximately. We review three major classes of such structures in this chapter. The first one we discuss is a structure introduced by Arya et al. [1] for efficiently approximating nearest-neighbor queries in low-dimensional space. Their work developed a new structure known as the balanced box decomposition (BBD) tree. The BBD tree is a variant of the quadtree and octree [14] but is most closely related to the fair-split tree of Callahan and Kosaraju [5]. In [3], Arya and Mount extend the structure to show that it can also answer approximate range queries. Their structure is based on the decomposition of space into “boxes” that may have a smaller box “cut out;” hence, the boxes may not be convex. The second general purpose data structure we discuss is the balanced aspect ratio (BAR) tree of Duncan et al. [11–13], which is a structure that has similar performance as the BBD tree but decomposes space into convex regions. Finally, we discuss an analysis of a type of k-d tree [16] that helps to explain why k-d trees have long been known to exhibit excellent performance bounds in practice for general geometric queries. In particular, we review a result of Dickerson et al. [9, 11], which shows that one of the more common variants, the maximum-spread k-d tree, exhibits properties similar to BBD trees and BAR trees; we present efficient bounds on approximate geometric queries for this variant. Unfortunately, the bounds are not as efficient as the BBD tree or BAR tree but are comparable.

General Terminology

In order to discuss approximate geometric queries and the efficient structures on them without confusion, we must cover a few fundamental terms. We distinguish between general points in IRd and points given as input to the structures.

For a given metric space IRd, the coordinates of any point p IRd are (p1, p2,..., pd).

When necessary to avoid confusion, we refer to points given as input in a set S as data points and general points in IRd as real points. For two points p, q IRd, the Lm metric

distance between p and q is

image

Although our analysis will concentrate on the Euclidean L2 metric space, the data structures mentioned in this chapter work in all of the Lm metric spaces.

In addition, we use the standard notions of (convex) regions R, rectangular boxes, hyper- planes H, and hyperspheres B. For each of these objects we define two distance values. Let P and Q be any two regions in IRd, the minimum and maximum metric distances between P and Q are

image

Notice that this definition holds even if one or both regions are simply points.

Let S be a finite data set S IRd. For a subset S1 S, the size of S1, written |S1|, is the number of distinct data points in S1. More importantly, for any region R IRd, the size is |R| = |R S|. That is, the size of a region identifies the number of data points in it. To refer to the physical size of a region, we define the outer radius as OR = minRBr r, where Br is defined as the hypersphere with radius r. The inner radius of a region is IR = maxBr R r.

The outer radius, therefore, identifies the smallest ball that contains the region R whereas the inner radius identifies the largest ball contained in R.

In order to discuss balanced aspect ratio, we need to define the term.

DEFINITION 26.1 A convex region R in IRd has aspect ratio asp(R) = OR/IR with respect to some underlying metric. For a given balancing factor α, if asp(R) α, R has balanced aspect ratio and is called an α-balanced region. Similarly, a collection of regions R has balanced aspect ratio for a given factor α if each region R R is an α-balanced region.

For simplicity, when referring to rectangular boxes, we consider the aspect ratio as simply the ratio of the longest side to the shortest side. It is fairly easy to verify that the two definitions are equal within a constant factor. As is commonly used, we refer to regions as being either fat or skinny depending on whether their aspect ratios are balanced or not.

The class of structures that we discuss in this chapter are all derivatives of binary space partition (BSP) trees, see for example [18].

See also Chapter 20.

Each node u in a BSP tree T represents both a region Ru in space and the data subset Su S of objects, points, lying inside Ru. For simplicity, regions are considered closed and points falling on the boundary of two regions can be in either of the two regions but not both. Each leaf node in T represents a region with a constant number of data objects, points, from S. Each internal node in T has an associated cut partitioning the region into two subregions, each a child node. The root of T is associated with some bounding (rectangular box) region containing S. In general, BSP trees can store any type of object, points, lines, solids, but in our case we focus on points. Typically, the partitioning cuts used are hyperplanes resulting in convex regions. However, the BBD tree presented in Section 26.5 is slightly different and can introduce regions with a single interior hole. Therefore, we have generalized slightly to accommodate this in our definition.

Comments

Popular posts from this blog

Binary Space Partitioning Trees:BSP Tree as a Hierarchy of Regions.

Data Structure Visualization:Introduction and Value of Data Structure Rendering

0/1 Knapsack Problem Memory function.