Geometric and Spatial Data Structures in External Memory:Dynamic and Kinetic Data Structures.
Dynamic and Kinetic Data Structures
In this section we consider two scenarios where data items change: dynamic (in which items are inserted and deleted), and kinetic (in which the data items move continuously along specified trajectories). In both cases, queries can be done at any time. It is often useful for kinetic data structures to allow insertions and deletions; for example, if the trajectory of an item changes, we must delete the old trajectory and insert the new one.
Logarithmic Method for Decomposable Search Problems
In previous sections, we’ve already encountered several dynamic data structures for the problems of dictionary lookup and range search. In Section 27.4 we saw how to develop optimal EM range search data structures by externalizing some known internal memory data structures. The key idea was to use the bootstrapping paradigm, together with weight- balanced B-trees as the underlying data structure, in order to consolidate several static data structures for small instances of range searching into one dynamic data structure for the full problem. The bootstrapping technique is specific to the particular data structures involved. In this section we look at another technique that is based upon the properties of the problem itself rather than upon that of the data structure.
We call a problem decomposable if we can answer a query by querying individual subsets of the problem data and then computing the final result from the solutions to each subset.
Dictionary search and range searching are obvious examples of decomposable problems.
Bentley developed the logarithmic method [29, 96] to convert efficient static data structures for decomposable problems into general dynamic ones. In the internal memory setting, the logarithmic method consists of maintaining a series of static substructures, at most one each of size 1, 2, 4, 8,. . . . When a new item is inserted, it is initialized in a substructure of size 1. If a substructure of size 1 already exists, the two substructures are combined into a single substructure of size 2. If there is already a substructure of size 2, they in turn are combined into a single substructure of size 4, and so on. For the current value of N , it is easy to see that the kth substructure (i.e., of size 2k) is present exactly when the kth bit in the binary representation of N is 1. Since there are at most log N substructures, the search time bound is log N times the search time per substructure. As the number of items increases from 1 to N , the kth structure is built a total of N/2k times (assuming N is a power of 2). If it can be built in O(2k ) time, the total time for all insertions and all substructures is thus O(N log N ), making the amortized insertion time O(log N ). If we use up to three substructures of size 2k at a time, we can do the reconstructions in advance and convert the amortized update bounds to worst-case [96].
In the EM setting, in order to eliminate the dependence upon the binary logarithm in the I/O bounds, the number of substructures must be reduced from log N to logB N , and thus the maximum size of the kth substructure must be increased from 2k to Bk . As the number of items increases from 1 to N , the kth substructure has to be built NB/Bk times (when N is a power of B), each time taking O(Bk (logB N )/B) I/Os. The key point is that the extra factor of B in the numerator of the first term is canceled by the factor of B in the denominator of the second term, and thus the resulting total insertion time over all N insertions and all logB N structures is O(N (logB N )2) I/Os, which is O((logB N )2) I/Os amortized per insertion. By global rebuilding we can do deletions in O(logB N ) I/Os amortized. As in the internal memory case, the amortized updates can typically be made worst-case.
Arge and Vahrenhold [16] obtain I/O bounds for dynamic point location in general planar subdivisions similar to those of [2], but without use of level-balanced trees. Their method uses a weight-balanced base structure at the outer level and a multislab structure for storing segments similar to that of Arge and Vitter [17] described in Section 27.4.3. They use the logarithmic method to construct a data structure to answer vertical rayshooting queries in the multislab structures. The method relies upon a total ordering of the segments, but such an ordering can be changed drastically by a single insertion. However, each substructure in the logarithmic method is static (until it is combined with another substructure), and thus a static total ordering can be used for each substructure. The authors show by a type of fractional cascading that the queries in the logB N substructures do not have to be done independently, which saves a factor of logB N in the I/O cost, but at the cost of making the data structures amortized instead of worst-case.
Agarwal et al. [6] apply the logarithmic method (in both the binary form and B-way variant) to get EM versions of kd-trees, BBD trees, and BAR trees.
Early work on temporal data generally concentrated on time-series or multiversion data [105]. A question of growing interest in this mobile age is how to store and index continuously moving items, such as mobile telephones, cars, and airplanes, using kinetic data structures (e.g., see [69, 104, 128]) . There are two main approaches to storing moving items: The first technique is to use the same sort of data structure as for nonmoving data, but to update it whenever items move sufficiently far so as to trigger important combinatorial events that are relevant to the application at hand [24]. For example, an event relevant for range search might be triggered when two items move to the same horizontal displacement (which hap- pens when the x-ordering of the two items is about to switch). A different approach is to store each item’s location and speed trajectory, so that no updating is needed as long as the item’s trajectory plan does not change. Such an approach requires fewer updates, but the representation for each item generally has higher dimension, and the search strategies are therefore less efficient.
Kollios et al. [78] developed a linear-space indexing scheme for moving points along a (one-dimensional) line, based upon the notion of partition trees. Their structure supports a variety of range search and approximate nearest neighbor queries. For example, given a range and time, the points in that range at the indicated time can be retrieved in O(n1/2+E +k) I/Os, for arbitrarily small E > 0. Updates require O((log n)2) I/Os. Agarwal et al. [3]
extend the approach to handle range searches in two dimensions, and they improve the update bound to O((logB n)2) I/Os. They also propose an event-driven data structure with the same query times as the range search data structure of Arge and Vitter [15] discussed in Section 27.4.5, but with the potential need to do many updates. A hybrid data structure combining the two approaches permits a tradeoff between query performance and update frequency.
R-trees offer a practical generic mechanism for storing multidimensional points and are thus a natural alternative for storing mobile items. One approach is to represent time as a separate dimension and to cluster trajectories using the R-tree heuristics. However, the orthogonal nature of the R-tree does not lend itself well to diagonal trajectories. For the case of points moving along linear trajectories, Sˇaltenis et al. [104] build the R-tree upon only the spatial dimensions, but parameterize the bounding box coordinates to account for the movement of the items stored within. They maintain an outer approximation of the true bounding box, which they periodically update to refine the approximation. Agarwal and Har-Peled [9] show how to maintain a provably good approximation of the minimum bounding box with need for only a constant number of refinement events. Further discussion of kinetic data structures, primarily for internal memory, appears in Chapter 23.
Comments
Post a Comment