Data Structures in JDSL:The Architecture of JDSL
The Architecture of JDSL
In this section we describe the interfaces of the data structures present in JDSL, the classes that implement those interfaces, and the algorithms that operate on them. Most containers are described by two interfaces, one (whose name is prefixed with Inspectable) that comprise all the methods to query the container, and the other, extending the first, that comprise all the methods to modify the container. Inspectable interfaces can be used as variable or argument types in order to obtain an immutable view of a container (for instance, to prevent an algorithm from modifying the container it operates on).
As described in Section 43.2.1, we can partition the set of containers present in JDSL into two subsets: the positional containers and the key-based containers. Accordingly, the interfaces for the various containers are organized into two hierarchies (see Figures 43.4
and 43.5), with a common root given by interfaces InspectableContainer and Container. At the same time, container interfaces, their implementations, and algorithms that operate on them are grouped into various Java packages.
In the rest of this section, we denote with n the current number of elements stored in the container being considered.
JDSL currently consists of eight Java packages, each containing a set of interfaces and/or classes. Interfaces and exceptions for the data structures are defined in packages with the api suffix, while the reference implementations of these interfaces are defined in packages with the ref suffix. Interfaces, classes, and exceptions for the algorithms are instead grouped on a functional basis. As we will see later, the interfaces are arranged in hierarchies that may extend across different packages. The current packages are the following:
jdsl.core.api Interfaces and exceptions that compose the API for the core containers (sequences, trees, priority queues, and dictionaries), for their accessors and comparators, and for the iterators on their elements, positions and locators.
jdsl.core.ref Implementations of the interfaces in jdsl.core.api. Most implementations have names of the form ImplementationStyleInterfaceName. For instance, Array- Sequence and NodeSequence implement the jdsl.core.api.Sequence interface with a growable array and with a linked structure, respectively. Classes with names of the form AbstractInterfaceName implement some methods of the interface for the convenience of developers building alternative implementations.
jdsl.core.algo.sorts Sorting algorithms that operate on the elements stored in a jdsl.core. api.Sequence object. They are parameterized with respect to the comparison rule used to sort the elements, provided as a jdsl.core.api.Comparator object.
jdsl.core.algo.traversals Traversal algorithms that operate on jdsl.core.api.InspectableTree objects. A traversal algorithm performs operations while visiting the nodes of the tree, and can be extended by applying the template method pattern.
jdsl.core.util This package contains a Converter class to convert some JDSL containers to the equivalent data structures of the Java Collections and vice versa.
jdsl.graph.api Interfaces and exceptions that compose the API for the graph container and for the iterators on its vertices and edges.
jdsl.graph.ref Implementations of the interfaces in jdsl.graph.api; in particular, class IncidenceListGraph is an implementation of interface jdsl.graph.api.Graph.
jdsl.graph.algo Basic graph algorithms, including depth-first search, topological numbering, shortest path, and minimum spanning tree, all of which can be extended by applying the template method pattern.
Positional Containers
All positional containers implement interfaces InspectablePositionalContainer and Positional- Container, which extend InspectableContainer and Container, respectively (see Figure 43.4).
Every positional container implements a set of essential operations, including being able to determine its own size (size()), to determine whether it contains a specific position (contains(Accessor)), to replace the element associated with a position (replaceElement(Accessor,Object)), to swap the elements associated with two positions (swapElements(Position, Position)), and to get iterators over the positions (positions()) or the elements (elements()) of the container.
Sequences
A sequence is a basic data structure used for storing elements in a linear, ranked fashion (see Chapter 2). Sequences can be implemented in many ways, e.g., as a linked list of nodes or on top of an array. In JDSL, sequences are described by interfaces InspectableSequence and Sequence, which extend InspectablePositionalContainer and PositionalContainer, respectively.
In addition to the basic methods common to all positional containers, the sequence interfaces provide methods to access and modify positions at the sequence ends (methods such as first(), insertLast() and removeFirst()) or specific positions along the sequence (methods such as after(Position), atRank(int), insertBefore(Position) and removeAtRank(int)).
NodeSequence is an implementation of Sequence based on a doubly linked list of nodes.
The nodes are the positions of the sequence. It takes O(1) time to insert, remove, or access both ends of the sequence or a position before or after a given one, while it takes O(n) time to insert, remove, or access positions at a given rank in the sequence. Thus, NodeSequence instances can be suitably used as stacks, queues, or deques.
ArraySequence is an implementation of Sequence based on a growable array of positions.
Instances can be created with an initial capacity, and can be told whether or not to reduce this capacity when their size drops below a certain value, depending on whether the user prefers space or time efficiency. It takes O(1) time to access any position in the sequence, O(1) amortized time over a series of operations to insert or remove elements at the end of the sequence, and O(n) time to insert or remove elements at the beginning or middle of the sequence. Hence, ArraySequence instances can be suitably used for quick access to the elements after their initial insertion, when filled only at the end, or as stacks.
Trees
Trees allow more sophisticated relationships between elements than is possible with a sequence: they allow relationships between a child and its parent, or between siblings of a parent (see Chapter 3).
InspectableTree and Tree are the interfaces describing a general tree; they extend InspectablePositionalContainer and PositionalContainer, respectively.
InspectableBinaryTree, which extends InspectableTree, and BinaryTree, which extends PositionalContainer, are the interfaces describing a binary tree. In addition to the basic methods common to all positional containers, the tree interfaces provide methods to determine where in the tree a position lies (methods such as isRoot(Position) and isExternal(Position)), to re- turn the parent (parent(Position)), siblings (siblings(Position)) or children (methods such as children(Position), childAtRank(Position,int) and leftChild(Position)) of a position, and to cut (cut(Position)) or link (link(Position,Tree)) a subtree.
NodeTree is an implementation of Tree based on a linked structure of nodes. The nodes are the positions of the tree. It is the implementation to use when a generic tree is needed or for building more specialized (nonbinary) trees. NodeTree instances always contain at least one node.
NodeBinaryTree is an implementation of BinaryTree based on a linked structure of nodes.
The nodes are the positions of the tree. Similarly to NodeTree, NodeBinaryTree instances always contain at least one node; in addition, each node can have either zero or two children. If a more complex tree is not necessary, using NodeBinaryTree instances will be faster and easier than using NodeTree ones.
Graphs
A graph is a fundamental data structure describing a binary relationship on a set of elements (see Chapter 4) and it is used in a variety of application areas.
Each vertex of the graph may be linked to other vertices through edges. Edges can be either one-way, directed edges, or two-way, undirected edges. In JDSL, both vertices and edges are positions of the graph. JDSL handles all graph special cases such as self-loops, multiple edges between two vertices, and disconnected graphs.
The main graph interfaces are InspectableGraph, which extends InspectablePositionalCon- tainer, ModifiableGraph, which extends PositionalContainer, and Graph, which extends both InspectableGraph and ModifiableGraph. These interfaces provide methods to determine whether two vertices are adjacent (areAdjacent(Vertex,Vertex)) or whether a vertex and an edge are incident (areIncident(Vertex,Edge)), to determine the degree of a vertex (degree(Ver- tex)), to determine the origin (origin(Edge)) or destination (destination(Edge)) of an edge, to insert (insertVertex(Object)) or remove (removeVertex(Vertex)) a vertex, to set the di- rection of an edge (setDirectionFrom(Edge,Vertex) and setDirectionTo(Edge,Vertex)), to in- sert (insertEdge(Vertex,Vertex,Object)), remove (removeEdge(Edge)), split (splitEdge(Edge, Object)), or unsplit (unsplitEdge(Vertex,Object)) an edge.
IncidenceListGraph is an implementation of Graph. As its name suggests, it is based on an incidence list representation of a graph.
All key-based containers implement interfaces Inspec table Key Based Container and Key Based- Container, which extend Inspectable Container and Container, respectively (see Figure 43.5).
Every key-based container implements a set of essential operations, including being able to determine its own size (size()), to determine whether it contains a specific locator
(contains(Accessor)), to replace the key (replaceKey(Locator,Object)) or the element (replace- Element(Accessor,Object)) associated with a locator, to insert (insert(Object,Object)) or re- move (remove(Locator)) a key/element pair, and to get iterators over the locators (locators()), the keys (keys()) or the elements (elements()) of the container.
Priority queues
A priority queue is a data structure used for storing a collection of elements prioritized by keys, where the smallest key value indicates the highest priority (see Part II).
It supports arbitrary insertions and deletions of elements and keeps track of the highest-priority key. A priority queue is useful, for instance, in applications where the user wishes to store a queue of tasks of varying priority, and always process the most important task.
Interface Priority Queue extends Key Based Container. In addition to the basic methods common to all the key-based containers, it provides methods to access (min()) or remove (remove Min()) the key/element pair with highest priority, i.e., with minimum key. Note that the priority of an element can be changed using method replace Key(Locator,Object), inherited from Key Based Container.
Array Heap is an efficient implementation of Priority Queue based on a heap. Inserting,removing, or changing the key of a key/element pair takes O(log n) time, while examining the key/element pair with the minimum key takes O(1) time. The implementation is parameterized with respect to the comparison rule used to order the keys; to this purpose, a Comparator object is passed as an argument to the Array Heap constructors.
Dictionaries
A dictionary is a data structure used to store key/element pairs and then quickly search for them using their keys (see Part III). An ordered dictionary is a particular dictionary where a total order on the set of keys is defined. All JDSL dictionaries are multi-maps, i.e., they can store multiple key/element pairs with the same key.
The main dictionary interfaces are InspectableDictionary and Dictionary, which extend InspectableKeyBasedContainer and KeyBasedContainer, respectively. In addition to the basic methods common to all the key-based containers, these interfaces provide methods to find key/element pairs by their keys (find(Object) and findAll(Object)) and to remove all key/element pairs with a specific key (removeAll(Object)). Other dictionary interfaces are InspectableOrderedDictionary and OrderedDictionary, which extend InspectableDictionary and Dictionary, respectively. They provide additional methods to access the first (first()) or last (last()) key/element pair in the ordered dictionary, and to access the key/element pair before (before(Locator)) or after (after(Locator)) a given key/element pair.
HashtableDictionary is an implementation of Dictionary. As its name suggests, it is based on a hash table. Insertions and removals of key/element pairs usually take O(1) time, although individual insertions and removals may require O(n) time. The implementation is parameterized with respect to the hashing function used to store the key/element pairs; to this purpose, a HashComparator object is passed as an argument to the HashtableDictionary constructors. HashtableDictionary is a good choice when overall speed is necessary.
RedBlackTree is an implementation of OrderedDictionary. It is a particular type of binary search tree, where insertion, removal, and access to key/element pairs require each O(log n) time. The implementation is parameterized with respect to the comparison rule used to order the keys; to this purpose, a Comparator object is passed as an argument to the RedBlackTree constructors.
Algorithms
In addition to the data structures described above, JDSL includes various algorithms that operate on them.
Sequence sorting
JDSL provides a suite of sorting algorithms for different applications. They all implement the Sort Object interface, whose only method is sort(Sequence,Comparator). Sorting algorithms with the prefix List are most efficient when used on instances of Node Sequence while those with the prefix Array are most efficient when used on instances of Array Sequence.
Array Quick Sort is an implementation of the quicksort algorithm. This algorithm runs in O(n log n) expected time and performs very well in practice. Its performance, however, degrades greatly if the sequence is already very close to being sorted. Also, it is not stable, i.e., it does not guarantee that elements with the same value will remain in the same order they were in before sorting. In all cases whether neither of these caveats apply, it is the best choice.
ListMergeSort and ArrayMergeSort are two implementations of the mergesort algorithm.
This algorithm is not as fast as quicksort in practice, even though its theoretical time complexity is O(n log n). There are no cases where its performance will degrade due to peculiarities in the input data, and it is a stable sort.
HeapSort is an implementation of the heapsort algorithm, and uses an instance of Array-Heap (see Section 43.3.3) as a sorting device.
Its performance, like that of mergesort, will not degrade due to peculiarities in the input data, but it is not a stable sort. Its theoretical time complexity is O(n log n).
Iterator-based tree traversals
JDSL provides two types of tree traversals. The first type is based on iterators: the tree is passed as an argument to the iterator constructor and is then iterated over using methods hasNext() and nextPosition(). Iterators give a quick traversal of the tree in a specific order, and are the proper traversals to use when this is all that is required. We recall that iterators in JDSL have snapshots semantics (see Section 43.2.2).
A preorder iterator visits the nodes of the tree in preorder, i.e., it returns a node before returning any of its children. Preorder iterators work for both binary and general trees;
they are implemented in class PreOrderIterator.
A postorder iterator visits the nodes of the tree in postorder, i.e., it returns a node after returning all of its children. Postorder iterators work for both binary and general trees;
they are implemented in class Post OrderIterator.
An inorder iterator visits the nodes of the tree in inorder, i.e., it returns a node in between its left and right children. Inorder iterators work only for binary trees; they are implemented in class InOrderIterator.
Euler tour tree traversal
The second type of tree traversals in JDSL is named Euler tour: it is implemented — in class Euler Tour — as an algorithm object, which can be extended by applying the template method pattern.
The Euler tour visits each node of the tree several times, namely, a first time before traversing any of the subtrees of the node, then between the traversals of any two consecutive subtrees, and a last time after traversing all the subtrees. Each time a node is visited, one of the methods visitFirstTime(Position), visitBetweenChildren(Position) and visit- LastTime(Position), if the node is internal, or method visitExternal(Position), if the node is a leaf, is automatically called. A particular computation on the visited tree may be performed by suitably overriding those methods in a subclass of EulerTour.
The Euler tour is more powerful than the iterators described above as it can be used to implement more general kinds of algorithms. Note that, unlike the iterators, the Euler tour does not have snapshot semantics; this means that any modification of the tree during the execution of the Euler tour will cause undefined behavior.
Graph traversals
The depth-first search (DFS) traversal of a graph is available in JDSL. Depth-first search proceeds by visiting an unvisited vertex adjacent to the current one; if no such vertex exists, then the algorithm backtracks to the previous visited vertex.
Similarly to the Euler tour, depth-first search is implemented in JDSL as an algorithm object, which can be extended by applying the template method pattern. The basic implementation of depth-first search — DFS — is designed to work on undirected graphs. The user can specify actions to occur when a vertex is first or last visited or when different sorts of edges (such as “tree” edges of the DFS tree or “back” edges to previously visited vertices) are traversed by subclassing DFS and suitably overriding some methods.
DFS has two subclasses: Find CycleDFS, an algorithm for determining cycles in an undirected graph, and DirectedDFS, a depth-first search specialized for directed graphs. In turn, Directed DFS has one subclass: Directed Find Cycle DFS, an algorithm for determining cycles in a directed graph. These subclasses are examples of how to apply the template method pattern to DFS in order to implement a more specific algorithm.
Topological numbering
A topological numbering is a numbering of the vertices of a directed acyclic graph such that, if there is an edge from vertex u to vertex v, then the number associated with v is higher than the number associated with u.
Two algorithms that compute a topological numbering are included in JDSL: Topological-Sort, which decorates each vertex with a unique number, and UnitWeightedTopologicalNum bering, which decorates each vertex with a nonnecessarily unique number based on how far the vertex is from the source of the graph. Both topological numbering algorithms extend abstract class AbstractTopologicalSort.
Dijkstra’s algorithm
Dijkstra’s algorithm computes the shortest path from a specific vertex to every other vertex of a weighted connected graph. The JDSL implementation of Dijkstra’s algorithm — IntegerDijkstraTemplate — follows the template method pattern and can be easily extended to change its functionality. Extending it makes it possible, for instance, to set the function for calculating the weight of an edge, to change the way the results are stored, or to stop the execution of the algorithm after computing the shortest path to a specific vertex (as done in subclass IntegerDijkstraPathfinder). An example of using Dijkstra’s algorithm is given in Section 43.4.
The Prim-Jarn´ık algorithm
The Prim-Jarn´ık algorithm computes a minimum spanning tree of a weighted connected graph, i.e., a tree that contains all the vertices of the graph and has the minimum total weight over all such trees. The JDSL implementation of the Prim-Jarn´ık algorithm —
IntegerPrimTemplate — follows the template method pattern and can be easily extended to change its functionality. Extending it makes it possible, for instance, to set the function for calculating the weight of an edge, to change the way the results are stored, or to stop the execution of the algorithm after computing the minimum spanning tree for a limited set of vertices.
Comments
Post a Comment