Basic Structures:Introduction and Arrays

Introduction

In this chapter, we review several basic structures that are usually taught in a ﬁrst class on data structures. There are several text books that cover this material, some of which are listed here [1–4]. However, we believe that it is valuable to review this material for the following reasons:

1. In practice, these structures are used more often than all of the other data structures discussed in this handbook combined.

2. These structures are used as basic building blocks on which other more complicated structures are based.

Our goal is to provide a crisp and brief review of these structures. For a more detailed explanation, the reader is referred to the text books listed at the end of this chapter. In the following, we assume that the reader is familiar with arrays and pointers.

Arrays

An array is conceptually deﬁned as a collection of <index,value> pairs. The implementation of the array in modern programming languages such as C++ and Java uses indices starting at 0. Languages such as Pascal permitted one to deﬁne an array on an arbitrary range of integer indices. In most applications, the array is the most convenient method to store a collection of objects. In these cases, the index associated with a value is unimportant. For example, if an array city is being used to store a list of cities in no particular order, it doesn’t really matter whether city[0] is “Denver” or “Mumbai”. If, on the other hand, an array name is being used to store a list of student names in alphabetical order, then, although the absolute index values don’t matter, the ordering of names associated with the ordering of the index does matter: i.e., name[i] must precede name[j] in alphabetical order, if i < j. Thus, one may distinguish between sorted arrays and unsorted arrays. Sometimes arrays are used so that the index does matter. For example, suppose we are trying to represent a histogram: we want to maintain a count of the number of students that got a certain score on an exam from a scale of 0 to 10. If score[5] = 7, this means that 7 students received a score of 5. In this situation, it is possible that the desired indices are not supported by the language. For example, C++ does not directly support using indices such as “blue”, “green”, and “red”. This may be rectiﬁed by using enumerated types to assign integer values to the indices. In cases when the objects in the array are large and unwieldy and have to be moved around from one array location to another, it may be advantageous to store pointers or references to objects in the array rather than the objects themselves.

Programming languages provide a mechanism to retrieve the value associated with a supplied index or to associate a value with an index. Programming languages like C++ do not explicitly maintain the size of the array. Rather, it is the programmer’s responsibility to be aware of the array size. Further, C++ does not provide automatic range-checking. Again, it is the programmer’s responsibility to ensure that the index being supplied is valid. Arrays are usually allocated contiguous storage in the memory. An array may be allocated statically (i.e., during compile-time) or dynamically (i.e., during program execution). For example, in C++, a static array is deﬁned as:

An important diﬀerence between static and dynamic arrays is that the size of a static array cannot be changed during run time, while that of a dynamic array can (as we will see in Section 2.2.3).

Operations on an Array

1. Retrieval of an element: Given an array index, retrieve the corresponding value.

This can be accomplished in O(1) time. This is an important advantage of the array relative to other structures. If the array is sorted, this enables one to compute the minimum, maximum, median (or in general, the ith smallest element) essentially for free in O(1) time.

2. Search: Given an element value, determine whether it is present in the array.

If the array is unsorted, there is no good alternative to a sequential search that iterates through all of the elements in the array and stops when the desired element is found:

In the worst case, this requires O(n) time. If, however, the array is sorted, binary search can be used.

Binary search only requires O(log n) time.

3. Insertion and Deletion: These operations can be the array’s Achilles heel. First, consider a sorted array. It is usually assumed that the array that results from an insertion or deletion is to be sorted. The worst case scenario presents itself when an element that is smaller than all of the elements currently in the array is to be inserted. This element will be placed in the leftmost location. However, to make room for it, all of the existing elements in the array will have to be shifted one place to the right. This requires O(n) time. Similarly, a deletion from the leftmost element leaves a “vacant” location. Actually, this location can never be vacant because it refers to a word in memory which must contain some value. Thus, if the program accesses a “vacant” location, it doesn’t have any way to know that the location is vacant. It may be possible to establish some sort of code based on our knowledge of the values contained in the array. For example, if it is known that an array contains only positive integers, then one could use a zero to denote a vacant location. Because of these and other complications, it is best to eliminate vacant locations that are interspersed among occupied locations by shifting elements to the left so that all vacant locations are placed to the right. In this case, we know which locations are vacant by maintaining an integer variable which contains the number of locations starting at the left that are currently in use. As before, this shifting requires O(n) time. In an unsorted array, the eﬃciency of insertion and deletion depends on where elements are to be added or removed. If it is known for example that insertion and deletion will only be performed at the right end of the array, then these operations take O(1) time as we will see later when we discuss stacks.

Sorted Arrays

We have already seen that there are several beneﬁts to using sorted arrays, namely: search- ing is faster, computing order statistics (the ith smallest element) is O(1), etc. This is the ﬁrst illustration of a key concept in data structures that will be seen several times in this handbook: the concept of preprocessing data to make subsequent queries eﬃcient. The idea is that we are often willing to invest some time at the beginning in setting up a data structure so that subsequent operations on it become faster. Some sorting algorithms such as heap sort and merge sort require O(n log n) time in the worst case, whereas other simpler sorting algorithms such as insertion sort, bubble sort and selection sort require O(n2) time in the worst case. Others such as quick sort have a worst case time of O(n2), but require O(n log n) on the average. Radix sort requires Θ(n) time for certain kinds of data. We refer the reader to [5] for a detailed discussion.

However, as we have seen earlier, insertion into and deletion from a sorted array can take Θ(n) time, which is large. It is possible to merge two sorted arrays into a single sorted array in time linear in the sum of their sizes. However, the usual implementation needs additional Θ(n) space. See [6] for an O(1)-space merge algorithm.

Array Doubling

To increase the length of a (dynamically allocated) one-dimensional array a that contains elements in positions a[0..n − 1], we ﬁrst deﬁne an array of the new length (say m), then copy the n elements from a to the new array, and ﬁnally change the value of a so that it references the new array. It takes Θ(m) time to create an array of length m because all elements of the newly created array are initialized to a default value. It then takes an additional Θ(n) time to copy elements from the source array to the destination array. Thus, the total time required for this operation is Θ(m + n). This operation is used in practice to increase the array size when it becomes full. The new array is usually twice the length of the original; i.e., m = 2n. The resulting complexity (Θ(n)) would normally be considered to be expensive. However, when this cost is amortized over the subsequent n insertions, it in fact only adds Θ(1) time per insertion. Since the cost of an insertion is Ω(1), this does not result in an asymptotic increase in insertion time. In general, increasing array size by a constant factor every time its size is to be increased does not adversely aﬀect the asymptotic complexity. A similar approach can be used to reduce the array size. Here, the array size would be reduced by a constant factor every time.

Multiple Lists in a Single Array

The array is wasteful of space when it is used to represent a list of objects that changes over time. In this scenario, we usually allocate a size greater than the number of objects that have to be stored, for example by using the array doubling idea discussed above. Consider a completely-ﬁlled array of length 8192 into which we are inserting an additional element. This insertion causes the array-doubling algorithm to create a new array of length 16,384 into which the 8192 elements are copied (and the new element inserted) before releasing the original array. This results in a space requirement during the operation which is almost three times the number of elements actually present. When several lists are to be stored, it is more eﬃcient to store them all in a single array rather than allocating one array for each list.

Although this representation of multiple lists is more space-eﬃcient, insertions can be more expensive because it may be necessary to move elements belonging to other lists in addition to elements in one’s own list. This representation is also harder to implement.

Heterogeneous Arrays

The deﬁnition of an array in modern programming languages requires all of its elements to be of the same type. How do we then address the scenario where we wish to use the array to store elements of diﬀerent types? In earlier languages like C, one could use the union facility to artiﬁcially coalesce the diﬀerent types into one type. We could then deﬁne an array on this new type. The kind of object that an array element actually contains is determined by a tag. The following deﬁnes a structure that contains one of three types of data.

The programmer would have to establish a convention on how the id tag is to be used: for example, that id = 0 means that the animal represented by the struct is actually a cat, etc. The union allocates memory for the largest type among Cat, Dog, and Fox. This is wasteful of memory if there is a great disparity among the sizes of the objects. With the advent of object-oriented languages, it is now possible to deﬁne the base type Animal. Cat, Dog, and Fox may be implemented using inheritance as derived types of Animal. An array of pointers to Animal can now be deﬁned. These pointers can be used to refer to any of Cat, Dog, and Fox.

Multidimensional Arrays

Row- or Column Major Representation

Earlier representations of multidimensional arrays mapped the location of each element of the multidimensional array into a location of a one- dimensional array. Consider a two- dimensional array with r rows and c columns. The number of elements in the array n = rc. The element in location [i][j], 0 ≤ i < r and 0 ≤ j < c, will be mapped onto an integer in the range [0,n − 1]. If this is done in row-major order — the elements of row 0 are listed in order from left to right followed by the elements of row 1, then row 2, etc. — the mapping function is ic + j. If elements are listed in column-major order, the mapping function is jr + i. Observe that we are required to perform a multiplication and an addition to compute the location of an element in an array.

Array of Arrays Representation

In Java, a two-dimensional array is represented as a one-dimensional array in which each element is, itself, a one-dimensional array. The array

int [][] x = new int[4][5];

is actually a one-dimensional array whose length is 4. Each element of x is a one-dimensional array whose length is 5. Figure 2.2 shows an example. This representation can also be used in C++ by deﬁning an array of pointers. Each pointer can then be used to point to a

dynamically-created one-dimensional array. The element x[i][j] is found by ﬁrst retrieving the pointer x[i]. This gives the address in memory of x[i][0]. Then x[i][j] refers to the element j in row i. Observe that this only requires the addition operator to locate an element in a one-dimensional array.

Irregular Arrays

A two-dimensional array is regular in that every row has the same number of elements. When two or more rows of an array have diﬀerent number of elements, we call the array irregular. Irregular arrays may also be created and used using the array of arrays represen- tation.

Sparse Matrices

A matrix is sparse if a large number of its elements are 0. Rather than store such a matrix as a two-dimensional array with lots of zeroes, a common strategy is to save space by explicitly storing only the non-zero elements. This topic is of interest to the scientiﬁc computing community because of the large sizes of some of the sparse matrices it has to deal with. The speciﬁc approach used to store matrices depends on the nature of sparsity of the matrix. Some matrices, such as the tridiagonal matrix have a well-deﬁned sparsity pattern. The tridiagonal matrix is one where all of the nonzero elements lie on one of three diagonals: the main diagonal and the diagonals above and below it. See Figure 2.3(a).

There are several ways to represent this matrix as a one-dimensional array. We could order elements by rows giving [2,1,1,3,4,1,1,2,4,7,4,3,5] or by diagonals giving [1,1,4,3,2,3,1,7,5,1,4,2,4].

Figure 2.3 shows other special matrices: the upper and lower triangular matrices which can also be represented using a one-dimensional representation.

Other sparse matrices may have an irregular or unstructured pattern. Consider the matrix in Figure 2.4(a). We show two representations. Figure 2.4(b) shows a one-dimensional array of triples, where each triple represents a nonzero element and consists of the row, column, and value. Figure 2.4(c) shows an irregular array representation. Each row is represented by a one-dimensional array of pairs, where each pair contains the column number and the corresponding nonzero value.

Search This Blog

algorithms