Data Structures for Databases: Overview of the Functionality of a Database Management System.

Overview of the Functionality of a Database Management System

Many of the previous chapters have shown that efficient strategies for complex data-structuring problems are essential in the design of fast algorithms for a variety of applications, including combinatorial optimization, information retrieval and Web search, databases and data mining, and geometric applications. The goal of this chapter is to provide the reader with an overview of the important data structures that are used in the implementation of a modern, general-purpose database management system (DBMS). In earlier chapters of the book the reader has already been exposed to many of the data structures employed in a DBMS context (e.g., B-trees, buffer trees, quad trees, R-trees, interval trees, hashing). Hence, we will focus mainly on their application but also introduce other important data structures to solve some of the fundamental data management problems such as query processing and optimization, efficient representation of data on disk, as well as the transfer of data from main memory to disk. Due to space constraints, we cannot cover applications of data structures to manage non-standard data such as multi-dimensional data, spatial and temporal data, multimedia data, or XML.

Before we begin our treatment of how data structures are used in a DBMS, we briefly review the basic architecture, its components, and their functionality. Unless otherwise noted, our discussion applies to a class of DBMSs that are based on the relational data model. These so-called relational database management systems make up the majority of systems in use today and are offered by all major vendors including IBM, Microsoft, Oracle, and Sybase. Most of the components described here can also be found in DBMSs based on other models such as the object-based model or XML.

Figure 60.1 depicts a conceptual overview of the main components that make up a DBMS. Rectangles represent system components, double-sided arrows represent input and output, and the solid connectors indicate data as well as process flow between two components.

[Figure 60.1: Conceptual overview of the main components of a DBMS]

Please note that the inner workings of a DBMS are quite complex and we are not attempting to provide a detailed discussion of its implementation. For an in-depth treatment the reader should refer to one of the many excellent database textbooks, e.g., [38].

Starting from the top, users interact with the DBMS via commands generated from a variety of user interfaces or application programs. These commands can either retrieve or update the data that is managed by the DBMS or create or update the underlying metadata that describes the schema of the data. The former are called queries, the latter are called data definition statements. Both types of commands are processed by the Query Evaluation Engine, which contains components for parsing the input, producing an execution plan, and executing the plan against the underlying database. In the case of queries, the parsed command is presented to a query optimizer component, which uses information about how the data is stored to produce an efficient execution plan from the possibly many alternatives. In Section 60.2, where we focus on the query evaluation engine, we discuss data structures that represent parsed queries, execution plans, and statistics about a database, including the data structures that are used by an external sorting algorithm.
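The parse, optimize, and execute stages described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the names ParsedQuery, Plan, parse, optimize, and execute are hypothetical, the parser accepts only one toy query form, and the cost formulas (row count for a scan, a logarithm for an index lookup) are placeholders rather than the cost model of any real optimizer.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedQuery:
    """Hypothetical parsed form of: SELECT * FROM <table> WHERE <column> = <int>."""
    table: str
    column: str
    value: int


@dataclass(frozen=True)
class Plan:
    """Hypothetical execution plan: an access method plus its estimated cost."""
    name: str
    cost: float


def parse(text: str) -> ParsedQuery:
    # Toy parser: accepts only the single query shape shown in ParsedQuery.
    tokens = text.split()
    keywords_ok = (
        len(tokens) == 8
        and tokens[0].upper() == "SELECT"
        and tokens[2].upper() == "FROM"
        and tokens[4].upper() == "WHERE"
        and tokens[6] == "="
    )
    if not keywords_ok:
        raise ValueError("unsupported query form")
    return ParsedQuery(table=tokens[3], column=tokens[5], value=int(tokens[7]))


def optimize(query: ParsedQuery, stats: dict) -> Plan:
    # Enumerate the alternative plans and keep the cheapest one.
    table_stats = stats[query.table]
    rows = table_stats["rows"]
    candidates = [Plan("full_scan", float(rows))]
    if query.column in table_stats["indexed_columns"]:
        # Assumed cost of an index lookup: logarithmic in the table size.
        candidates.append(Plan("index_lookup", math.log2(max(rows, 2))))
    return min(candidates, key=lambda plan: plan.cost)


def execute(plan: Plan, query: ParsedQuery, rows: dict, indexes: dict) -> list:
    table_rows = rows[query.table]
    if plan.name == "index_lookup":
        positions = indexes[(query.table, query.column)].get(query.value, [])
        return [table_rows[i] for i in positions]
    return [row for row in table_rows if row[query.column] == query.value]
```

With statistics saying that a table has many rows and an index on the queried column, optimize picks the index lookup; without the index it falls back to the full scan. Both plans return the same rows, which is the point of separating plan selection from query semantics.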

Since databases are normally too large to fit into the main memory of a computer, the data of a database resides in secondary memory, generally on one or more magnetic disks. However, to execute queries or modifications on data, that data must first be transferred to main memory for processing and then back to disk for persistent storage. It is the job of the Storage Subsystem to accomplish a sophisticated placement of data on disk, to assure an efficient localization of these persistent data, to enable their bidirectional transfer between disk and main memory, and to allow direct access to these data from other DBMS subsystems. The storage subsystem consists of two components. The Disk Space Manager is responsible for storing physical data items on disk, managing free regions of the disk space, hiding device properties from higher architecture levels, mapping physical blocks to tracks and sectors of a disk, and controlling the transfer of data items between external and main memory. The Buffer Manager organizes an assigned, limited main memory area called the buffer, which may comprise several smaller buffers (a buffer pool). Other subsystems may have direct access to data items in these buffers.
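The division of labor between the two storage components can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular system: the class and method names are hypothetical, the "disk" is a Python dictionary, the page size of 4096 bytes is arbitrary, and least-recently-used (LRU) replacement is an assumed policy that the text above does not prescribe. Page pinning, which a real buffer manager needs, is omitted.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class DiskSpaceManager:
    """Hypothetical stand-in for the disk: allocates, frees, reads, writes pages."""

    def __init__(self):
        self._blocks = {}   # page_id -> bytes, simulating persistent storage
        self._free = []     # ids of deallocated pages available for reuse
        self._next_id = 0
        self.reads = 0      # counters make the I/O traffic observable
        self.writes = 0

    def allocate(self) -> int:
        if self._free:
            page_id = self._free.pop()
        else:
            page_id = self._next_id
            self._next_id += 1
        self._blocks[page_id] = bytes(PAGE_SIZE)
        return page_id

    def deallocate(self, page_id: int) -> None:
        del self._blocks[page_id]
        self._free.append(page_id)

    def read(self, page_id: int) -> bytearray:
        self.reads += 1
        return bytearray(self._blocks[page_id])

    def write(self, page_id: int, data) -> None:
        self.writes += 1
        self._blocks[page_id] = bytes(data)


class BufferManager:
    """Hypothetical buffer pool with a fixed number of frames and LRU eviction."""

    def __init__(self, disk: DiskSpaceManager, capacity: int):
        self._disk = disk
        self._capacity = capacity
        self._frames = OrderedDict()  # page_id -> [page bytes, dirty flag]

    def get_page(self, page_id: int) -> bytearray:
        if page_id in self._frames:
            self._frames.move_to_end(page_id)   # mark as most recently used
        else:
            if len(self._frames) >= self._capacity:
                self._evict()
            self._frames[page_id] = [self._disk.read(page_id), False]
        return self._frames[page_id][0]

    def mark_dirty(self, page_id: int) -> None:
        self._frames[page_id][1] = True

    def _evict(self) -> None:
        # Remove the least recently used frame; write it back only if modified.
        victim_id, (data, dirty) = self._frames.popitem(last=False)
        if dirty:
            self._disk.write(victim_id, data)

    def flush(self) -> None:
        for page_id, frame in self._frames.items():
            if frame[1]:
                self._disk.write(page_id, frame[0])
                frame[1] = False
```

Other subsystems would call get_page to obtain a page in main memory and mark_dirty after modifying it; the disk is touched only on a buffer miss, on eviction of a modified page, or on an explicit flush.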

In Sections 60.3 and 60.4, we discuss data structures that are used to represent data both in memory and on disk, such as fixed and variable-length records, large binary objects (LOBs), heap, sorted, and clustered files, as well as different types of index structures. Because a database management system must manage data that resides both in main memory and on disk, one has to deal with the reality that the most appropriate data structure for data stored on disk is different from the data structures used for algorithms that run in main memory. Thus, when implementing the storage manager, one has to take care not only to select the appropriate data structures but also to map the data between them in an efficient manner.
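As a small illustration of mapping an in-memory value to an on-disk byte layout, the following sketch encodes the same record in a fixed-length and a variable-length format. The record schema (an integer id and a name), the 16-byte name field, the little-endian layout, and all function names are assumptions chosen for the example; they are not taken from the chapter.

```python
import struct

# Assumed fixed-length layout: 4-byte signed id followed by a 16-byte name field.
FIXED = struct.Struct("<i16s")

# Assumed variable-length layout: 4-byte signed id, 2-byte name length, name bytes.
VAR_HEADER = struct.Struct("<iH")


def pack_fixed(rec_id: int, name: str) -> bytes:
    # struct pads short names with null bytes; names are assumed to fit in 16 bytes.
    return FIXED.pack(rec_id, name.encode("utf-8"))


def unpack_fixed(buf: bytes):
    rec_id, raw = FIXED.unpack(buf)
    return rec_id, raw.rstrip(b"\x00").decode("utf-8")


def pack_variable(rec_id: int, name: str) -> bytes:
    data = name.encode("utf-8")
    return VAR_HEADER.pack(rec_id, len(data)) + data


def unpack_variable(buf: bytes):
    rec_id, length = VAR_HEADER.unpack_from(buf, 0)
    start = VAR_HEADER.size
    return rec_id, buf[start:start + length].decode("utf-8")
```

Fixed-length records always occupy the same number of bytes, so the i-th record in a page can be located by arithmetic alone. Variable-length records save space for short values but need a length field (or a similar directory) to find where each field ends.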

In addition to the above two subsystems, modern DBMSs include a Transaction Management Subsystem to support concurrent execution of queries against the database and recovery from failure. Although transaction processing is an important and complex topic, it is less interesting for our investigation of data structures and is mentioned here only for completeness.

The rest of this chapter is organized as follows. Section 60.2 describes important data structures used during query evaluation. Data structures used for buffer management are described in Section 60.3, and data structures used by the disk space manager are described in Section 60.4. Section 60.5 concludes the chapter.
