Data Structures for Databases:Overview of the Functionality of a Database Management System.

Overview of the Functionality of a Database Management System

Many of the previous chapters have shown that eﬃcient strategies for complex data- structuring problems are essential in the design of fast algorithms for a variety of applications, including combinatorial optimization, information retrieval and Web search, databases and data mining, and geometric applications. The goal of this chapter is to provide the reader with an overview of the important data structures that are used in the implementation of a modern, general-purpose database management system (DBMS). In earlier chapters of the book the reader has already been exposed to many of the data structures employed in a DBMS context (e.g., B-trees, buﬀer trees, quad trees, R-trees, interval trees, hashing). Hence, we will focus mainly on their application but also introduce other important data structures to solve some of the fundamental data management problems such as query processing and optimization, eﬃcient representation of data on disk, as well as the transfer of data from main memory to disk. Due to space constraints, we cannot cover applications of data structures to manage non-standard data such as multi-dimensional data, spatial and temporal data, multimedia data, or XML.

Before we begin our treatment of how data structures are used in a DBMS, we brieﬂy review the basic architecture, its components, and their functionality. Unless otherwise noted, our discussion applies to a class of DBMSs that are based on the relational data model. These so-called relational database management systems make up the majority of systems in use today and are oﬀered by all major vendors including IBM, Microsoft, Oracle, and Sybase. Most of the components described here can also be found in DBMSs based on other models such as the object-based model or XML.

Figure 60.1 depicts a conceptual overview of the main components that make up a DBMS. Rectangles represent system components, double-sided arrows represent input and output,

and the solid connectors indicate data as well as process ﬂow between two components.

Please note that the inner workings of a DBMS are quite complex and we are not attempting to provide a detailed discussion of its implementation. For an in-depth treatment the reader should refer to one of the many excellent database textbooks books, e.g., [3–8].

Starting from the top, users interact with the DBMS via commands generated from a variety of user interfaces or application programs. These commands can either retrieve or update the data that is managed by the DBMS or create or update the underlying metadata that describes the schema of the data. The former are called queries, the latter are called data deﬁnition statements. Both types of commands are processed by the Query Evaluation Engine which contains components for parsing the input, producing an execution plan, and executing the plan against the underlying database. In the case of queries, the parsed command is presented to a query optimizer component, which uses information about how the data is stored to produce an eﬃcient execution plan from the possibly many alternatives. We discuss data structures that represent parsed queries, execution plans, and statistics about a database, including the data structures that are used by an external sorting algorithm in Section 60.2 when we focus on the query evaluation engine.

Since databases are normally too large to ﬁt into the main memory of a computer, the data of a database resides in secondary memory, generally on one or more magnetic disks. However, to execute queries or modiﬁcations on data, that data must ﬁrst be transferred to main memory for processing and then back to disk for persistent storage. It is the job of the Storage Subsystem to accomplish a sophisticated placement of data on disk, to assure an eﬃcient localization of these persistent data, to enable their bidirectional transfer between disk and main memory, and to allow direct access to these data from other DBMS subsystems. The storage subsystem consists of two components: The Disk Space Manager is responsible for storing physical data items on disk, managing free regions of the disk space, hiding device properties from higher architecture levels, mapping physical blocks to tracks and sectors of a disc, and controlling the transfer of data items between external and main memory. The Buﬀer Manager organizes an assigned, limited main memory area called buﬀer and may comprise several smaller buﬀers (buﬀer pool). Other subsystems may have direct access to data items in these buﬀers.

In Sections 60.3 and 60.4, we discuss data structures that are used to represent both data in memory as well as on disk such as ﬁxed and variable-length records, large binary objects (LOBs), heap, sorted, and clustered ﬁles, as well as diﬀerent types of index structures. Given the fact that a database management system must manage data that is both resident in main memory as well as on disk, one has to deal with the reality that the most appropriate data structure for data stored on disk is diﬀerent from the data structures used for algorithms that run in main memory. Thus when implementing the storage manager, one has to pay careful attention to select not only the appropriate data structures but also to map the data between them in an eﬃcient manner.

In addition to the above two subsystems, today’s modern DBMSs include a Transaction Management Subsystem to support concurrent execution of queries against the database and recovery from failure. Although transaction processing is an important and complex topic, it is less interesting for our investigation of data structures and is mentioned here only for completeness.

The rest of this chapter is organized as follows. Section 60.2 describes important data structures used during query evaluation.

Data structures used for buﬀer management are described in Section 60.3, and data structures used by the disk space manager are described in Section 60.4. Section 60.5 concludes the chapter.

Search This Blog

algorithms

Data Structures for Databases:Overview of the Functionality of a Database Management System.

Overview of the Functionality of a Database Management System

Comments

Post a Comment

Popular posts from this blog

0/1 Knapsack Problem Memory function.

Binary Space Partitioning Trees:BSP Tree as a Hierarchy of Regions.

Drawing Trees:HV-Layout