Data Structures for Databases:Data Structures for Buﬀer Management

Data Structures for Buﬀer Management

A buﬀer is partitioned into an array of frames each of which can keep a page. Usually a page of a buﬀer is mapped to a block ∗ of a ﬁle so that reading and writing of a page only require one disk access each. Application programs and queries make requests on the buﬀer manager when they need a block from disk, that contains a data item of interest. If the A block is a contiguous sequence of bytes and represents the unit used for both storage allocation and data transfer. It is usually a multiple of 512 Bytes and has a typical size of 1KB to 8KB. It may contain several data items. Usually, a data item does not span two or more blocks.

block is already in the buﬀer, the buﬀer manager conveys the address of the block in main memory to the requester. If the block is not in main memory, the buﬀer manager ﬁrst allocates space in the buﬀer for the block, throwing out some other block if necessary, to make space for the new block. The displaced block is written back to disk if it was modiﬁed since the most recent time that it was written to the disk. Then, the buﬀer manager reads in the requested block from disk into the free frame of the buﬀer and passes the page address in main memory to the requester. A major goal of buﬀer management is to minimize the number of block transfers between the disk and the buﬀer.

Besides pages, so-called segments are provided as a counterpart of ﬁles in main memory. This allows one to deﬁne diﬀerent segment types with additional attributes, which support varying requirements concerning data processing. A segment is organized as a contiguous subarea of the buﬀer in a virtual, linear address space with visible page borders. Thus, it consists of an ordered sequence of pages. Data items are managed so that page borders are respected. If a data item is required, the address of the page in the buﬀer containing the item is returned.

An important question now is how segments are mapped to ﬁles. An appropriate mapping enables the storage system to preserve the merits of the ﬁle concept. The distribution of a segment over several ﬁles turns out to be unfavorable in the same way as the representation of a data item over several pages. Hence, a segment Sk is assigned to exactly one ﬁle Fj , and m segments can be stored in a ﬁle. Since block size and page size are the same, each page Pki ∈ Sk is assigned to a block Bjl ∈ Fj . We distinguish four methods of realizing this mapping.

The direct page addressing assumes an implicitly given mapping between the pages of a segment Sk and the blocks of a ﬁle Fj . The page Pki (1 ≤ i ≤ sk) is stored in the block Bjl (1 ≤ l ≤ dj ) so that l = Kj − 1 + i and dj ≥ Kj − 1 + sk holds. Kj denotes the number of the ﬁrst block reserved for Sk (Figure 60.8).

Frequently, we have a restriction to a 1:1-mapping, i.e., Kj = 1 and sk = dj hold. Only in this case, a dynamic extension of segments is possible. A drawback is that at the time of the segment creation the assigned ﬁle area has to be allocated so that a block is occupied for each empty page. For segments whose data stock grows slowly, the ﬁxed block allocation leads to a low storage utilization.

The indirect page addressing oﬀers a much larger ﬂexibility for the allocation of pages to blocks and, in addition, dynamic update and extension functionality (Figure 60.9). It requires two auxiliary data structures.

• Each segment Sk is associated with a page table Tk which for each page of the segment contains an entry indicating the block currently assigned to the page. Empty pages obtain a special null value in the page table.

• Each ﬁle Fj is associated with a bit table Mj which serves for free disk space management and quotes for each block whether currently a page is mapped to it or not. Mj (l) = 1 means that block Bjl is occupied; Mj (l) = 0 says that block Bjl is free. Hence, the bit table enables a dynamic assignment between pages and blocks.

Although this concept leads to an improved storage utilization, for large segments and ﬁles, the page tables and bit tables have to be split because of their size, transferred into main memory and managed in a special buﬀer. The provision of a page Pki that is not in the buﬀer can require two physical block accesses (and two enforced page removals), because, if necessary, the page table Tk has to be loaded ﬁrst in order to ﬁnd the current block address j = Tk(i).

The two methods described so far assume that a modiﬁed page is written back to the block that has once been assigned to it (update in place). If an error occurs within a transaction, as a result of the direct placement of updates, the recovery manager must provide enough log information (undo information) to be able to restore the old state of a page. Since the writing of large volumes of log data leads to notable eﬀort, it is often beneﬁcial to perform updates in a page in a manner so that the old state of the page is available until the end of the transaction. The following two methods are based on an indirect update of changes and provide extensive support for recovery.

The twin slot method can be regarded as a modiﬁcation of the direct page addressing. It causes very low costs for recovery but partially compensates this advantage through double disk space utilization. For a page Pki of a segment Sk , two physically consecutive blocks Bjl 1 and Bjl of a ﬁle Fj with l = Kj − 1+2 · i are allocated. Alternately, at the beginning of a transaction, one of both block keeps the current state of the page whereas changes are written to the other block. In case of a page request, both blocks are read, and the block with the more recent state is provided as the current page in the buﬀer. The block with the older state then stores the changed page. By means of page locks, a transaction-oriented recovery concept can be realized without explicitly managing log data.

The shadow paging concept (Figure 60.10) represents an extension of the indirect page addressing method and also supports indirect updates of changes. Before the beginning of a new save-interval given by two save-points † the contents of all current pages of a segment are duplicated as so-called shadow pages and can thus be kept unchanged and consistent. This means, when a new save-point is created, all data structures belonging to the representation of a segment Sk (i.e., all occupied pages, the page table Tk, the bit table M ) are stored as a consistent snapshot of the segment on disk. All modiﬁcations during a save-interval are performed on copies T I and M t of T and M . Changed pages are not written back to their original but to free blocks. At the creation of a new save-point, which must be an atomic operation, the tables T I and M t as well as all pages that belong to this state and have been changed are written back to disk. Further, all those blocks are released whose pages were subject to changes during the last save-interval. This just concerns those shadow pages for which a more recent version exists. At the beginning of the next save-interval the current contents of T I and M

As an example, Figure 60.10 shows several changes of pages in two segments S1 and S2. These changes are marked by so-called shadow bits in the page tables. Shadow bits are employed for the release of shadow pages at the creation time of new save-points. If a segment consists of s pages, the pertaining ﬁle must allocate s further blocks, because each changed page occupies two blocks within a save-interval.

The save-points orientate themselves to segments and not to transaction borders. Hence, in an error case, a segment-oriented recovery is executed. For a transaction-oriented recovery additional log data have to be collected.

Search This Blog

algorithms

Data Structures for Databases:Data Structures for Buﬀer Management

Data Structures for Buﬀer Management

Comments

Post a Comment

Popular posts from this blog

0/1 Knapsack Problem Memory function.

Binary Space Partitioning Trees:BSP Tree as a Hierarchy of Regions.

Warshall’s Algorithm -to find TRANSITIVE CLOSURE