Full Transcript

Disk Storage, Basic File Structures, Hashing, and Modern Storage Architectures Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe 16.1 Introduction ◼ Databases typically stored on magnetic disks ◼ Accessed using physical database file structures ◼ Storage hierarchy...

Disk Storage, Basic File Structures, Hashing, and Modern Storage Architectures Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe 16.1 Introduction ◼ Databases typically stored on magnetic disks ◼ Accessed using physical database file structures ◼ Storage hierarchy ◼ Primary storage ◼ CPU main memory, cache memory ◼ Secondary storage ◼ Magnetic disks, flash memory, solid-state drives ◼ Tertiary storage ◼ Removable media Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 2 Memory Hierarchies and Storage Devices ◼ Cache memory ◼ Static RAM ◼ DRAM ◼ Mass storage ◼ Magnetic disks ◼ CD-ROM, DVD, tape drives ◼ Flash memory ◼ Nonvolatile Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 3 Storage Types and Characteristics Table 16.1 Types of Storage with Capacity, Access Time, Max Bandwidth (Transfer Speed), and Commodity Cost Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16-4 Storage Organization of Databases ◼ Persistent data ◼ Most databases ◼ Transient data ◼ Exists only during program execution ◼ File organization ◼ Determines how records are physically placed on the disk ◼ Determines how records are accessed Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 5 16.2 Secondary Storage Devices ◼ Hard disk drive ◼ Bits (ones and zeros) ◼ Grouped into bytes or characters ◼ Disk capacity measures storage size ◼ Disks may be single or double-sided ◼ Concentric circles called tracks ◼ Tracks divided into blocks or sectors ◼ Disk packs ◼ Cylinder Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 6 Single-Sided Disk and Disk Pack Figure 16.1 (a) A single-sided disk with read/write hardware (b) A disk pack with read/write hardware Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16-7 Sectors on a Disk Figure 16.2 Different sector organizations on disk (a) Sectors subtending a fixed angle (b) Sectors maintaining a uniform recording density Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16-8 Secondary Storage Devices (cont’d.) ◼ Formatting ◼ Divides tracks into equal-sized disk blocks ◼ Blocks separated by interblock gaps ◼ Data transfer in units of disk blocks ◼ Hardware address supplied to disk I/O hardware ◼ Buffer ◼ Used in read and write operations ◼ Read/write head ◼ Hardware mechanism for read and write operations Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 9 Secondary Storage Devices (cont’d.) ◼ Disk controller ◼ Interfaces disk drive to computer system ◼ Standard interfaces ◼ SCSI ◼ SATA ◼ SAS Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 10 Secondary Storage Devices (cont’d.) ◼ Techniques for efficient data access ◼ Data buffering ◼ Proper organization of data on disk ◼ Reading data ahead of request ◼ Proper scheduling of I/O requests ◼ Use of log disks to temporarily hold writes ◼ Use of SSDs or flash memory for recovery purposes Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 11 Solid State Device Storage ◼ Sometimes called flash storage ◼ Main component: controller ◼ Set of interconnected flash memory cards ◼ No moving parts ◼ Data less likely to be fragmented ◼ More costly than HDDs ◼ DRAM-based SSDs available ◼ Faster access times compared with flash Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 12 Magnetic Tape Storage Devices ◼ Sequential access ◼ Must scan preceding blocks ◼ Tape is mounted and scanned until required block is under read/write head ◼ Important functions ◼ Backup ◼ Archive Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 13 16.3 Buffering of Blocks ◼ Buffering most useful when processes can run concurrently in parallel Figure 16.3 Interleaved concurrency versus parallel execution Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 14 Buffering of Blocks (cont’d.) ◼ Double buffering can be used to read continuous stream of blocks Figure 16.4 Use of two buffers, A and B, for reading from disk Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 15 Buffer Management and Replacement Strategies ◼ Buffer management information ◼ Pin count ◼ Dirty bit ◼ Buffer replacement strategies ◼ Least recently used (LRU) ◼ Clock policy ◼ First-in-first-out (FIFO) Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 16 16.4 Placing File Records on Disk ◼ Record: collection of related data values or items ◼ Values correspond to record field ◼ Data types ◼ Numeric ◼ String ◼ Boolean ◼ Date/time ◼ Binary large objects (BLOBs) ◼ Unstructured objects Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 17 Placing File Records on Disk (cont’d.) ◼ Reasons for variable-length records ◼ One or more fields have variable length ◼ One or more fields are repeating ◼ One or more fields are optional ◼ File contains records of different types Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 18 Record Blocking and Spanned Versus Unspanned Records ◼ File records allocated to disk blocks ◼ Spanned records ◼ Larger than a single block ◼ Pointer at end of first block points to block containing remainder of record ◼ Unspanned ◼ Records not allowed to cross block boundaries Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 19 Record Blocking and Spanned Versus Unspanned Records (cont’d.) ◼ Blocking factor ◼ Average number of records per block for the file Figure 16.6 Types of record organization (a) Unspanned (b) Spanned Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16-20 Record Blocking and Spanned Versus Unspanned Records (cont’d.) ◼ Allocating file blocks on disk ◼ Contiguous allocation ◼ Linked allocation ◼ Indexed allocation ◼ File header (file descriptor) ◼ Contains file information needed by system programs ◼ Disk addresses ◼ Format descriptions Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 21 16.5 Operations on Files ◼ Retrieval operations ◼ No change to file data ◼ Update operations ◼ File change by insertion, deletion, or modification ◼ Records selected based on selection condition Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 22 Operations on Files (cont’d.) ◼ Examples of operations for accessing file records ◼ Open ◼ Find ◼ Read ◼ FindNext ◼ Delete ◼ Insert ◼ Close ◼ Scan Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 23 16.6 Files of Unordered Records (Heap Files) ◼ Heap (or pile) file ◼ Records placed in file in order of insertion ◼ Inserting a new record is very efficient ◼ Searching for a record requires linear search ◼ Deletion techniques ◼ Rewrite the block ◼ Use deletion marker Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 24 16.7 Files of Ordered Records (Sorted Files) ◼ Ordered (sequential) file ◼ Records sorted by ordering field ◼ Called ordering key if ordering field is a key field ◼ Advantages ◼ Reading records in order of ordering key value is extremely efficient ◼ Finding next record ◼ Binary search technique Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 25 Access Times for Various File Organizations Table 16.3 Average access times for a file of b blocks under basic file organizations Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16-26 16.8 Hashing Techniques ◼ Hash function (randomizing function) ◼ Applied to hash field value of a record ◼ Yields address of the disk block of stored record ◼ Organization called hash file ◼ Search condition is equality condition on the hash field ◼ Hash field typically key field ◼ Hashing also internal search structure ◼ Used when group of records accessed exclusively by one field value Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 27 Hashing Techniques (cont’d.) ◼ Internal hashing ◼ Hash table ◼ Collision ◼ Hash field value for inserted record hashes to address already containing a different record ◼ Collision resolution ◼ Open addressing ◼ Chaining ◼ Multiple hashing Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 28 Hashing Techniques (cont’d.) ◼ External hashing for disk files ◼ Target address space made of buckets ◼ Bucket: one disk block or contiguous blocks ◼ Hashing function maps a key into relative bucket ◼ Table in file header converts bucket number to disk block address ◼ Collision problem less severe with buckets ◼ Static hashing ◼ Fixed number of buckets allocated Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 29 16.12 Summary ◼ Magnetic disks ◼ Accessing a disk block is expensive ◼ Commands for accessing file records ◼ File organizations: unordered, ordered, hashed Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 30

Use Quizgecko on...
Browser
Browser