Database Storage Organization

Questions and Answers

Which storage type provides the fastest data access for the CPU?

  • Primary storage (correct)
  • Secondary storage
  • Tertiary storage
  • External storage

Which of the following is characteristic of secondary storage?

  • Slower data access compared to primary storage (correct)
  • Directly accessed by the CPU
  • More expensive per unit of storage compared to tertiary storage
  • Smaller storage capacity compared to primary storage

What is the primary role of 'cache memory' in the memory hierarchy?

  • Serving as the least expensive end of the memory hierarchy
  • Providing the main work area for the CPU
  • Storing large permanent databases
  • Speeding up execution of program instructions (correct)

Which memory type is known as 'main memory'?

  • DRAM (correct)

What does a 'record type' or 'record format definition' consist of?

  • A collection of field names and their corresponding data types (correct)

What is the key characteristic of 'fixed-length records'?

  • Every record in the file has exactly the same size in bytes (correct)

When might a file have variable-length records?

  • When one or more fields have multiple values for individual records (correct)

In fixed-length records, if a field is optional, what is a common technique to handle the absence of a value?

  • Storing a special NULL value (correct)

What is the purpose of 'separator characters' in variable-length fields?

  • To mark where each field ends, so the bytes that represent each field can be determined (correct)

If a file includes records of different types, what is typically included at the beginning of each record?

  • A record type indicator (correct)

What is 'blocking factor'?

  • The number of records that can fit into one block (correct)

In the context of database storage, what is a 'spanned record'?

  • A record that is stored across more than one block, which is required when the record is larger than the block size (correct)

When is it advantageous to use spanned records?

  • When the average record is large (correct)

In 'contiguous allocation', what is a disadvantage?

  • It makes expanding the file difficult (correct)

What is the role of a 'file header'?

  • Storing information about the file, such as the disk addresses of the file blocks and record format descriptions (correct)

What is a primary drawback of using files of unordered records (heap files)?

  • Searching requires an expensive linear search (correct)

What is a common method for record deletion in unordered files (heap files) that avoids immediately rewriting the block?

  • Setting a deletion marker (correct)

In which file organization can the 'ordering field' also be a 'key field' (the ordering key)?

  • Sorted (ordered) files (correct)

What is the primary disadvantage of using ordered files (sorted Files)?

  • Expensive insertions and deletions (correct)

What is a typical approach to handle insertions in ordered files to improve efficiency?

  • Creating an overflow file (correct)

In hashing, what is the component that generates a disk block address?

  • Hash function (correct)

What kind of search condition must be used to access records in a hash file?

  • An equality condition on the hash field (correct)

In hashing, what is 'folding'?

  • Using an arithmetic or logical function to combine portions of the hash field value (correct)

What happens during a 'collision' in hashing?

  • Two different records hash to the same address (correct)

In 'open addressing', how is a collision resolved?

  • By checking subsequent positions until an empty one is found (correct)

How does the 'chaining' method resolve collisions in hashing?

  • It allocates additional overflow positions and uses pointers to link them to the hash address (correct)

What is the main goal of a good hashing function?

  • To minimize collisions and unused locations (correct)

What is the key difference between internal and external hashing?

  • External hashing is for disk files, while internal hashing is for memory (correct)

In external hashing, what does a bucket typically consist of?

  • One disk block or a cluster of contiguous disk blocks (correct)

In external hashing, what is 'static hashing'?

  • A hashing scheme where the number of buckets is fixed (correct)

Which of the following describes 'extendible hashing'?

  • A dynamic hashing scheme that lets the file grow and shrink efficiently (correct)

What do index structures provide, in addition to the primary data file?

  • Alternative ways to access the records (correct)

What is a primary index?

  • An index on the ordering key field of an ordered file (correct)

In a file using a primary index, what is referred to as the 'block anchor'?

  • The first record in each block (correct)

What characterizes a 'dense index'?

  • It has an index entry for every search key value in the data file (correct)

What is the difference between a primary index and a clustering index?

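  • A clustering index is defined on a nonkey ordering field, which can have duplicate values, whereas a primary index is defined on a key field with a unique value in each record (correct)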

What is a key advantage of using a secondary index?

  • Improvement in search time for an arbitrary record (correct)

On what type of field can a secondary index be created?

  • It can be created on a candidate key field with unique values or on a nonkey field with duplicate values (correct)

What is the purpose of multilevel indexing?

  • Faster access, especially for large directories (correct)

What are the advantages of B-Trees?

  • Self-balancing, fast retrieval times (correct)

In a B+ tree, where are data pointers stored?

  • Only at the leaf nodes (correct)

Imagine a database system that uses spanned records with a block size $B$ of 4096 bytes. The file contains a mix of variable-length and fixed-length records with no separator characters, and some variable-length records leave a significant portion of their blocks unused (internal fragmentation). If the file holds $r = 10,000$ records with an average record size of 2000 bytes, what would be the best approach to optimization?

  • Implement a record clustering strategy that places related records contiguously, improving locality and reducing block boundary crossings (correct)

In cases where memory read and write operations are significantly slower than CPU operations, which strategies would be the most effective in minimizing the impact of slow memory access in a database system?

  • Increase memory capacity and reduce the frequency of memory read/write operations (correct)

Suppose that in a very specific database file format, each record starts with a 2-byte record type indicator, and the rest of the record's structure varies greatly depending on that indicator. How would you perform a binary search on this file by a timestamp field, given that the timestamp may be located at very different offsets within each record? If no matching record exists, the search should return 'record not found'.

  • Use a binary search algorithm, with a linear read of the records within each block examined. Organize the records on a key field so the search is possible, and keep the timestamp and key fields in a common, standard record header, with a block size divisible by the size of that header (correct)

Flashcards

Primary storage

Storage media that can be operated on directly by the computer's CPU. Provides fast data access.

Secondary and tertiary storage

Storage including magnetic disks, optical disks, and tapes; data must be copied to primary storage to be processed by the CPU.

Record

Collection of related data values or items; each value corresponds to a particular field of the record.

Attributes

The qualities that describe entities.

Record type

Field names and their data types.

Data type

Specifies the type of value a field can take.

Fixed-length records

Each record in the file has the same size.

Variable-length records

Records in the file have different sizes.

File

A sequence of records.

Variable-length fields

Records are the same type, but some fields have varying size.

Repeating field

Fields may have multiple values for individual records.

Optional fields

Fields may have values only for some records.

Block

The unit of data transfer between disk and memory.

Blocking factor

The number of records that can fit into one block.

Spanned records

Records can span more than one block.

Unspanned records

Records do not cross block boundaries.

Contiguous allocation

File blocks are allocated to consecutive disk blocks.

Linked allocation

Each block contains a pointer to the next block.

Indexed allocation

One or more blocks contain pointers to actual file blocks.

File header

Contains info about the file; disk addresses and record format.

Heap file

Records are placed in the file in the order they are inserted.

Deletion marker

An extra byte or bit stored with each record to mark it as deleted.

Ordered file

Records are ordered based on the values of one of their fields.

Ordering key

An ordering field that is also a key field, so it has a unique value in each record.

Overflow file

Temporary unordered file for new records.

Hashing

Provides fast access to records under certain search conditions.

Hash field

Field used in the equality search condition; applying the hash function to its value yields the address of the disk block in which the record is stored.

Hash function

Transforms the hash field value into an integer.

Folding

Applying an arithmetic or logical function to different portions of the hash field value.

Digit picking

Choosing digits to form the hash address.

Collision

A new record hashes to an address that is already occupied by another record.

Open addressing

Resolving a collision by checking the positions after the occupied one until an empty position is found.

Chaining

Resolving collisions by extending the array with overflow locations that are linked to the hash address through pointers.

Bucket

One disk block or a cluster of contiguous disk blocks; the unit of the target address space in external hashing.

Indexing

Auxiliary access structures that provide secondary access paths to records without affecting their physical placement.

Primary index

Specified on the ordering key field of an ordered file.

Clustering index

Specified on a nonkey ordering field of a file whose records are physically ordered on that field.

Secondary index

Specified on any nonordering field of a file.

Dense Index

An index entry for every search key value in the data file.

Sparse Index

Index entries for only some of the search values.

Study Notes

Database Storage Organization

  • Databases are physically stored as files of records, typically on magnetic disks.
  • This module covers organizing databases in storage and accessing them efficiently with algorithms, including those that use indexes.
  • Data for a database must be physically stored on a computer storage medium forming a storage hierarchy.

Storage Hierarchy Categories

  • Primary storage includes media directly operated on by the CPU, like main memory and cache memory.
  • Primary storage offers fast data access but has limited storage capacity.
  • Secondary and tertiary storage include magnetic disks, optical disks (CD-ROMs, DVDs), and tapes.
  • Hard-disk drives are classified as secondary storage.
  • Removable media like optical disks and tapes are considered tertiary storage.
  • Secondary and tertiary storage devices have larger capacities and lower costs, but they offer slower data access compared to primary storage.
  • Data in secondary or tertiary storage must be copied into primary storage before the CPU can process it.

Memory Hierarchy Levels

  • At the primary storage level, the most expensive end of the hierarchy is cache memory, which is static RAM.
  • Cache memory is used by the CPU to speed up program instruction execution through prefetching and pipelining.
  • The next level of primary storage is DRAM (Dynamic RAM), known as main memory, providing the CPU's work area for program instructions and data.
  • At the secondary and tertiary storage level, the hierarchy includes magnetic disks, mass storage in the form of CD-ROM devices, and tapes which are the least expensive end.
  • Large permanent databases generally reside on secondary storage (magnetic disks).
  • Portions of these databases are read into and written from buffers in main memory as needed.

Records and Record Types

  • Data is stored in records, where each record comprises related data values or items.
  • Each value is formed of one or more bytes and corresponds to a particular field of the record.
  • Records describe entities and their attributes.
  • For example, an EMPLOYEE record signifies an employee entity, with each field value specifying an employee attribute like Name, Birth_date, Salary, or Supervisor.
  • A collection of field names and their corresponding data types constitutes a record type or record format definition.
  • A data type, associated with each field, specifies the types of values a field can hold.
  • A field's data type is usually one of the standard data types used in programming.
  • The bytes required for each data type are fixed for a given computer system.

Common Data Types and Sizes

  • Integer: 4 bytes
  • Long integer: 8 bytes
  • Real number/floating point: 4 bytes
  • Boolean: 1 byte
  • Date (assuming YYYY-MM-DD): 10 bytes
  • Fixed-length string: k bytes for k characters
  • Variable-length strings: May require as many bytes as there are characters in each field value

Fixed-Length vs. Variable-Length Records

  • A file comprises a sequence of records, often of the same record type.
  • If every record has the same size (in bytes), the file consists of fixed-length records.
  • When records in a file have different sizes, that file is considered to consist of variable-length records.

Reasons for Variable-Length Records

  • Records are of the same type, but one or more fields vary in size (variable-length fields). For instance, the Name field of an EMPLOYEE record.
  • Records are of the same type, but one or more fields may have multiple values (repeating fields), often called a repeating group.
  • Records share the same type, but fields are optional, meaning some may not have values (optional fields).
  • The file contains mixed record types, leading to varying record sizes, common when related records are clustered on disk blocks (e.g., GRADE_REPORT records following a STUDENT's record).

Considerations for Fixed-Length Records

  • The fixed-length EMPLOYEE records have a record size of 71 bytes.
  • Each record has the same fields and fixed field lengths, allowing the system to identify the starting byte position of each field relative to the record's starting position.
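As a minimal sketch of why fixed field lengths make field positions predictable, the Python below packs a fixed-length record with the standard `struct` module and reads the Salary field directly at its offset. The layout (field names and lengths) is a hypothetical illustration using the data type sizes listed above, not the 71-byte EMPLOYEE record itself.

```python
import struct

# Hypothetical EMPLOYEE layout; names and lengths are illustrative assumptions.
# "<" selects little-endian byte order with no alignment padding.
LAYOUT = [
    ("Name", "30s"),       # fixed-length string, 30 bytes
    ("Ssn", "9s"),         # fixed-length string, 9 bytes
    ("Salary", "i"),       # integer, 4 bytes
    ("Job_code", "i"),     # integer, 4 bytes
    ("Hire_date", "10s"),  # date as YYYY-MM-DD, 10 bytes
]
RECORD_FMT = "<" + "".join(code for _, code in LAYOUT)
RECORD_SIZE = struct.calcsize(RECORD_FMT)

# Starting byte of each field relative to the start of the record.
OFFSETS = {}
_pos = 0
for _name, _code in LAYOUT:
    OFFSETS[_name] = _pos
    _pos += struct.calcsize("<" + _code)

def pack_employee(name, ssn, salary, job_code, hire_date):
    # struct pads byte strings shorter than the declared length with NUL bytes.
    return struct.pack(RECORD_FMT, name.encode(), ssn.encode(),
                       salary, job_code, hire_date.encode())

def read_salary(record_bytes):
    # Every field has a fixed length, so Salary is always at the same offset.
    (salary,) = struct.unpack_from("<i", record_bytes, OFFSETS["Salary"])
    return salary
```

With this layout every record is 57 bytes and Salary always starts at byte 39, so no scanning is needed to find it.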

Handling Optional and Repeating Fields

  • For optional fields, all records include every field, but a special NULL value is stored when no value exists.
  • For repeating fields, space is allocated in every record for the maximum possible number of occurrences of the field, which can waste space.

Handling Variable-Length Fields

  • For variable-length fields, special separator characters (e.g., ?, %, $) that do not appear in any field value can be used to terminate each field, so the bytes belonging to each field within a record can be determined.
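A minimal sketch of separator-based encoding, assuming `$` terminates each field and `%` terminates the record (the choice of characters is an assumption; the text only requires that they never appear in a field value):

```python
FIELD_SEP = "$"   # assumed: terminates each field value
RECORD_END = "%"  # assumed: marks the end of the record

def encode_record(values):
    """Encode variable-length field values; no value may contain a separator."""
    for v in values:
        if FIELD_SEP in v or RECORD_END in v:
            raise ValueError("field value contains a separator character")
    return "".join(v + FIELD_SEP for v in values) + RECORD_END

def decode_record(text):
    """Recover the fields by scanning for the separator characters."""
    if not text.endswith(RECORD_END):
        raise ValueError("missing record terminator")
    body = text[: -len(RECORD_END)]
    # Every field ends with FIELD_SEP, so the last split element is empty.
    return body.split(FIELD_SEP)[:-1]
```

For example, `encode_record(["a", "bc"])` produces `a$bc$%`, and decoding that string gives back `["a", "bc"]`.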

Formatting Data in Files with Optional Fields

  • Optional fields can be formatted differently in files.
  • For a large record type with a small number of fields appearing in a typical record, each record can include a sequence of <field-name, field-value> pairs instead of plain field values.
  • For practicality, a short field type code (e.g., an integer) can be assigned to each field, and each record then includes a sequence of <field-type, field-value> pairs instead of <field-name, field-value> pairs.
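The following sketch stores only the fields a record actually has, as <field-type, field-value> pairs. The field names and integer codes are hypothetical; a real file would keep the code table in the file header or schema.

```python
# Hypothetical field-type codes for a large record type (illustrative only).
FIELD_CODES = {"Name": 1, "Ssn": 2, "Salary": 3, "Department": 4, "Phone": 5}
CODE_NAMES = {code: name for name, code in FIELD_CODES.items()}

def to_pairs(record):
    """Keep only fields that have values, as (field-type, field-value) pairs."""
    return [(FIELD_CODES[name], value) for name, value in record.items()]

def from_pairs(pairs):
    """Rebuild the record; fields absent from the pairs simply have no value."""
    return {CODE_NAMES[code]: value for code, value in pairs}
```

A record with only Name and Salary is stored as two short pairs instead of five fields, most of them empty.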

Separating Repeating Fields

  • A repeating field needs one separator character to separate the repeating values of the field and another separator character to indicate the termination of the field.

Indicating Different Record Types

  • For files with different record types, each record is preceded by a record type indicator.
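A minimal sketch of reading a file with mixed record types, assuming each record begins with a 2-byte unsigned type indicator. The STUDENT and GRADE_REPORT layouts below are hypothetical fixed-length formats chosen for illustration.

```python
import struct

# Hypothetical record types: 1 = STUDENT, 2 = GRADE_REPORT.
# Every record starts with a 2-byte unsigned record type indicator ("<H").
FORMATS = {
    1: "<H20si",   # type, Name (20 bytes), Student_number
    2: "<Hi8s1s",  # type, Student_number, Section (8 bytes), Grade (1 byte)
}

def read_records(buf):
    """Walk the buffer, using each record's type indicator to pick its format."""
    pos, out = 0, []
    while pos < len(buf):
        (rtype,) = struct.unpack_from("<H", buf, pos)
        fmt = FORMATS[rtype]
        out.append(struct.unpack_from(fmt, buf, pos))
        pos += struct.calcsize(fmt)  # the type determines the record length
    return out
```

Because the indicator is read first, the program knows how many bytes the rest of the record occupies and where the next record starts.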

Record Allocation to Disk Blocks

  • File records must be allocated to disk blocks, as a block is the unit of data transfer.
  • If the block size is larger than the record size, each block holds numerous records, although some files may have unusually large records that cannot fit in one block.
  • For fixed-length records of size R bytes and a block size of B bytes, with B ≥ R, bfr = ⎣B/R⎦ records fit per block, where ⎣x⎦ (the floor function) rounds x down to an integer.
  • The value of "bfr" is called the blocking factor for the file.
  • If R does not evenly divide B, the unused space in each block is B - (bfr * R) bytes.
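The two formulas above translate directly into code; the sample sizes (B = 512, R = 100) are illustrative values, not from the text.

```python
def blocking_factor(B, R):
    """bfr = floor(B / R) fixed-length records per block (requires B >= R)."""
    if B < R:
        raise ValueError("record larger than block: unspanned storage impossible")
    return B // R

def unused_bytes_per_block(B, R):
    """Space left over in each block: B - (bfr * R) bytes."""
    return B - blocking_factor(B, R) * R
```

With B = 512 and R = 100, bfr = 5 and each block wastes 512 - 500 = 12 bytes.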

Spanned vs. Unspanned Records

  • To use the unused space, part of a record can be stored in one block and the rest in another block.
  • A pointer at the end of the first block points to the block containing the remainder of the record, in case that block is not the next consecutive block on disk.
  • Spanned organization allows records to span more than one block; it must be used when a record is larger than a block.
  • Unspanned organization does not allow records to cross block boundaries; it is generally used with fixed-length records when the block size exceeds the record size (B > R).
  • Unspanned storage simplifies record processing by ensuring each record starts at a known location within the block.

Variable Length Records Organization

  • For variable-length records, either spanned or unspanned organizations are viable.
  • Spanning is advantageous when the average record is large, because it reduces the space lost in each block.
  • With spanned organization, each block may store a different number of variable-length records.
  • The blocking factor (bfr) signifies a file's average number of records per block.
  • The number of blocks b needed for a file of r records is b = ⎡r/bfr⎤, where ⎡x⎤ (the ceiling function) rounds x up to the next integer.
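A small sketch comparing unspanned and spanned storage for fixed-size records. The spanned calculation is a simplification that ignores the bytes taken by the pointer to the continuation block; the numbers (r = 10,000, B = 4096, R = 2000) are illustrative.

```python
def ceil_div(x, y):
    """Integer ceiling of x / y, computed without floating point."""
    return -(-x // y)

def blocks_needed(r, bfr):
    """b = ceil(r / bfr) blocks for r records at bfr records per block."""
    return ceil_div(r, bfr)

def blocks_unspanned(r, B, R):
    # Unspanned: each block holds floor(B / R) whole records.
    return blocks_needed(r, B // R)

def blocks_spanned(r, B, R):
    # Spanned (simplified): records are packed back to back across blocks,
    # ignoring the space used by continuation pointers.
    return ceil_div(r * R, B)
```

Here unspanned storage fits only 2 records per 4096-byte block and needs 5000 blocks, while spanning packs the same data into 4883 blocks, which is why spanning pays off when records are large relative to the block.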

Allocating File Blocks on Disk

  • Multiple standard techniques are available for allocating file blocks on disk.
  • Contiguous allocation assigns file blocks to consecutive disk blocks. This makes reading the whole file very fast using double buffering, but it makes expanding the file difficult.
  • Linked allocation stores in each file block a pointer to the next file block. This makes it easy to expand the file but slow to read the whole file.
  • A combination of the two allocates clusters of consecutive disk blocks and links the clusters together. These clusters are also called file segments or extents.
  • Indexed allocation uses one or more index blocks containing pointers to the actual file blocks. It is also common to use combinations of these techniques.

Contents of File Headers

  • A file header or file descriptor includes file information needed by programs accessing records.
  • The header includes file block locations, format descriptions, field lengths, field order (for fixed-length unspanned files), field type codes, & separators (for variable-length files).
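  • To search for a record, programs copy one or more file blocks into main memory buffers and use the information in the file header to interpret and search the records.
  • If the address of the block containing the desired record is unknown, the program must do a linear search through the file blocks, copying each block into a buffer and searching it until the record is found or all blocks have been searched.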
  • A good file organization locates a block containing a desired record with a minimum block transfer.

Heap Files Explained

  • Heap (pile) files place records in the order in which they are inserted, so new records are added at the end of the file.
  • Inserting a new record is very efficient: the last disk block of the file is copied into a buffer, the new record is added, and the block is rewritten to disk. The address of the last file block is kept in the file header.
  • Searching for a record using any search condition involves a linear search through the file block by block, which is an expensive procedure.
  • If exactly one record satisfies the search condition, the program will, on average, read into memory and search half the file blocks before finding it: for a file of b blocks, this takes about b/2 block accesses.
  • If no record satisfies the condition, or several records do, the program must read and search all b blocks.
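A toy in-memory model of a heap file, counting block reads during a linear search. The `HeapFile` class is hypothetical; Python lists stand in for disk blocks.

```python
class HeapFile:
    """Toy heap (pile) file: a list of blocks, each holding up to bfr records."""

    def __init__(self, bfr):
        self.bfr = bfr
        self.blocks = [[]]

    def insert(self, record):
        # Only the last block is touched. On disk this corresponds to reading
        # the last block into a buffer, adding the record, and rewriting it.
        last = self.blocks[-1]
        if len(last) == self.bfr:
            last = []
            self.blocks.append(last)
        last.append(record)

    def linear_search(self, predicate):
        """Return (record, blocks_read), scanning block by block until a match."""
        for i, block in enumerate(self.blocks, start=1):
            for rec in block:
                if predicate(rec):
                    return rec, i
        return None, len(self.blocks)  # unsuccessful search reads all b blocks
```

With 100 records and bfr = 10 (b = 10 blocks), finding record 55 reads 6 blocks, and averaging over all records gives 5.5 reads, close to b/2.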

Deleting Records in Heap Files

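  • To delete a record, a program must first find its block, copy the block into a buffer, delete the record from the buffer, and finally rewrite the block to disk.
  • This leaves unused space in the disk block; deleting a large number of records this way wastes storage space.
  • Another technique stores an extra byte or bit, called a deletion marker, with each record. A record is deleted by setting its deletion marker to one value; a different value marks a valid (not deleted) record.
  • Search programs consider only valid records in a block. Periodically, the file is reorganized: its blocks are accessed consecutively and the records are packed by removing deleted records.
  • Alternatively, the space of deleted records can be reused when inserting new records, although this requires extra bookkeeping to keep track of empty locations.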
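A minimal sketch of deletion markers plus periodic reorganization. The `MarkedFile` class and the marker values are hypothetical; each stored entry is a `[marker, record]` pair.

```python
VALID, DELETED = 0, 1  # assumed marker values

class MarkedFile:
    def __init__(self, blocks):
        # Each block is a list of [marker, record] entries.
        self.blocks = [[[VALID, rec] for rec in block] for block in blocks]

    def delete(self, predicate):
        """Set the deletion marker instead of compacting the block immediately."""
        count = 0
        for block in self.blocks:
            for entry in block:
                if entry[0] == VALID and predicate(entry[1]):
                    entry[0] = DELETED
                    count += 1
        return count

    def valid_records(self):
        # Searches skip any record whose marker says DELETED.
        return [rec for block in self.blocks for marker, rec in block if marker == VALID]

    def reorganize(self, bfr):
        """Periodic reorganization: read blocks in order and repack valid records."""
        live = self.valid_records()
        self.blocks = [[[VALID, rec] for rec in live[i:i + bfr]]
                       for i in range(0, len(live), bfr)]
```

Deleting the even records from two 3-record blocks only flips markers; reorganizing with bfr = 3 then packs the three survivors into a single block.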

Spanned vs. Unspanned in Heap Files

  • Either spanned or unspanned organization can be used for an unordered file, with either fixed-length or variable-length records.
  • Modifying a variable-length record may require deleting the old record and inserting the modified record, because the modified record may not fit in its old space on disk.
  • To read all records in order of the values of some field, a sorted copy of the file is created. Sorting a large disk file is an expensive operation, and special external sorting techniques are used.

Ordered (Sorted) Files

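  • Records are physically ordered on disk based on the values of one of their fields, called the ordering field. This produces an ordered or sequential file.
  • If the ordering field is also a key field of the file (a field guaranteed to have a unique value in each record), it is called the ordering key of the file.
  • Ordered files have two advantages over unordered files. First, reading the records in order of the ordering field values is extremely efficient, because no sorting is required. Second, finding the next record in ordering-field order usually requires no additional block access, because the next record is in the same block as the current one (unless the current record is the last one in the block).
  • If the file blocks are stored on consecutive cylinders, seek time is minimized when reading the file in order. A search condition on the ordering key field can also use binary search, which is much faster than linear search.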
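A sketch of binary search over the blocks of an ordered file, counting block accesses. Each block is modeled as a sorted, nonempty list of ordering key values; this is an in-memory illustration, not a disk-level implementation.

```python
import bisect

def binary_search_blocks(blocks, key):
    """Binary search on an ordered file; returns (found, block_accesses)."""
    lo, hi, accesses = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]          # one block access
        accesses += 1
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            # The key can only be in this block; search it in memory.
            i = bisect.bisect_left(block, key)
            return (i < len(block) and block[i] == key), accesses
    return False, accesses
```

For a file of 3000 blocks, no search takes more than 12 block accesses (about log2 3000), compared with 1500 on average for a linear search.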

Limitations of Ordered Files

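  • Ordering does not help when records are searched by the values of other, nonordering fields; such searches still require a linear search.
  • Inserting and deleting records are expensive operations, because the records must remain physically ordered.
  • To insert a record, we must find its correct position in the file based on its ordering field value, and then make space in the file to insert it.
  • For a large file this is very time-consuming, because on average half the records of the file must be moved to make space for the new record.
  • For record deletion, the problem is less severe if deletion markers and periodic reorganization are used.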

Enhancing Insertion Efficiency in Ordered Files

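  • One option is to keep some unused space in each block for new records; however, once this space is used up, the original problem resurfaces.
  • Another common method is to create a temporary unordered file called an overflow or transaction file; the actual ordered file is then called the main or master file.
  • New records are inserted at the end of the overflow file rather than in their correct position in the main file. Periodically, the overflow file is sorted and merged with the master file during file reorganization.
  • Insertion becomes very efficient, but searching is more complex: if a binary search does not find the record in the main file, the overflow file must be searched with a linear search.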
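A toy model of the overflow-file approach: cheap appends, a two-step search, and periodic sort-and-merge. The class name is hypothetical, and whole records are reduced to their ordering key values.

```python
import bisect
import heapq

class OrderedFileWithOverflow:
    """Toy sorted master file plus an unordered overflow (transaction) file."""

    def __init__(self, keys):
        self.master = sorted(keys)
        self.overflow = []

    def insert(self, key):
        # Cheap insertion: append at the end of the overflow file.
        self.overflow.append(key)

    def search(self, key):
        # Binary search on the master file first...
        i = bisect.bisect_left(self.master, key)
        if i < len(self.master) and self.master[i] == key:
            return "master"
        # ...then fall back to a linear scan of the overflow file.
        if key in self.overflow:
            return "overflow"
        return None

    def reorganize(self):
        # Sort the overflow file and merge it into the master file.
        self.master = list(heapq.merge(self.master, sorted(self.overflow)))
        self.overflow = []
```

`heapq.merge` combines the two sorted inputs in a single pass, which mirrors the sequential merge step of file reorganization.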

Considerations for Modifying Ordered Files

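  • Modifying a field value depends on two factors: the search condition used to locate the record and the field to be modified.
  • If the search condition involves the ordering key field, the record can be located with a binary search; otherwise, a linear search is needed.
  • A nonordering field can be modified by changing the record and rewriting it in the same physical location on disk, assuming fixed-length records.
  • Modifying the ordering field means the record can change its position in the file, which requires deleting the old record followed by inserting the modified record.
  • Ordered files are rarely used in database applications unless an additional access path, called a primary index, is used; this results in an indexed-sequential file.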

Introduction to Hashing Techniques

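  • Hashing provides very fast access to records under certain search conditions; a file organized this way is called a hash file.
  • The search condition must be an equality condition on a single field, called the hash field. Often the hash field is also a key field of the file, in which case it is called the hash key.
  • A function h, called the hash function, is applied to the hash field value of a record and yields the address of the disk block in which the record is stored.
  • The search for the record within that block can then be carried out in a main memory buffer.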

Applying Hashing to Files

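  • Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by the value of one field. Internal hashing is typically implemented as a hash table using an array of records.
  • With M slots, the array index range is 0 to M - 1, and the slot addresses correspond to the array indexes.
  • A hash function transforms the hash field value into an integer between 0 and M - 1. A common choice is h(K) = K mod M, which returns the remainder of an integer hash field value K after division by M; this value is then used as the record address.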
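The mod hash function in code, with a small table of M = 7 slots; the keys are illustrative and chosen so that none of them collide.

```python
M = 7  # number of slots (illustrative)

def h(K, M):
    """h(K) = K mod M maps an integer hash field value to a slot 0..M-1."""
    return K % M

table = [None] * M
for K in (15, 23, 40):   # 15 -> slot 1, 23 -> slot 2, 40 -> slot 5
    table[h(K, M)] = K
```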

Dealing with Non-Integer Hash Field Values

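  • Noninteger hash field values can be transformed into integers before the mod function is applied.
  • For character strings, the numeric (character) codes associated with the characters can be used in the transformation, for example by multiplying those code values.
  • The hashing algorithm then iterates over the characters of the field value, combining their codes into a single integer and taking the result mod M.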
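One possible multiplicative scheme for string keys, sketched under two assumptions: the field is treated as fixed-width (padded or truncated to 20 characters), and `ord()` supplies the character codes.

```python
def hash_string(K, M, width=20):
    """Hash a character string K into 0..M-1 by multiplying character codes.

    Assumes K is a fixed-width field: padded with blanks (or truncated)
    to `width` characters before hashing.
    """
    K = K.ljust(width)[:width]
    temp = 1
    for ch in K:
        temp = (temp * ord(ch)) % M  # reduce each step to keep numbers small
    return temp % M
```

The choice of M matters here: if M is prime and larger than every character code (for example M = 131 for ASCII text), no code is divisible by M, so the product never collapses to 0. A weakness of plain multiplication is that anagrams such as "abc" and "cab" always collide, since multiplication ignores character order.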

Folding Hash Function

  • Folding applies an arithmetic function such as addition, or a logical function such as exclusive or, to different portions of the hash field value to calculate the hash address.

Digit Extraction

  • Another technique picks some digits of the hash field value (for instance, the third, fifth, and eighth digits) to form the hash address.
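A sketch of both techniques for nonnegative integer keys. The part sizes (3 decimal digits, 16 bits) and the digit positions are illustrative choices.

```python
def fold_hash(K, M, part_digits=3):
    """Folding by addition: split K's decimal digits into parts, sum, then mod M."""
    digits = str(K)
    parts = [int(digits[i:i + part_digits]) for i in range(0, len(digits), part_digits)]
    return sum(parts) % M

def xor_fold_hash(K, M, part_bits=16):
    """Folding with a logical function: XOR successive bit slices of K, then mod M."""
    acc, mask = 0, (1 << part_bits) - 1
    while K:
        acc ^= K & mask
        K >>= part_bits
    return acc % M

def digit_pick_hash(K, positions=(3, 5, 8)):
    """Digit picking: join the digits at the given 1-based positions."""
    digits = str(K)
    return int("".join(digits[p - 1] for p in positions))
```

For K = 123456789, folding by addition gives 123 + 456 + 789 = 1368, which is 368 for M = 1000, and picking the third, fifth, and eighth digits gives 358.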

Challenges with Hashing Functions

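  • Most hashing functions do not guarantee that distinct values will hash to distinct addresses, because the hash field space (the number of possible values a hash field can take) is usually much larger than the address space.
  • A collision occurs when the hash field value of a record being inserted hashes to an address that already contains a different record; the new record must then be placed in some other position.
  • The process of finding another position is called collision resolution. Methods include:
      • Open addressing: starting from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.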
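A minimal open-addressing sketch (linear probing with wraparound) on a Python list. It assumes no deletions, since an emptied slot would wrongly end a later search's probe sequence.

```python
def insert_open_addressing(table, key, h):
    """Place key at h(key); on collision, check subsequent positions (wrapping)."""
    M = len(table)
    start = h(key)
    for step in range(M):
        pos = (start + step) % M
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError("hash table is full")

def search_open_addressing(table, key, h):
    """Follow the same probe sequence; an empty slot means the key is absent."""
    M = len(table)
    start = h(key)
    for step in range(M):
        pos = (start + step) % M
        if table[pos] is None:
            return None
        if table[pos] == key:
            return pos
    return None
```

With M = 7 and h(K) = K mod 7, the keys 15, 22, and 29 all hash to slot 1 and end up in slots 1, 2, and 3.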

Resolving Collisions with Chaining

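  • Chaining keeps various overflow locations, usually by extending the array with a number of overflow positions, and adds a pointer field to each record location. A collision is resolved by placing the new record in an unused overflow location and linking it from the occupied hash address location through the pointer, so each hash address has a linked list of overflow records.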
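A sketch of chaining with an overflow area appended after the M primary slots. The `ChainedHashTable` class is hypothetical, and this version links each new overflow record at the end of its chain (other variants insert at the head).

```python
class ChainedHashTable:
    """M primary slots plus an overflow area; each location has a next pointer."""

    def __init__(self, M):
        self.M = M
        self.keys = [None] * M   # primary slots; overflow slots are appended later
        self.next = [None] * M   # pointer field: index of the next record in the chain

    def insert(self, key):
        pos = key % self.M
        if self.keys[pos] is None:
            self.keys[pos] = key
            return pos
        # Collision: place the record in a new overflow location...
        self.keys.append(key)
        self.next.append(None)
        new = len(self.keys) - 1
        # ...and link it at the end of the chain starting at the hash address.
        while self.next[pos] is not None:
            pos = self.next[pos]
        self.next[pos] = new
        return new

    def search(self, key):
        pos = key % self.M
        while pos is not None and self.keys[pos] is not None:
            if self.keys[pos] == key:
                return pos
            pos = self.next[pos]
        return None
```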

Resolving Collisions with Multiple Hashing

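  • Multiple hashing applies a second hash function if the first results in a collision; if another collision results, the program uses open addressing or applies a third hash function (followed by open addressing if necessary).
  • A good hashing function distributes records uniformly over the address space, minimizing collisions while not leaving many unused locations.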

External Hashing Techniques

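  • Hashing for disk files is called external hashing.
  • The target address space is made of buckets, each of which holds multiple records; a bucket is either one disk block or a cluster of contiguous disk blocks.
  • The hash function maps a key to a relative bucket number rather than to an absolute block address.
  • A table maintained in the file header converts each bucket number into the corresponding disk block address.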

Techniques to Handle the Collision Problem

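  • The collision problem is less severe with buckets, because as many records as fit in a bucket can hash to the same bucket without causing problems; a problem arises only when a new record hashes to a bucket that is already full.
  • A variation of chaining can then be used, in which each bucket keeps a pointer to a linked list of overflow records for that bucket.
  • The pointers in the linked list are record pointers, which include both a block address and a relative record position within the block.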

Limits of Static Hashing

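  • Hashing provides the fastest possible access for retrieving an arbitrary record given the value of its hash field.
  • The scheme described so far is called static hashing, because a fixed number of buckets M is allocated; this is a serious drawback for dynamic files.
  • If m is the maximum number of records per bucket, at most m × M records fit in the allocated space. If the number of records is substantially fewer than m × M, much of the space is left unused.
  • If the number of records grows substantially beyond m × M, numerous collisions result, and retrieval slows down because of long lists of overflow records.
  • In either case, the number of blocks M may have to be changed, and a new hash function (based on the new value of M) used to redistribute the records.
  • Such reorganizations can be quite time-consuming for large files.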

Using External Hashing Effectively

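  • Searching for a record by the value of a field other than the hash field is as expensive as for an unordered file, requiring a linear search.
  • A record is deleted by removing it from its bucket; if the bucket has an overflow chain, one of the overflow records can be moved into the bucket to replace the deleted record.
  • If the record to be deleted is already in overflow, it is simply removed from the linked list.
  • To modify a field value, the record is located first: with an equality condition on the hash field, the hash function locates it efficiently; otherwise, a linear search is needed. A nonhash field can be modified by rewriting the record in the same bucket.
  • Modifying the hash field can move the record to another bucket, which requires deleting the old record followed by inserting the modified record.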
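A toy static external hash file tying the bucket, overflow, and deletion rules above together. Everything here is a hypothetical in-memory model: integer keys, bucket number = key mod M, a dict standing in for the header table that maps bucket numbers to block addresses, and Python lists standing in for the linked lists of overflow record pointers.

```python
class ExternalHashFile:
    """Toy static external hash file: M buckets of capacity m, plus overflow chains."""

    def __init__(self, M, m):
        self.M, self.m = M, m
        # Header table: relative bucket number -> (fake) disk block address.
        self.bucket_to_block = {b: f"block-{b}" for b in range(M)}
        self.buckets = [[] for _ in range(M)]
        self.overflow = [[] for _ in range(M)]  # stands in for each bucket's overflow list

    def bucket_of(self, key):
        return key % self.M  # relative bucket number, not an absolute address

    def address(self, key):
        return self.bucket_to_block[self.bucket_of(key)]

    def insert(self, key, record):
        b = self.bucket_of(key)
        if len(self.buckets[b]) < self.m:
            self.buckets[b].append((key, record))
        else:
            self.overflow[b].append((key, record))  # bucket full: chain to overflow

    def search(self, key):
        b = self.bucket_of(key)
        for k, rec in self.buckets[b] + self.overflow[b]:
            if k == key:
                return rec
        return None

    def delete(self, key):
        b = self.bucket_of(key)
        for i, (k, _) in enumerate(self.buckets[b]):
            if k == key:
                del self.buckets[b][i]
                if self.overflow[b]:  # refill the bucket from its overflow chain
                    self.buckets[b].append(self.overflow[b].pop(0))
                return True
        for i, (k, _) in enumerate(self.overflow[b]):
            if k == key:
                del self.overflow[b][i]  # already in overflow: just unlink it
                return True
        return False
```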

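Hash File Expansion Techniques

  • Because the hash address space of static hashing is fixed, it is difficult to expand or shrink the file dynamically.
  • Dynamic file expansion schemes remedy this; some, such as extendible hashing, store an access structure in addition to the file.
  • These schemes rely on the result of the hash function being a nonnegative integer that can be represented as a binary number (the hash value of a record).
  • Other schemes include linear hashing, which does not require an additional access structure, and dynamic hashing, which uses an access structure based on binary trees.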

Extendible Hashing

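  • Extendible hashing is a dynamic hashing scheme that lets the file grow and shrink efficiently without long overflow chains. It maintains a directory, an access structure somewhat similar to an index, holding 2^d bucket addresses, where d is the global depth of the directory; d bits of a record's hash value select its directory entry. When a bucket overflows, it is split, and the directory is doubled if needed.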
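A toy extendible hashing sketch, under simplifying assumptions: integer keys hash to themselves, the directory is indexed by the low-order d bits of the hash value (a common variant; descriptions differ on which bits are used), keys are distinct, and buckets are never merged on deletion.

```python
class ExtendibleHash:
    """Toy extendible hashing with a directory of 2**d bucket references."""

    def __init__(self, capacity):
        self.capacity = capacity      # max records per bucket
        self.global_depth = 0         # d
        self.directory = [{"depth": 0, "keys": []}]  # 2**0 = 1 entry

    def _index(self, key):
        # Low-order d bits of the hash value (here, the key itself).
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)]["keys"]

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket["keys"]) < self.capacity:
            bucket["keys"].append(key)
            return
        # Bucket full: double the directory if the bucket's local depth equals d.
        if bucket["depth"] == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the bucket on the next hash bit.
        bit = 1 << bucket["depth"]
        new_depth = bucket["depth"] + 1
        zero = {"depth": new_depth, "keys": [k for k in bucket["keys"] if not k & bit]}
        one = {"depth": new_depth, "keys": [k for k in bucket["keys"] if k & bit]}
        # Repoint every directory entry that referenced the old bucket.
        for i, b in enumerate(self.directory):
            if b is bucket:
                self.directory[i] = one if i & bit else zero
        self.insert(key)  # retry; may split again if all keys went one way
```

Inserting keys 0 through 7 with bucket capacity 2 grows the directory to 4 entries (d = 2), and each bucket ends up holding the two keys that share its low-order 2 bits.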

Indexing Structures for Files

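  • An index is an auxiliary access structure used to speed up the retrieval of records in response to certain search conditions.
  • Index structures are additional files on disk that provide alternative ways to access records (secondary access paths) without affecting the physical placement of records in the primary data file.
  • They enable efficient access to records based on the indexing fields used to construct the index.
  • Any field of the file can be used to create an index, and multiple indexes on different fields can be constructed on the same file.
  • Index structures can also be organized into multiple levels.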

Ordered Indexes

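  • An ordered index is similar to the index at the back of a book, which lists important terms in alphabetical order along with the page numbers where they appear.
  • For a file with a given record structure, an index is usually defined on a single field of the file, called the indexing field (or indexing attribute).
  • The index typically stores each value of the indexing field along with a list of pointers to the disk blocks that contain records with that field value.
  • The values in the index are ordered, so a binary search can be performed on the index.
  • There are several types of ordered indexes: primary indexes (on the ordering key field), clustering indexes (on an ordering nonkey field), and secondary indexes (on any nonordering field).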

Primary Indexes Explained

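  • A primary index is built on a data file that is physically ordered on its ordering key field.
  • The index is itself an ordered file whose fixed-length entries have two fields, which allows efficient searching: the first field has the same data type as the ordering key field of the data file (the primary key), and the second is a pointer to a disk block (a block address).
  • There is one index entry in the index file for each block in the data file.
  • Each index entry holds, as its two field values, the primary key value of the first record in a block and a pointer to that block.
  • The total number of entries in the index is therefore the same as the number of disk blocks in the ordered data file.
  • The first record in each block of the data file is called the anchor record of the block, or simply the block anchor.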
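A minimal sketch of a primary index over an in-memory ordered file, where each block is a sorted list of primary key values. The record with key K can only be in the block whose anchor K(i) satisfies K(i) ≤ K < K(i+1), which `bisect_right` finds directly.

```python
import bisect

def build_primary_index(blocks):
    """One entry per data block: (block anchor key, block number)."""
    return [(block[0], i) for i, block in enumerate(blocks)]

def primary_index_search(index, blocks, key):
    """Return the block number holding key, or None if the key is absent."""
    anchors = [k for k, _ in index]
    # Last entry whose anchor is <= key; only that block can contain the record.
    i = bisect.bisect_right(anchors, key) - 1
    if i < 0:
        return None
    block_no = index[i][1]
    return block_no if key in blocks[block_no] else None
```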

Indexes as: Dense or Sparse

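  • Indexes can be characterized as dense or sparse.
  • A dense index has an index entry for every search key value (and hence every record) in the data file.
  • A sparse (nondense) index has index entries for only some of the search values. A primary index is sparse, since it has one entry per data block rather than one per search key value.
  • The index file for a primary index occupies much less space than the data file, for two reasons: there are fewer index entries than records in the data file, and each index entry is typically smaller than a data record because it has only two fields.
  • As a result, a binary search on the index file requires fewer block accesses than a binary search on the data file.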

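Primary Indexes: Search, Insertion, and Deletion

  • To locate a record, a binary search on the index file (about log2 bi block accesses for an index of bi blocks) finds the right index entry, and one additional block access retrieves the data block.
  • A major problem with a primary index, as with any ordered file, is the insertion and deletion of records.
  • The problem is compounded with a primary index: inserting a record in its correct position requires not only moving records to make space but also changing some index entries, since moving records changes the anchor records of some blocks.
  • Record deletion is handled using deletion markers.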

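Primary Index Calculations

  • Suppose an ordered file has r = 30,000 records stored on a disk with block size B = 1024 bytes. File records are of fixed size and unspanned, with record length R = 100 bytes.
  • The file is ordered on a key field of size V = 9 bytes, and a block pointer is P = 6 bytes long.

Blocking Factor and Data File Size

  • bfr = ⎣B/R⎦ = ⎣1024/100⎦ = 10 records per block.
  • b = ⎡r/bfr⎤ = ⎡30,000/10⎤ = 3000 blocks.
  • A binary search on the data file needs approximately ⎡log2 b⎤ = ⎡log2 3000⎤ = 12 block accesses.

Index File Size and Search Cost

  • Each index entry has size Ri = V + P = 9 + 6 = 15 bytes, so bfri = ⎣1024/15⎦ = 68 entries per block.
  • The number of index entries ri equals the number of data blocks, 3000, so the index needs bi = ⎡3000/68⎤ = 45 blocks.
  • A binary search on the index needs ⎡log2 45⎤ = 6 block accesses, and one more access retrieves the data block, for a total of 6 + 1 = 7 block accesses instead of 12.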
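The calculation above as a small function, assuming fixed-length unspanned records and a primary index with one entry per data block:

```python
import math

def primary_index_cost(r, B, R, V, P):
    """Block accesses: binary search on the data file vs. via a primary index."""
    bfr = B // R                                  # records per data block
    b = math.ceil(r / bfr)                        # data blocks
    data_search = math.ceil(math.log2(b))         # binary search on the data file
    Ri = V + P                                    # index entry size
    bfri = B // Ri                                # index entries per block
    bi = math.ceil(b / bfri)                      # index blocks (one entry per data block)
    index_search = math.ceil(math.log2(bi)) + 1   # +1 to read the data block
    return {"bfr": bfr, "b": b, "data_search": data_search,
            "bfri": bfri, "bi": bi, "index_search": index_search}
```

Calling `primary_index_cost(30000, 1024, 100, 9, 6)` reproduces the figures above: bfr = 10, b = 3000, 12 accesses without the index, bfri = 68, bi = 45, and 7 accesses with it.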

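Clustering Indexes

  • If file records are physically ordered on a nonkey field, which does not have a distinct value for each record, that field is called the clustering field and the data file is called a clustered file.
  • A clustering index can be created on such a file to speed up retrieval of all the records that have the same value for the clustering field.
  • The clustering index is also an ordered file with two fields: the first has the same type as the clustering field of the data file, and the second is a disk block pointer.
  • There is one entry in the clustering index for each distinct value of the clustering field, containing that value and a pointer to the first block in the data file that has a record with that value.
  • This differs from a primary index, which requires the ordering field of the data file to have a distinct value for each record; the clustering field is not a key and can have duplicate values.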

Index and Cluster Storage

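  • Record insertion and deletion still cause problems, because the data records are physically ordered.
  • To alleviate the insertion problem, it is common to reserve a whole block (or a cluster of contiguous blocks) for each value of the clustering field; all records with that value are placed in that block or block cluster.
  • A clustering index is another example of a sparse index, since it has one entry per distinct value of the indexing field rather than one per record.
  • This arrangement resembles hashing with buckets, except that an index search uses the values of the search key itself, whereas hashing locates the block by applying a hash function to the search key.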
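A sketch of a clustering index over a toy clustered file. Records are hypothetical (Dept_number, Name) pairs physically ordered on Dept_number, a nonkey field; the lookup follows the index entry to the first block for a value and reads forward while the value keeps matching.

```python
import bisect

def build_clustering_index(blocks):
    """One entry per distinct clustering value: (value, first block holding it)."""
    index = []
    for i, block in enumerate(blocks):
        for value, _ in block:
            if not index or index[-1][0] != value:
                index.append((value, i))
    return index

def records_with(index, blocks, value):
    """Return the names of all records whose clustering field equals value."""
    keys = [v for v, _ in index]
    j = bisect.bisect_left(keys, value)
    if j == len(keys) or keys[j] != value:
        return []
    out = []
    # Start at the first block for this value; stop at a block with no match.
    for block in blocks[index[j][1]:]:
        matches = [name for v, name in block if v == value]
        if not matches:
            break
        out.extend(matches)
    return out
```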

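Secondary Indexes

  • A secondary index provides a secondary means of accessing a data file for which some primary access already exists.
  • The data file records could be ordered, unordered, or hashed.
  • The secondary index is an ordered file with two fields: the first has the same data type as the nonordering indexing field, and the second is either a block pointer or a record pointer.
  • Many secondary indexes (and hence indexing fields) can be created for the same file. A secondary index on a key field is dense, with one entry for each record in the data file.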

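Secondary Indexes on Nonkey Fields

  • A secondary index can also be created on a nonkey field, where many records in the data file can have the same value for the indexing field. Options for implementing such an index include:
    • Option 1: include duplicate index entries with the same field value, one for each record.
    • Option 2: use variable-length index entries with a repeating field holding one pointer per record with that value.
    • Option 3 (commonly used): keep one entry for each distinct field value and add a level of indirection, so the entry points to a disk block containing a set of record pointers, one per record with that value.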
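A sketch of Option 3, one index entry per distinct value pointing to a list that stands in for the block of record pointers. Here a record pointer is assumed to be a (block number, slot) pair, and the data file may be in any order.

```python
import bisect
from collections import defaultdict

def build_secondary_index(records, field):
    """Option 3: one entry per distinct value -> list of record pointers.

    records: list of (block_no, slot, record_dict); a pointer is (block_no, slot).
    Index entries are kept sorted by value so they can be binary searched.
    """
    pointer_blocks = defaultdict(list)
    for block_no, slot, rec in records:
        pointer_blocks[rec[field]].append((block_no, slot))
    return sorted(pointer_blocks.items())

def lookup(index, value):
    """Binary search the index, then follow the indirection to the pointer list."""
    values = [v for v, _ in index]
    i = bisect.bisect_left(values, value)
    return index[i][1] if i < len(values) and values[i] == value else []
```

The extra level of indirection costs an additional block access per lookup, but it keeps index entries fixed-length and makes inserting a new record simple: its pointer is just appended to the value's pointer block.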

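Building Index Levels

  • The first level of an index is an ordered file, so another index can be built on it; this second level has one entry per block of the first level.
  • Levels are added in the same way until the top level fits in a single block.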

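Multilevel Indexes

  • A binary search on an index of bi blocks needs about log2(bi) block accesses, because each step halves the part of the index that remains to be searched.
  • A multilevel index reduces the part of the index still to be searched by a factor of bfri at each level. This factor, called the fan-out (fo), is larger than 2, so a search needs only about ⎡log_fo(bi)⎤ block accesses.
  • A search reads one block at each of the t levels and then one data block, for t + 1 block accesses in total.
  • Dynamic multilevel indexes such as B-trees and B+-trees leave space in each tree node (disk block) for new entries, which avoids many of the insertion and deletion problems of static multilevel indexes.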
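A small function that builds up the levels of a static multilevel index and reports the number of blocks at each level, assuming every level above the first has one entry per block of the level below and all levels share the fan-out fo.

```python
import math

def multilevel_index_levels(r1, fo):
    """Blocks at each level of a static multilevel index, bottom level first.

    r1: number of entries in the first level; fo: fan-out (entries per block).
    """
    if r1 < 1 or fo < 2:
        raise ValueError("need at least one entry and a fan-out of at least 2")
    levels = []
    entries = r1
    while True:
        blocks = math.ceil(entries / fo)
        levels.append(blocks)
        if blocks == 1:          # top level fits in a single block
            return levels
        entries = blocks         # next level: one entry per block below
```

For the primary index in the calculation above (3000 first-level entries, fo = 68), this gives levels of 45 and 1 blocks, so t = 2 and a search costs t + 1 = 3 block accesses instead of 7. A first level of 30,000 entries would need three levels (442, 7, and 1 blocks).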
