Questions and Answers
Which storage type provides the fastest data access for the CPU?
- Primary storage (correct)
- Secondary storage
- Tertiary storage
- External storage
Which of the following is characteristic of secondary storage?
- Slower data access compared to primary storage (correct)
- Directly accessed by the CPU
- More expensive per unit of storage compared to tertiary storage
- Smaller storage capacity compared to primary storage
What is the primary role of 'cache memory' in the memory hierarchy?
- Serving as the least expensive end of the memory hierarchy
- Providing the main work area for the CPU
- Storing large permanent databases
- Speeding up execution of program instructions (correct)
Which memory type is known as 'main memory'?
What does a 'record type' or 'record format definition' consist of?
What is the key characteristic of 'fixed-length records'?
When might a file have variable-length records?
In fixed-length records, if a field is optional, what is a common technique to handle the absence of a value?
What is the purpose of 'separator characters' in variable-length fields?
If a file includes records of different types, what is typically included at the beginning of each record?
What is 'blocking factor'?
In the context of database storage, what is a 'spanned record'?
When is it advantageous to use spanned records?
In 'contiguous allocation', what is a disadvantage?
What is the role of a 'file header'?
What is a primary drawback of using files of unordered records (heap files)?
What is a common method for record deletion in unordered files (heap files) that avoids immediately rewriting the block?
In which file organization is the 'ordering field' also a 'key field'?
What is the primary disadvantage of using ordered files (sorted files)?
What is a typical approach to handle insertions in ordered files to improve efficiency?
In hashing, what is the component that generates a disk block address?
What condition must be met in a hash file?
In hashing, what is 'folding'?
What happens during a 'collision' in hashing?
In 'open addressing', how is a collision resolved?
How does the 'chaining' method resolve collisions in hashing?
What is the main goal of a good hashing function?
What is the key difference between internal and external hashing?
In external hashing, what does a bucket typically consist of?
In external hashing, what is 'static hashing'?
Which of the following describes 'extendible hashing'?
What do index structures provide, in addition to the primary data file?
What is a primary index?
In a file using a primary index, what is referred to as the 'block anchor'?
What characterizes a 'dense index'?
What is the difference between a primary index and a clustering index?
What is a key advantage of using a secondary index?
On what type of field can a secondary index be created?
What is the purpose of multilevel indexing?
What are the advantages of B-Trees?
In a B+ tree, where are data pointers stored?
Imagine a scenario where a database system uses spanned records with a block size $B$ of 4096 bytes. If the file contains a mix of variable-length and fixed-length records, a particular variable-length record with internal fragmentation consumes significant portions of a block without separators, and the number of records is $r = 10,000$ with an average file size of 2000 bytes, what would be the best way to approach optimization?
In cases where memory read and write operations are significantly slower than CPU operations, which strategies would be the most effective in minimizing the impact of slow memory access in a database system?
Suppose in a very specific database file format, each record starts with a 2-byte record type indicator, and the rest of the record's structure varies greatly based on this indicator. Given this setup, devise a precise algorithm to perform a binary search on this file by the timestamp field, considering the timestamp field may be located at vastly different offsets into each record. If not, where did the file read go wrong? The timestamp is guaranteed to exist but its location is unknown, and is not guaranteed, either. Return 'record not found'.
Flashcards
Primary storage
Storage media that can be operated on directly by the computer's CPU. Provides fast data access.
Secondary and tertiary storage
Storage including magnetic disks, optical disks, and tapes; data must be copied to primary storage to be processed by the CPU.
Record
Collection of related data values or items, where each value corresponds to a particular field of the record.
Attributes
Record type
Data type
Fixed-length records
Variable-length records
File
Variable-length fields
Repeating field
Optional fields
Block
Blocking factor
Spanned records
Unspanned records
Contiguous allocation
Linked allocation
Indexed allocation
File header
Heap file
Deletion marker
Ordered file
Ordering key
Overflow file
Hashing
Hash field
Hash function
Folding
Digit picking
Collision
Open addressing
Chaining
Bucket
Indexing
Primary index
Clustering index
Secondary index
Dense Index
Sparse Index
Study Notes
Database Storage Organization and Storage
- Databases are physically stored as files of records, typically on magnetic disks.
- This module covers organizing databases in storage and accessing them efficiently with algorithms, including those that use indexes.
- The data in a database must be physically stored on some computer storage medium; these media form a storage hierarchy.
Storage Hierarchy Categories
- Primary storage includes media directly operated on by the CPU, like main memory and cache memory.
- Primary storage offers fast data access but has limited storage capacity.
- Secondary and tertiary storage include magnetic disks, optical disks (CD-ROMs, DVDs), and tapes.
- Hard-disk drives are classified as secondary storage.
- Removable media like optical disks and tapes are considered tertiary storage.
- Secondary and tertiary storage devices have larger capacities and lower costs, but they offer slower data access compared to primary storage.
- Data in secondary or tertiary storage must be copied into primary storage before the CPU can process it.
Memory Hierarchy Levels
- At the primary storage level, the most expensive end includes cache memory, which is static RAM.
- Cache memory is used by the CPU to speed up program instruction execution through prefetching and pipelining.
- The next level of primary storage is DRAM (Dynamic RAM), known as main memory, providing the CPU's work area for program instructions and data.
- At the secondary and tertiary storage level, the hierarchy includes magnetic disks, mass storage in the form of CD-ROM devices, and finally tapes at the least expensive end.
- Large permanent databases generally reside on secondary storage (magnetic disks).
- Portions of these databases are read into and written from buffers in main memory as needed.
Records and Record Types
- Data is stored in records, where each record comprises related data values or items.
- Each value is formed of one or more bytes and corresponds to a particular field of the record.
- Records describe entities and their attributes.
- For example, an EMPLOYEE record signifies an employee entity, with each field value specifying an employee attribute like Name, Birth_date, Salary, or Supervisor.
- A collection of field names and their corresponding data types constitutes a record type or record format definition.
- A data type, associated with each field, specifies the types of values a field can hold.
- A field's data type is usually one of the standard data types used in programming.
- The number of bytes required for each data type is fixed for a given computer system.
Common Data Types and Sizes
- Integer: 4 bytes
- Long integer: 8 bytes
- Real number/floating point: 4 bytes
- Boolean: 1 byte
- Date (assuming YYYY-MM-DD): 10 bytes
- Fixed-length string: k bytes for k characters
- Variable-length strings: May require as many bytes as there are characters in each field value
Fixed-Length vs. Variable-Length Records
- A file comprises a sequence of records, often of the same record type.
- If every record has the same size (in bytes), the file consists of fixed-length records.
- When records in a file have different sizes, that file is considered to consist of variable-length records.
Reasons for Variable-Length Records
- Records are of the same type, but one or more fields vary in size (variable-length fields). For instance, the Name field of an EMPLOYEE record.
- Records are of the same type, but one or more fields may have multiple values (repeating fields), often called a repeating group.
- Records share the same type, but fields are optional, meaning some may not have values (optional fields).
- The file contains mixed record types, leading to varying record sizes, common when related records are clustered on disk blocks (e.g., GRADE_REPORT records following a STUDENT's record).
Considerations for Fixed-Length Records
- As an example, a fixed-length EMPLOYEE record may have a record size of 71 bytes.
- Each record has the same fields and fixed field lengths, allowing the system to identify the starting byte position of each field relative to the record's starting position.
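The fixed field positions can be illustrated with Python's `struct` module. This is only a sketch: the notes give the 71-byte total but not the field widths, so the layout below (Name 30, Ssn 9, Salary 4, Job_code 4, Department 20, Hire_date 4) is an assumption chosen to add up to 71 bytes.

```python
import struct

# Hypothetical layout (not given in the notes): Name(30), Ssn(9), Salary(4),
# Job_code(4), Department(20), Hire_date(4). "<" turns off alignment padding.
EMPLOYEE = struct.Struct("<30s9sii20si")

# Starting byte of each field relative to the start of the record.
OFFSETS = {"Name": 0, "Ssn": 30, "Salary": 39, "Job_code": 43,
           "Department": 47, "Hire_date": 67}


def pack_employee(name, ssn, salary, job_code, department, hire_date):
    """Build one fixed-length record; short strings are padded with zero bytes."""
    return EMPLOYEE.pack(name.encode("ascii"), ssn.encode("ascii"), salary,
                         job_code, department.encode("ascii"), hire_date)


def read_salary(record):
    """Read Salary directly from its fixed offset, skipping the other fields."""
    return struct.unpack_from("<i", record, OFFSETS["Salary"])[0]


record = pack_employee("Smith, John", "123456789", 55000, 7, "Research", 20200115)
print(EMPLOYEE.size, read_salary(record))
```

Because every record has the same layout, a single field can be read by offset alone, which is the property the bullet above describes.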
Handling Optional Fields
- For optional fields, all records include every field, but a special NULL value is stored when no value exists.
- For repeating fields, each record must allocate as many spaces as the maximum possible number of occurrences of the field, which can waste space.
Handling Variable-Length Fields
- For variable-length fields, special separator characters (e.g., ?, %, $) that do not appear in any field value can be used to terminate each field, so the bytes belonging to each field can be determined.
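A minimal sketch of separator-terminated fields, assuming `%` is the separator and that it never occurs inside a field value. The function names are illustrative only.

```python
FIELD_SEP = "%"  # assumed never to appear inside a field value


def encode_record(fields):
    """Terminate every field with the separator character."""
    for value in fields:
        if FIELD_SEP in value:
            raise ValueError("field value contains the separator character")
    return "".join(value + FIELD_SEP for value in fields)


def decode_record(text):
    """Split on the separator; the piece after the final terminator is dropped."""
    return text.split(FIELD_SEP)[:-1]


encoded = encode_record(["Smith, John", "123456789", "Research"])
print(encoded)
print(decode_record(encoded))
```

Each field can now have any length, at the cost of scanning the record to find where a given field starts.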
Formatting Data in Files with Optional Fields
- Optional fields can be formatted differently in files.
- If a record type has many fields but only a few of them appear in a typical record, each record can include a sequence of <field-name, field-value> pairs instead of plain field values.
- A more practical option is to assign a short field type code (e.g., an integer) to each field and store a sequence of <field-type, field-value> pairs instead of <field-name, field-value> pairs.
Separating Repeating Fields
- A repeating field needs one separator character to separate the repeating values of the field and another to indicate termination of the field.
Indicating Different Record Types
- For files with different record types, each record is preceded by a record type indicator.
Record Allocation to Disk Blocks
- File records must be allocated to disk blocks, as a block is the unit of data transfer.
- When the block size is larger than the record size, each block contains numerous records, although some files may have unusually large records that cannot fit in one block.
- For fixed-length records of size R bytes and a block size of B bytes, with B ≥ R, bfr = ⌊B/R⌋ records fit per block, where ⌊x⌋ (the floor function) rounds x down to an integer.
- The value of "bfr" is called the blocking factor for the file.
- If R does not evenly divide B, the unused space in each block is B - (bfr * R) bytes.
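The blocking factor and the unused space per block follow directly from the two formulas above. The block size of 512 bytes and record size of 71 bytes below are illustrative values, not figures from the notes.

```python
def blocking_factor(block_size, record_size):
    """bfr = floor(B / R), the number of whole records that fit in one block."""
    if record_size > block_size:
        raise ValueError("record does not fit in a block (B >= R is required)")
    return block_size // record_size


def unused_bytes_per_block(block_size, record_size):
    """Space left over in each block: B - (bfr * R)."""
    return block_size - blocking_factor(block_size, record_size) * record_size


B, R = 512, 71  # illustrative sizes
print(blocking_factor(B, R), unused_bytes_per_block(B, R))
```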
Spanned vs. Unspanned Records
- To utilize unused space, one block can store part of a record with the rest on another block.
- A pointer at the end of the first block points to the block containing the remainder of the record, in case it is not the next consecutive block on disk.
- "Spanned" organization permits records to span across blocks, required when records exceed a block's size.
- "Unspanned" organization prohibits records from crossing block boundaries, used with fixed-length records where block size exceeds record size (B > R).
- Unspanned storage simplifies record processing by ensuring each record starts at a known location within the block.
Variable Length Records Organization
- For variable-length records, either spanned or unspanned organizations are viable.
- If the average record is large, spanning is advantageous because it reduces the lost space in each block.
- With spanned organization, each block may store a different number of variable-length records.
- For variable-length records, the blocking factor (bfr) represents the average number of records per block.
- The number of blocks b needed for a file of r records is b = ⌈r/bfr⌉, where ⌈x⌉ (the ceiling function) rounds x up to the next integer.
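The block count can be computed with integer arithmetic. This is a small sketch; the record counts in the usage lines are made-up examples.

```python
def blocks_needed(num_records, bfr):
    """b = ceil(r / bfr), computed with integers to avoid float rounding."""
    if bfr <= 0:
        raise ValueError("blocking factor must be positive")
    return -(-num_records // bfr)


print(blocks_needed(30_000, 10))  # records divide evenly into blocks
print(blocks_needed(1_001, 10))   # one extra, partly filled block
```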
Allocating File Blocks on Disk
- Multiple standard techniques are available for allocating file blocks on disk.
- In contiguous allocation, file blocks are allocated to consecutive disk blocks. This makes reading the whole file very fast using double buffering, but it makes expanding the file difficult.
- In linked allocation, each file block contains a pointer to the next file block. This makes it easy to expand the file but slow to read the whole file.
- A combination of the two allocates clusters of consecutive disk blocks and links the clusters together. These clusters are also called file segments or extents.
- In indexed allocation, one or more index blocks contain pointers to the actual file blocks. Combinations of these techniques are also common.
Contents of File Headers
- A file header or file descriptor includes file information needed by programs accessing records.
- The header includes file block locations, format descriptions, field lengths, field order (for fixed-length unspanned files), field type codes, & separators (for variable-length files).
- To search for a record on disk, one or more blocks are copied into main memory buffers; programs then search for the desired record within the buffers, using the information in the file header.
- If the address of the block containing the desired record is unknown, the program must do a linear search through the file blocks to find the record.
- A good file organization locates the block containing a desired record with a minimal number of block transfers.
Heap Files Explained
- In a heap (or pile) file, records are placed in the order in which they are inserted, so new records are added at the end of the file.
- Inserting a new record is very efficient: the last disk block of the file is copied into a buffer, the new record is added, and the block is rewritten to disk. The address of the last file block is kept in the file header.
- Searching for a record using any search condition requires a linear search through the file, block by block, which is expensive.
- If only one record satisfies the search condition, a program on average reads into memory and searches half the file blocks before finding it: about b/2 blocks for a file of b blocks.
- If no record or several records satisfy the search condition, the program must read and search all b blocks.
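The cost of a heap-file search can be sketched by counting block reads. Here a file is modelled as a list of blocks and each block as a list of records; that in-memory model is an assumption made purely for illustration.

```python
def linear_search(blocks, predicate):
    """Scan block by block; return (record, blocks_read), record is None if absent."""
    for blocks_read, block in enumerate(blocks, start=1):
        for record in block:
            if predicate(record):
                return record, blocks_read
    return None, len(blocks)


heap_file = [[1, 2], [3, 4], [5, 6], [7, 8]]  # b = 4 blocks of 2 records

print(linear_search(heap_file, lambda r: r == 5))   # found in the third block
print(linear_search(heap_file, lambda r: r == 99))  # absent: all b blocks read

# Average number of block reads over every record in the file.
reads = [linear_search(heap_file, lambda r, k=k: r == k)[1] for k in range(1, 9)]
print(sum(reads) / len(reads))
```

For this tiny file the average comes out at (b + 1)/2, which approaches b/2 as the file grows.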
Deleting Records in Heap Files
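- To delete a record, a program must first find its block, copy the block into a buffer, delete the record from the buffer, and finally rewrite the block to disk.
- This leaves unused space in the disk block; deleting a large number of records this way wastes storage space.
- Another technique is to store an extra byte or bit, called a deletion marker, with each record. A record is deleted by setting the marker to a certain value; a different value marks a valid (not deleted) record.
- Search programs consider only valid records in a block. Both techniques require periodic reorganization of the file, during which the blocks are accessed consecutively and the records are packed by removing deleted ones.
- Another possibility is to reuse the space of deleted records when inserting new records, although this requires extra bookkeeping to keep track of empty locations.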
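A minimal sketch of deletion markers and periodic reorganization, again modelling a file as a list of blocks. The `Slot` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    value: str
    deleted: bool = False  # the deletion marker stored with each record


def delete(blocks, value):
    """Mark the first valid matching record as deleted instead of removing it."""
    for block in blocks:
        for slot in block:
            if not slot.deleted and slot.value == value:
                slot.deleted = True
                return True
    return False


def search(blocks, value):
    """Only valid (unmarked) records are considered during a search."""
    return any(not s.deleted and s.value == value for b in blocks for s in b)


def reorganize(blocks, bfr):
    """Read blocks consecutively and pack the remaining valid records."""
    live = [s for block in blocks for s in block if not s.deleted]
    return [live[i:i + bfr] for i in range(0, len(live), bfr)]


heap_file = [[Slot("a"), Slot("b")], [Slot("c"), Slot("d")]]
delete(heap_file, "b")
packed = reorganize(heap_file, bfr=2)
print([[s.value for s in block] for block in packed])
```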
Spanned vs. Unspanned in Heap Files
- Either spanned or unspanned organization can be used for an unordered file, with either fixed-length or variable-length records.
- Modifying a variable-length record may require deleting the old record and inserting the modified record, because the modified record may not fit in its old space on disk.
- To read all records in order of the values of some field, a sorted copy of the file is created. Sorting is an expensive operation for a large disk file, and special external sorting techniques are used.
Ordered (Sorted) Files
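- The records of a file can be physically ordered on disk based on the values of one of their fields, called the ordering field. This produces an ordered or sequential file.
- If the ordering field is also a key field (a field with a unique value in each record), it is called the ordering key for the file.
- Ordered files have advantages over unordered files: first, reading the records in order of the ordering field is very efficient because no sorting is required; second, finding the next record in that order usually requires no additional block access, because the next record is in the same block as the current one unless the current record is the last in its block.
- Ordered files are blocked and stored on contiguous cylinders to minimize seek time.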
Limitations of Ordered Files
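- Ordering provides no advantage for random or ordered access based on the values of other, nonordering fields of the file.
- Inserting and deleting records are expensive operations because the records must remain physically ordered.
- To insert a record, the program must find its correct position in the file based on the ordering field value and then make space in the file for it.
- For a large file this is very time-consuming because, on average, half the records of the file must be moved to make space for the new record.
- For deletion, the problem is less severe if deletion markers and periodic reorganization are used.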
Enhancing Insertion Efficiency in Ordered Files
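- One option is to keep some unused space in each block for new records; however, once this space is used up, the original problem resurfaces.
- A second option is to create a temporary unordered file, called an overflow or transaction file; the actual ordered file is then called the main or master file.
- New records are inserted at the end of the overflow file rather than in their correct position in the main file. Periodically, the overflow file is sorted and merged with the main file.
- If a search of the main file does not find the record, the overflow file must then be searched with a linear search.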
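The main-file and overflow-file scheme can be sketched with two in-memory lists standing in for the two files. The class and method names are hypothetical, and keys are plain integers for brevity.

```python
import bisect
import heapq


class OrderedFileWithOverflow:
    def __init__(self, keys):
        self.main = sorted(keys)  # ordered main (master) file
        self.overflow = []        # unordered overflow (transaction) file

    def insert(self, key):
        """New records go to the end of the overflow file."""
        self.overflow.append(key)

    def contains(self, key):
        """Binary search on the main file, then a linear search of overflow."""
        i = bisect.bisect_left(self.main, key)
        if i < len(self.main) and self.main[i] == key:
            return True
        return key in self.overflow

    def merge(self):
        """Periodic reorganization: sort the overflow and merge it into main."""
        self.main = list(heapq.merge(self.main, sorted(self.overflow)))
        self.overflow = []


f = OrderedFileWithOverflow([5, 1, 9])
f.insert(4)
print(f.contains(4), f.contains(7))
f.merge()
print(f.main, f.overflow)
```

Insertion becomes cheap, while searches pay an extra linear scan until the next merge.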
Considerations for Modifying Ordered Files
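- Modifying a field value of a record depends on two factors: the search condition used to locate the record and the field to be modified.
- If the search condition involves the ordering key field, the record can be located with a binary search; otherwise a linear search is required.
- A nonordering field can be modified by changing the record and rewriting it in the same physical location on disk, assuming fixed-length records.
- Modifying the ordering field means the record can change its position in the file, which requires deleting the old record and then inserting the modified record.
- Ordered files are rarely used in database applications unless an additional access path, called a primary index, is used; the primary index further improves random access on the ordering key field.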
Introduction to Hashing Techniques
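- Hashing is a type of primary file organization that provides very fast access to records under certain search conditions.
- The search condition must be an equality condition on a single field, called the hash field; if the hash field is also a key field of the file, it is called the hash key.
- A function h, called a hash function, is applied to the hash field value of a record and yields the address of the disk block in which the record is stored.
- The search for the record within the block can be carried out in a main memory buffer.
- For most records, only a single block access is needed to retrieve the record.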
Applying Hashing to Files
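- Hashing can also be applied to internal files, where it is typically implemented as a hash table using an array of records.
- The array index range is 0 to M - 1, giving M slots whose addresses correspond to the array indexes.
- A hash function transforms the hash field value into an integer between 0 and M - 1. A common choice is h(K) = K mod M, which returns the remainder of an integer hash field value K after division by M; this value is used as the record address.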
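A small sketch of the mod hash function applied to an in-memory array of slots. The value M = 7 and the sample keys are arbitrary choices for illustration.

```python
M = 7  # hypothetical number of slots, indexed 0 .. M - 1


def h(K, M=M):
    """h(K) = K mod M maps an integer hash field value to a slot address."""
    return K % M


table = [None] * M
for key in (10, 22, 33):  # sample keys chosen so that no two share a slot
    table[h(key)] = key

print(table)
print(h(17) == h(10))  # a different key can still map to an occupied slot
```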
Dealing with Non-Integer Hash Field Values
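- Noninteger hash field values can be transformed into integers before the mod function is applied.
- For character strings, the numeric (ASCII) codes associated with the characters can be used in the transformation, for example by multiplying those code values.
- A hashing algorithm can process the character codes of the key one at a time, as if the key were an array of codes, to compute the hash address.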
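One possible transformation for string keys multiplies the character codes together modulo M. This is a sketch under stated assumptions: the 20-character padding width and M = 997 (a prime larger than any ASCII code, so no single character forces the product to zero) are choices made here, not values from the notes.

```python
def hash_string(key, M, width=20):
    """Multiply the character codes of a padded key, reducing mod M each step."""
    padded = key.ljust(width)[:width]  # pad with spaces or truncate to width
    temp = 1
    for ch in padded:
        temp = (temp * ord(ch)) % M
    return temp % M


M = 997  # assumed prime table size
for name in ("Smith", "Wong", "Zelaya"):
    print(name, hash_string(name, M))
```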
Folding Hash Function
- Folding applies an arithmetic function such as addition, or a logical function such as exclusive or, to different portions of the hash field value to calculate the hash address.
Digit Extraction
- Another technique picks some digits of the hash field value to form the hash address.
Challenges with Hashing Functions
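- Most hashing functions do not guarantee that distinct values will hash to distinct addresses.
- A collision occurs when the hash field value of a record being inserted hashes to an address that already contains a different record (for example, two different keys that both produce the hash value 172). The new record must then be placed in some other position.
- The process of finding another position is called collision resolution.
- Open addressing: starting from the occupied position given by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.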
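Open addressing can be sketched with linear probing over a fixed array, assuming integer keys, h(K) = K mod M, and no deletions. The function names are illustrative.

```python
def insert_open_addressing(table, key):
    """Probe from the hash address onward until an empty position is found."""
    M = len(table)
    start = key % M
    for step in range(M):
        pos = (start + step) % M
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError("hash table is full")


def find_open_addressing(table, key):
    """Follow the same probe sequence; an empty slot means the key is absent."""
    M = len(table)
    start = key % M
    for step in range(M):
        pos = (start + step) % M
        if table[pos] is None:
            return None
        if table[pos] == key:
            return pos
    return None


table = [None] * 7
positions = [insert_open_addressing(table, k) for k in (10, 17, 24)]
print(positions, find_open_addressing(table, 17), find_open_addressing(table, 31))
```

All three sample keys hash to the same address, so the second and third are pushed to the following positions.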
Resolving Collisions with Chaining
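- Chaining keeps various overflow locations, usually by extending the array with a number of overflow positions, and adds a pointer field to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.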
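A minimal sketch of chaining with an overflow area appended to the array, assuming integer keys and h(K) = K mod M. The class name and the use of -1 as a null pointer are choices made for this example.

```python
class ChainedTable:
    def __init__(self, M, overflow_size):
        self.M = M
        self.keys = [None] * (M + overflow_size)  # M slots + overflow positions
        self.next = [-1] * (M + overflow_size)    # pointer field, -1 = null
        self.free = M                             # next unused overflow position

    def insert(self, key):
        pos = key % self.M
        if self.keys[pos] is None:
            self.keys[pos] = key
            return pos
        if self.free >= len(self.keys):
            raise OverflowError("no overflow positions left")
        new = self.free
        self.free += 1
        self.keys[new] = key
        while self.next[pos] != -1:  # walk to the end of the chain
            pos = self.next[pos]
        self.next[pos] = new         # link the new overflow record
        return new

    def find(self, key):
        pos = key % self.M
        while pos != -1:
            if self.keys[pos] == key:
                return pos
            pos = self.next[pos]
        return None


t = ChainedTable(M=7, overflow_size=3)
print([t.insert(k) for k in (10, 17, 24)], t.find(24), t.find(31))
```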
Resolving Collisions with Multiple Hashing
- Multiple hashing applies a second hash function if the first results in a collision.
- A good hashing function distributes the records uniformly over the address space, minimizing collisions while not leaving many unused locations.
External Hashing Techniques
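- Hashing for disk files is called external hashing.
- To suit the characteristics of disk storage, the target address space is made of buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous disk blocks.
- The hashing function maps a key into a relative bucket number rather than an absolute block address.
- A table in the file header converts the bucket number into the corresponding disk block address.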
Techniques to Handle the Collision Problem
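- The collision problem is less severe with buckets, because as many records as will fit in a bucket can hash to it without causing problems; problems arise only when a bucket is full and a new record hashes to it (overflow).
- A variation of chaining can be used in which each bucket keeps a pointer to a linked list of overflow records for that bucket.
- The pointers in the linked list are record pointers, which include both a block address and a relative record position within the block.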
Limits of Static Hashing
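- Hashing provides the fastest possible access for retrieving an arbitrary record given the value of its hash field.
- The scheme described above is called static hashing because a fixed number of buckets, M, is allocated; this is a serious drawback for dynamic files.
- If each bucket holds at most m records, at most (m * M) records fit in the allocated space. If the file holds substantially fewer than (m * M) records, a lot of space is left unused.
- If the number of records grows substantially beyond (m * M), numerous collisions result and retrieval slows down because of long lists of overflow records.
- In either case, the number of blocks M may have to be changed and a new hashing function (based on the new value of M) used to redistribute the records.
- Newer dynamic file organizations based on hashing allow the number of buckets to vary dynamically.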
Using External Hashing Effectively
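- With external hashing, searching for a record by the value of a field other than the hash field is as expensive as in an unordered file and requires a linear search.
- A record can be deleted by removing it from its bucket; if the bucket has an overflow chain, one of the overflow records can be moved into the bucket to replace the deleted record.
- If the record to be deleted is already in overflow, it is simply removed from the linked list.
- Modifying a record depends on the search condition used to locate it and on the field being modified: if the search condition is an equality comparison on the hash field, the record can be located efficiently with the hashing function; otherwise a linear search is needed.
- A nonhash field can be modified by rewriting the record in the same bucket, but modifying the hash field may move the record to another bucket, which requires deleting the old record and inserting the modified one.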
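Hash File Expansion Techniques
- A major drawback of static hashing is that the hash address space is fixed, so it is difficult to expand or shrink the file dynamically.
- Extendible hashing stores an access structure in addition to the file.
- Linear hashing does not require an additional access structure.
- Dynamic hashing uses an access structure that is searched to locate records.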
Extendible Hashing
- Extendible hashing lets the file grow and shrink dynamically as buckets overflow. It maintains a directory of bucket addresses, a type of index, in addition to the file.
Indexing Structures for Files
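- Indexes are auxiliary access structures used to speed up the retrieval of records in response to certain search conditions.
- Index structures are additional files on disk that provide alternative ways to access the records without affecting the physical placement of records in the primary data file.
- They enable efficient access to records based on the indexing fields used to construct the index.
- Any field of the file can be used to create an index, and multiple indexes on different fields can be constructed on the same file.
- Index structures can be single-level or organized into multiple levels.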
Ordered Indexes
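- An ordered index is similar to the index of a textbook, which lists important terms in alphabetical order along with the pages where each term appears.
- For a file with a given record structure, an index is usually defined on a single field of the file, called the indexing field.
- The index stores each value of the index field along with pointers to the disk blocks that contain records with that field value.
- The values in the index are ordered so that a binary search can be performed on the index.
- There are several types of ordered indexes: a primary index (on the ordering key field), a clustering index (on a nonkey ordering field), and a secondary index (on a nonordering field).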
Primary Indexes Explained
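- A primary index is an ordered file that serves as an access structure for a data file ordered on its key field.
- Its records are of fixed length with two fields: the first has the same data type as the ordering key field of the data file, and the second is a pointer to a disk block (a block address).
- There is one index entry (index record) in the index file for each block in the data file.
- Each index entry holds two field values: the primary key value of the first record in a block and a pointer to that block.
- The total number of entries in the index equals the number of disk blocks in the ordered data file.
- The first record in each block of the data file is called the anchor record of the block, or simply the block anchor.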
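A primary index with one entry per data block can be sketched as follows. The data file is modelled as a list of sorted blocks, and block numbers stand in for disk block addresses; both are simplifying assumptions.

```python
import bisect


def build_primary_index(blocks):
    """One <anchor key, block number> entry per data block."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks)]


def lookup(index, blocks, key):
    """Binary search the anchors, then search the single candidate block."""
    anchors = [anchor for anchor, _ in index]
    i = bisect.bisect_right(anchors, key) - 1  # last anchor <= key
    if i < 0:
        return None                            # key is below the first anchor
    block_no = index[i][1]
    return block_no if key in blocks[block_no] else None


data_file = [[2, 5, 8], [12, 15, 18], [21, 24, 29]]  # ordered on the key
index = build_primary_index(data_file)
print(index)
print(lookup(index, data_file, 15), lookup(index, data_file, 20))
```

The index holds three entries for nine records, which is why it is far smaller than the data file it describes.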
Indexes as: Dense or Sparse
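- Indexes can be characterized as dense or sparse.
- A dense index has an index entry for every search key value (and hence every record) in the data file.
- A sparse (nondense) index has entries for only some of the search values, so it has fewer entries than there are records in the file. A primary index is therefore sparse.
- The index file for a primary index occupies much less space than the data file, for two reasons.
- There are fewer index entries than data records, and each index entry is typically smaller than a data record because it has only two fields.
- Therefore, a binary search on the index file requires fewer block accesses than a binary search on the data file.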
Multilevel Indexes-Insertion/Deletion of Records
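- A binary search on an ordered data file of b blocks requires about log₂ b block accesses; with a primary index of bᵢ blocks, locating a record requires about log₂ bᵢ block accesses on the index plus one additional access to the data file.
- A major problem with a primary index, as with any ordered file, is the insertion and deletion of records.
- The problem is compounded because inserting a record in its correct position may require moving other records and also changing some index entries.
- Record deletion can be handled using deletion markers.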
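Primary Indexes Calculations
- Consider an ordered data file with r = 30,000 fixed-length, unspanned records of size R = 100 bytes, stored on a disk with block size B = 1024 bytes.
- The ordering key field is V = 9 bytes long and a block pointer is P = 6 bytes long.
Blocking Factor and Data File Search
- The blocking factor is bfr = ⌊B/R⌋ = ⌊1024/100⌋ = 10 records per block.
- The file needs b = ⌈r/bfr⌉ = ⌈30,000/10⌉ = 3000 blocks, so a binary search on the data file takes about ⌈log₂ 3000⌉ = 12 block accesses.
Index Size and Search Cost
- Each index entry is Rᵢ = V + P = 15 bytes, so the index blocking factor is bfrᵢ = ⌊1024/15⌋ = 68 entries per block. With one entry per data block there are 3000 entries, which occupy bᵢ = ⌈3000/68⌉ = 45 blocks. A binary search on the index takes ⌈log₂ 45⌉ = 6 block accesses, and one more access to the data block gives a total of 7.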
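The arithmetic of this example can be checked with a short script. The inputs are assumed to be B = 1024 bytes, R = 100 bytes, r = 30,000 records, a 9-byte key, and a 6-byte block pointer, which is how the figures in this section appear to fit together.

```python
import math

B, R, r = 1024, 100, 30_000  # block size, record size, number of records
V, P = 9, 6                  # key field size and block pointer size in bytes

bfr = B // R                                  # records per data block
b = math.ceil(r / bfr)                        # data blocks
data_search = math.ceil(math.log2(b))         # binary search on the data file

Ri = V + P                                    # index entry size
bfri = B // Ri                                # index entries per block
bi = math.ceil(b / bfri)                      # index blocks (one entry per data block)
index_search = math.ceil(math.log2(bi)) + 1   # index search + one data block access

print(bfr, b, data_search)
print(Ri, bfri, bi, index_search)
```

Under these assumptions the index cuts the search from 12 block accesses to 7.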
Clustering
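- If the records of a file are physically ordered on a nonkey field, which does not have a distinct value for each record, that field is called the clustering field.
- A clustering index can be created on such a field.
- It speeds up retrieval of all the records that have the same value for the clustering field.
- A clustering index is also an ordered file with two fields: the first is of the same type as the clustering field of the data file, and the second is a disk block pointer.
- There is one entry for each distinct value of the clustering field, containing the value and a pointer to the first block in the data file that has a record with that value.
- The clustering field has repeated values and is not a key, so a clustering index differs from a primary index, which requires the ordering field to have a distinct value for each record.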
Index and Cluster Storage
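- Record insertion and deletion still cause problems because the data records are physically ordered.
- To alleviate this, a whole block (or a cluster of contiguous blocks) can be reserved for each value of the clustering field, so that all records with the same value are placed together.
- An index is somewhat similar to the directory structures used in hashing.
- Both are searched to find a pointer to the data block containing the desired record; an index search uses the values of the search field itself, whereas a hash directory search uses the hash value computed from the search field.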
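Secondary Indexes
- A secondary index provides a secondary means of accessing a data file for which some primary access already exists.
- The data file records can be ordered, unordered, or hashed.
- The index is an ordered file with two fields: an indexing field value and either a block pointer or a record pointer.
- Because it is built on a nonordering field, many secondary indexes can be created for the same file.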
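Secondary Indexes on Nonkey Fields
- A secondary index can also be created on a nonkey field, where numerous records in the data file can have the same value for the indexing field.
- Option 1: include duplicate index entries with the same key value, one for each record.
- Option 2: use variable-length index entries, each holding a list of pointers to which new pointers are added.
- Option 3, which is more commonly used: keep the index entries at a fixed length and add an extra level of indirection to handle the multiple pointers.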
File has code levels
- Remove block
- Code to create and copy
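Multilevel Indexes
- A binary search on a single-level index halves the part of the index still to be searched with each block access.
- A multilevel index reduces the number of levels to search by cutting the remaining part of the index by the index blocking factor (the fan-out) at each step, rather than by 2.
- The first level is an ordered index file with a distinct value for each key; the second level is a primary index on the first level, with one entry for each first-level block.
- B-trees and B+-trees are multilevel index structures that provide access to the file blocks by key while handling insertions and deletions.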