Questions and Answers
What is the primary characteristic that distinguishes volatile storage from non-volatile storage?
- Volatile storage loses data when power is lost (correct)
- Volatile storage is cheaper
- Non-volatile storage is faster
- Non-volatile storage has smaller capacity
Which of the following is an example of non-volatile memory?
- DRAM
- CPU registers
- Cache memory
- Solid-State Drive (SSD) (correct)
What unit of retrieval is typically associated with page/block-addressable storage?
- Bit
- Page (thousands or millions of bytes) (correct)
- Byte
- Kilobyte
Which storage technology typically offers the fastest data access?
What is a key difference between access patterns in volatile versus non-volatile storage?
Which factor most significantly contributes to the communication overhead in Network Attached HDDs?
Which memory technology is characterized by asymmetric read/write speeds, with reads being significantly faster?
In the context of memory hierarchy, what is the general trend regarding cost and capacity as you move from CPU registers to Archival Storage?
What is the primary function of a 'Block Read' operation in the context of disk interfaces?
What components determine the time needed to retrieve data from a spinning disk?
What is a potential drawback of data prefetching?
Which term describes the process of logically marking an old page as invalid and subsequently writing the updated page out of place in SSDs?
What is the primary role of a Flash Translation Layer (FTL) in SSDs?
Why is the 'erase-before-write' characteristic a significant consideration for DBMS performance on SSDs?
What does IOPS measure in the context of SSD performance?
What is the purpose of wear leveling in SSDs?
What is the main advantage of using a two-level hierarchy for accessing pages of a table?
Which characteristic distinguishes column stores from row stores in database systems?
For what kind of workloads are row stores most suitable?
What does a Tuple Identifier (TID) typically consist of?
What best describes the main goal of buffer management in a DBMS?
In data storage, what is the significance of the 'dirty bit' associated with a page in the buffer?
In the context of buffer management, why is the Most Recently Used (MRU) replacement policy attractive for certain database operations?
What happens to valid data in a block of an SSD when that block is marked as defective?
What is the most significant implication of 'Write Amplification' in SSDs regarding their lifespan and performance?
In the context of data retrieval from a disk, consider a scenario where the disk head is already positioned over the correct cylinder. Which delay component will dominate the data access time?
In the context of an in-page directory for variable-length tuples, which action needs to occur if records can be moved around within a page to eliminate empty space?
What is the primary challenge addressed by Flash Translation Layer (FTL) in the context of SSD technology?
In a database system employing variable-length records, how does storing an offset array in the header for each tuple facilitate attribute access?
In a sorted table, what is the most significant challenge with maintaining physical order if a new tuple must be inserted in the middle of a data page?
What is a key factor influencing the decision between using a heap table versus a sorted table?
In buffer management, what happens when the current data page is required, but there are no empty buffer frames available?
Considering the functions of a Flash Translation Layer (FTL), which functionality directly contributes to extending the lifespan of an SSD by managing write operations?
An engineer is designing a database that stores sensor readings, which are collected at various frequencies and must support time-series analysis. Given that the sensor readings vary in size and the database will be read more often than written, which store is more suitable for optimal performance?
When designing a new database system, what is the most critical trade-off to consider when deciding whether to use fixed-length versus variable-length records?
Suppose a database system crashes. After rebooting, how does the system use system catalogs to ensure transactional consistency and integrity?
Which of the following is not typically stored in a system catalog?
What does sequential performance in SSDs rely on almost entirely?
What causes garbage when an SSD does not write in place?
Which of the following storage types loses its data when power is removed?
Which of the following best describes the data access speed of volatile storage compared to non-volatile storage?
Which of the following is the most accurate consideration when designing a database system about cost and capacity?
When the location of the requested data on a spinning disk is known, but the disk head is not positioned over the correct sector, which factor most influences the data access time?
What is the primary role of 'wear leveling' in Solid State Drives (SSDs)?
Which of the following occurs during 'garbage collection' in SSDs?
In the context of data retrieval from a spinning disk, what is represented by 'seek time'?
What is a notable advantage of storing data in column stores compared to row stores?
In database systems, what information does the 'dirty bit' associated with a page in the buffer indicate?
Which memory technology exhibits asymmetric read/write speeds, with reads generally being faster than writes?
Which statement is most accurate regarding the characteristics of SSDs?
What is the primary reason for employing a Flash Translation Layer (FTL) in Solid State Drives (SSDs)?
What is the most significant reason memory access patterns in a DBMS are typically more predictable than those in a general-purpose operating system?
In the context of handling variable-length records within a database page, what is the purpose of storing an offset array in the header for each tuple?
Which of the following is the primary performance benefit of using a two-level hierarchy (directory of pages) for accessing data in a large table?
What is the function of 'pinning' a page in the buffer pool?
What is a significant implication of the 'erase-before-write' characteristic of SSDs for database management systems (DBMS)?
When designing a database system, what is the main trade-off to consider when choosing between fixed-length versus variable-length records?
What is the greatest challenge in maintaining a sorted table if a new tuple must be inserted into the middle of a data page while preserving physical order?
How do system catalogs enable a database system to guarantee transactional consistency and integrity after a crash?
In the context of SSDs, what is the 'Maximum Terabytes Written' (TBW) rating?
Which factor most significantly contributes to the effect of communication overhead in Network Attached DRAM?
An engineer is designing a system to store frequently accessed configuration files given the need for rapid retrieval; which storage option would be most appropriate?
In terms of data placement, what is true if multiple databases concurrently access different files on a disk?
Which functionality of a Flash Translation Layer (FTL) contributes most directly to extending the lifespan of an SSD?
Why does a higher number of write operations than needed negatively affect the lifespan and performance of SSDs?
When retrieving data from memory, describe what is meant by byte-addressability with regard to storage technologies.
How does FTL address the blocks in SSDs once a block is recognized to be faulty?
What metadata is stored in system catalogs about tables?
Why is the Most Recently Used (MRU) policy attractive for DBMS implementations?
When are column stores more suitable than row stores? What kind of analysis is best?
How should free space be handled after tuples are deleted?
In SSDs, there are parallel executions based on different tiers. What are multiple packages per channel, multiple dies per package, and multiple planes per die known as?
Considering the architecture of modern SSDs, if a database system executes a query that requires accessing a significant number of pages scattered across different channels and dies, which form of parallelism would provide the most substantial performance benefits in reducing latency?
A database system must implement a table where the order of tuples should be based on insertion times. What kind of table order should the system use?
You are designing a high-performance database system which frequently does random writes to the database. What is a critical strategy that you should use when selecting SSDs?
Of the following options, which type of memory would likely have the lowest latency?
Describe the data storage attributes for Solid State Drives (SSDs).
How are data transfer operations to SSDs performed?
How do solid-state drives improve read speed?
What is a general component of SSD controllers?
You want to build file storage for archival. What type of disk is best?
How does the FTL use in-memory buffering?
Assume that the page to exit the buffer as decided by the buffer replacement policy is Page q in buffer frame c, and suppose q can be discarded to free the buffer page as the data has not been changed since loaded. What is a common term that implies this?
What is a fundamental advantage of storing data in a Solid State Drive (SSD) over a traditional Hard Disk Drive (HDD)?
In the context of data storage, what primarily constitutes the seek time in a spinning hard disk?
What is the primary function of the Flash Translation Layer (FTL) when accessing SSDs?
What is the main benefit of employing a two-level hierarchy for accessing data pages in a large table?
What determines the suitability of column stores over row stores for specific database workloads?
In the context of SSDs, what does 'wear leveling' primarily aim to achieve?
If a tuple in a database is updated and now has a different size than before, what is the primary action that needs to occur in a data page using variable-length tuples?
In the context of buffer management, what does a high 'pin count' on a page indicate?
When designing a database system, what is the main advantage of using fixed-length records over variable-length records?
In the context of SSD technology and data management, what process is initiated by Flash Translation Layer (FTL) to reclaim space occupied by outdated or invalid pages?
What is a key difference between 2.5-inch SATA SSDs and M.2 NVMe SSDs in terms of performance?
Consider a relational database undergoing frequent updates and insertions. Which strategy balances the need for efficient space usage with maintaining the physical order of tuples?
What is the primary reason for the recommendation against using Solid State Drives (SSDs) for long-term archival storage, especially for infrequently accessed data?
In the context of database storage and retrieval, how does the implementation of clustered tables impact join operations involving two tables?
In the design of database systems, what is a significant trade-off when choosing between row stores and column stores for a database?
Why isn't attempt 1, keeping track of locations of empty slots, for fixed-length tuples good? Refer to the diagram on the uploaded document.
In a database system employing heap table organization, how does the system typically manage the insertion of a new tuple when the available space in a given data page is insufficient?
Design Challenge: A database system needs to decide between adding a metadata field with the number of tuples, to know whether the page is full, and adding a pointer to the next empty slot. In terms of storage space, which option performs best?
Suppose that a database system is designed to have a table where deletion entries occur very often. Describe what is meant by handling free space and what approaches would perform well given such a pattern.
A database system uses storage disks that are partially failing. You are tasked to re-implement FTL's handling of defective blocks. Describe what measures you need to consider upon re-implementing FTL.
Flashcards
Volatile Storage
Data is lost when power is not connected.
Non-volatile Storage
Data is retained even when power is off.
Page/block-addressable storage
Used by HDDs and SSDs. The unit of retrieval is a page, typically a block of thousands or millions of bytes.
Seek Time
Rotational Delay
Flash Translation Layer (FTL)
Wear leveling
Main functions of FTL
FTL Out-of-Place Update
FTL in-memory buffering
FTL Garbage Collection
FTL's Wear Leveling Mechanism
FTL Handling of Defective Blocks
FTL's Parallel Execution
Maximum Terabytes Written (TBW)
HD Technology
SSD Technology
Two-level Hierarchy
Row Store
Column Store
Row Store (+ve)
Column Store (-ve)
Column Store (+ve)
Row Store
Column Store
Fixed-length Records
Tuple Identifier
Important property of a Tuple Identifier
Variable-length Records
Data Page Layout containing Fixed-length Tuples
Data Page Layout Containing Variable-length Tuples :Slot-id
How to know data exist in a page for Variable length tuple to insert
Heap table
How to Store Attribute Values Inside a Tuple? Tuples with Fixed-length Attributes
How to Store Attribute Values Inside a Tuple? Tuples with Variable-length Attributes
Sorted Table Order
Hash function takes
Cluster Tables
System Catalogs
Why DBMS different in buffer than OS
Buffer memory
Buffer Replacement Policies MRU
Data Structure: PinCount
Data Structure: DirtyBit
Data Structure: Clean Page
Study Notes
Storage, Data Organization, and Buffering: Storage Technologies
- Key areas include storage technologies and the memory hierarchy, non-volatile storage, page-based data access, row and column stores, and buffer management.
Volatile vs. Non-Volatile Storage Media
- Volatile storage loses its data when power is disconnected, while non-volatile storage retains it.
- Volatile storage is byte-addressable and retrieves a single byte or a cache line at a time. Non-volatile storage is generally page/block-addressable; NVM is the exception, being non-volatile yet byte-addressable.
- Volatile storage is faster, with nanosecond latencies, while non-volatile storage is slower, at microseconds to milliseconds.
- Volatile storage is more expensive and smaller in capacity than non-volatile storage. Volatile examples include CPU registers, cache memory, and DRAM. Non-volatile examples are spinning hard disks, SSDs, NVM, and PM.
- Volatile storage handles random and sequential access alike, with the same read and write latency, while on non-volatile storage sequential access is typically faster.
- NVM reads are 4x faster than writes, SSD reads are 12x faster than writes, and HD read and write latency is symmetric.
Various Storage Technologies
- The technologies compared are cache, DRAM, NVM, SSD, and HD.
- Cache has the lowest read and write latency at 1-15 ns. HD is the slowest at 1-10 ms.
- CPU registers and the L1, L2, and L3 caches have different read latencies: L1 is 1 ns, L2 is 5 ns, and L3 is 15 ns.
- Cache and DRAM are volatile, while NVM, SSD, and HD are non-volatile.
- All are byte-addressable except SSD and HD, which use block/page addressing.
- Capacity increases from cache (10s of KB to 10s of MB) to HD (TBs). DRAM ranges from GBs to TBs, and NVM and SSD support TBs.
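To make the latency gap concrete, the figures quoted above can be compared directly. The following is only illustrative arithmetic: the names LATENCY_NS and slowdown are made up for this sketch, and using 1 ms for HD (the lower end of the quoted 1-10 ms range) is an assumption.

```python
# Read latencies in nanoseconds, taken from the notes above.
# Assumption: HD uses 1 ms, the lower end of the quoted 1-10 ms range.
LATENCY_NS = {
    "L1 cache": 1,
    "L2 cache": 5,
    "L3 cache": 15,
    "HD": 1_000_000,
}


def slowdown(slow: str, fast: str) -> float:
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    for name in LATENCY_NS:
        factor = slowdown(name, "L1 cache")
        print(f"{name:>8}: {factor:>12,.0f}x the L1 latency")
```

Even at the optimistic end of its range, one hard-disk access costs as much as a million L1 cache reads under these figures.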
Communication Overhead on Disk Storage
- Compares a local hard disk with a network-attached hard disk.
- Both have access latencies in milliseconds, but network-attached HD latency is in the 50s of milliseconds (varies with size).
- Both are non-volatile, use block/page addressing, and support TBs of capacity.
Communication Overhead on Memory Storage
- DRAM latency is measured in nanoseconds and network-attached DRAM latency in microseconds: about 50 ns for DRAM versus about 10 µs for network-attached DRAM.
- Volatility and addressability are the same, but DRAM capacity is in the GBs-TBs range, while network-attached DRAM reaches TBs.
Memory Hierarchy
- From the highest capacity downward, the hierarchy runs: archival storage (tapes), network storage, disk storage, and solid-state disks.
- Main memory, cache memory, and CPU registers sit at the bottom. Price and speed increase from top to bottom.
- Archival, network, and disk storage use page-oriented access, while solid-state disks use non-volatile access.
- Main memory and cache memory use mini-block access via cache lines, while CPU registers use byte-oriented access.
Non-Volatile Storage: Spinning Hard Disks
- Spinning hard disks and solid-state drives are both non-volatile disk storage. Pages pass through volatile memory buffers, and access is page-based.
- Seek time is the time to move the heads to the target cylinder, around 10 ms.
- Rotational delay is the time for the platter to rotate around the spindle until the target sector is under the head, around 10 ms.
- Transmission delay is the time to transfer a block of data, around 1 ms.
- Each disk page has a disk block id.
- Block Read(disk block id) transmits the corresponding disk page from the disk to the memory buffers.
- Write(Block, disk block id) transmits the data Block from the memory buffers into the disk page corresponding to the given disk block id.
- A disk block identifier is mapped to a cylinder (or track) number, a platter number, and a sector number.
- The time to retrieve a disk page depends on the current location of the disk head and the location of the requested page. Retrieval involves mechanical movement and is therefore relatively slow.
- Retrieval proceeds in three phases. Seek time: the head moves to the corresponding cylinder number. Rotational delay: the disk waits until the head is over the correct sector number. Transmission time: once the head is at the right cylinder and sector, the head responsible for the block's platter number is signaled to transfer the data.
- The data placement problem asks how to place the data blocks d₁, d₂, ... of a file on a disk with p platters and s sectors per track so as to minimize file read time. Suppose d₁ is stored at cylinder c₁, platter p₁, sector s₁.
- The ideal placement for the next block is then d₂ → c₁, p₂, s₁: the same cylinder and sector on the next platter, so seek time = 0 and rotational delay = 0.
- Page prefetching retrieves pages before they are requested, which may save time, but the effort is wasted if a prefetched page is never requested.
- Good data placement will result in lower retrieval cost of a page.
- Multi-tenant databases can make data placement work difficult, because requests for pages from multiple files will be interleaved.
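The block-id mapping and the three delay components above can be sketched in Python. This is a minimal sketch under stated assumptions: the geometry constants, the names `block_to_chs` and `access_cost_ms`, and the all-or-nothing cost model (a full 10 ms seek whenever the cylinder changes, a full 10 ms rotational delay whenever the sector changes) are illustrative simplifications, not the behavior of a real drive.

```python
from typing import NamedTuple

# Hypothetical geometry, chosen only for illustration.
PLATTERS = 4            # p platters
SECTORS_PER_TRACK = 8   # s sectors per track

SEEK_MS = 10.0          # time to move the heads (figure from the notes)
ROTATION_MS = 10.0      # rotational delay (figure from the notes)
TRANSFER_MS = 1.0       # time to transfer one block (figure from the notes)


class Location(NamedTuple):
    cylinder: int
    platter: int
    sector: int


def block_to_chs(block_id: int) -> Location:
    """Map a disk block id to (cylinder, platter, sector).

    Consecutive ids vary the platter first, so blocks 0 and 1 share a
    cylinder and a sector and differ only in platter.
    """
    platter = block_id % PLATTERS
    sector = (block_id // PLATTERS) % SECTORS_PER_TRACK
    cylinder = block_id // (PLATTERS * SECTORS_PER_TRACK)
    return Location(cylinder, platter, sector)


def access_cost_ms(current: Location, target: Location) -> float:
    """Simplified cost: pay a seek only if the cylinder changes and a
    rotational delay only if the sector changes; always pay the transfer."""
    cost = TRANSFER_MS
    if current.cylinder != target.cylinder:
        cost += SEEK_MS
    if current.sector != target.sector:
        cost += ROTATION_MS
    return cost


d1, d2 = block_to_chs(0), block_to_chs(1)
far = block_to_chs(PLATTERS * SECTORS_PER_TRACK * 3 + 5)
print(access_cost_ms(d1, d2))   # same cylinder and sector, next platter
print(access_cost_ms(d1, far))  # different cylinder and different sector
```

Under this mapping, a file laid out on consecutive block ids pays only the transfer time for each group of blocks that share a cylinder and sector, which is the situation the ideal placement aims for.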
Non-Volatile Storage: Flash Memory and Solid-State Drives
- Examples of non-volatile memory include ROM, PROM, EPROM, EEPROM and Flash Memory.
- Flash Memory (1980s, Toshiba): based on EEPROM technology; uses floating gates to store bits. EEPROM (late 1970s to early 1980s, Hughes and Intel): Electrically Erasable Programmable Read-Only Memory, a user-modifiable ROM.
Solid-State Drive (SSD)
- Non-volatile storage that persists data on solid-state flash memory. SSDs replace traditional hard disk drives (HDDs) in computers and perform the same basic functions as a hard disk drive.
- SSDs do not have mechanical moving parts like HDDs. They are faster than HDDs.
- SSDs use NAND flash memory, storing data in non-volatile NAND transistors. The cells are arrayed and stacked in layers, an arrangement called 3D NAND. Stacking memory cells provides higher storage capacities.
- Single-level cells (SLC) are the most expensive but the most durable. Adding levels per cell increases storage capacity but reduces reliability.
- The SSD controller communicates between the NAND flash and the input/output interface.
- It manages all the flash memory cells by telling them what memory to access or manipulate, and is responsible for data distribution and garbage collection.
Solid-State Drive Controller Behavior
- Garbage is generated because writes are not done in place: after an update, the old block becomes garbage and needs to be collected. Blocks are read into memory and updated.
- 2.5-Inch SATA SSD (internal SSD): uses Serial Advanced Technology Attachment (SATA), an older interface for SSDs; HDDs also use the SATA standard. These SSDs are compatible with HDDs of the same size.
- M.2 SATA SSD (external SSD): similar to the 2.5-inch drives in that it also uses the SATA standard; these are built into USB thumb drives.
- M.2 NVMe SSD (mainly internal): uses Non-Volatile Memory Express over PCI Express (PCIe) bandwidth rather than SATA bandwidth. PCIe is a common high-speed connection interface on the motherboard, which yields a huge performance improvement.
Solid-State Drive Sequential Performance
- SATA SSDs provide up to 0.5 GB/s for sequential reads and writes and are the slowest of the three types.
- NVMe Gen3 SSDs range between 1.5 GB/s and 3.5 GB/s.
- At this speed, the CPU begins to be the bottleneck, not the storage.
- NVMe Gen4 SSDs achieve 5.5GB/s - 7GB/s for sequential reads and writes.
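To put these bandwidth figures side by side, the arithmetic below estimates how long one sequential read would take on each interface. It is a rough sketch: it uses the upper-end figures quoted above, the 70 GB file size is a hypothetical value, and real drives rarely sustain their peak rate.

```python
# Upper-end sequential bandwidths quoted in the notes, in GB/s.
BANDWIDTH_GB_PER_S = {
    "SATA SSD": 0.5,
    "NVMe Gen3 SSD": 3.5,
    "NVMe Gen4 SSD": 7.0,
}


def sequential_read_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized transfer time: size divided by sustained bandwidth."""
    return size_gb / bandwidth_gb_per_s


FILE_SIZE_GB = 70.0  # hypothetical file size, chosen for round numbers

for name, bandwidth in BANDWIDTH_GB_PER_S.items():
    seconds = sequential_read_seconds(FILE_SIZE_GB, bandwidth)
    print(f"{name}: {seconds:.0f} s")
```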
Solid-State Drive Durability
- IOPS (input/output operations per second) is the standard unit of measurement for the maximum number of reads and writes to non-contiguous storage locations.
- Data is stored in blocks, and blocks are composed of pages. A page must be erased before it can be rewritten, and this erasure is performed at the block level rather than the page level.
- Once a block is erased, we can write one page at a time, as long as each write goes to a blank page that has not been written since the most recent block erase. Pages can be read independently.
- Block erase is slow. To avoid paying for an erase on every update, the old page in a block is marked outdated and the updated page is written out of place into a new page in another block.
- Blocks are not durable because writing and erasing cause the SSD to wear. They get damaged over time after a certain number of erases or writes.
- Most SSDs come with integrated "wear-leveling" technology, which distributes wear evenly to extend the lifespan of the device, along with an Error Correction Code per page.
- In solid-state drives, erasure must precede each data write, the cells are subject to wearing out, and reads are faster than writes.
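The erase-before-write rule above can be illustrated with a toy model of a single flash block. This is a sketch only: the class name `FlashBlock`, the four-page block size, and the use of `None` for a blank page are assumptions made for the example and do not describe any real NAND interface.

```python
class FlashBlock:
    """Toy model of one flash block: pages are programmed individually,
    but erased only as a whole block."""

    def __init__(self, pages_per_block: int = 4):
        self.pages = [None] * pages_per_block  # None marks a blank page
        self.erase_count = 0                   # simple wear indicator

    def read(self, page_no: int):
        # Pages can be read independently of one another.
        return self.pages[page_no]

    def program(self, page_no: int, data: bytes) -> None:
        # A page may be written only if it is blank since the last erase.
        if self.pages[page_no] is not None:
            raise ValueError("page already written since the last block erase")
        self.pages[page_no] = data

    def erase(self) -> None:
        # Erasure works at block granularity and contributes to wear.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


block = FlashBlock()
block.program(0, b"v1")
try:
    block.program(0, b"v2")   # in-place update is rejected
    overwrite_allowed = True
except ValueError:
    overwrite_allowed = False

block.erase()                 # the whole block is wiped and wear increases
block.program(0, b"v2")
print(overwrite_allowed, block.read(0), block.erase_count)
```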
Solid-State Drive Access Types
- One implementation is Raw Flash, which directly handles the NAND flash drive, uses in-memory write buffers to postpone and batch write operations, and applies logging techniques to decrease the number of random writes.
- The other is Flash Translation Layer based (FTL-based): a software driver in the SSD controller that enforces an out-of-place update policy, uses logical address mapping, addresses wear leveling, reclaims free space, and manages bad blocks.
Solid-State Drive FTL Management
- The main functions of the FTL are address mapping, wear leveling, garbage collection and bad block management.
- The FTL implements an out-of-place update policy because flash cells cannot be updated in place: old pages are marked as invalid while the updated version is written to a new location. There are three mapping methods for doing this: page-level, block-level, and hybrid.
- The FTL uses in-memory buffering of page updates, which reduces the number of page writes and in turn the number of block erases. The in-memory approach could result in loss of data due to volatility.
- Out-of-place updates also result in out-of-date (invalid) flash pages, which trigger garbage collection.
- The FTL runs an asynchronous garbage collector to recycle out-of-date pages.
- The FTL's wear leveling avoids premature aging by distributing block erasures uniformly across the whole SSD.
- The FTL handles defective blocks by marking them as defective and copying their data to other blocks.
- Flash supports multiple packages per channel, multiple dies per package, and multiple planes per die, enabling channel-level, package-level, die-level and plane-level parallelism.
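The address mapping, out-of-place update, and garbage collection functions listed above can be sketched as a toy page-level FTL. Everything here is an assumption made for illustration: the class name `SimpleFTL`, the tiny geometry, and the tuple encoding of page states. Garbage collection runs synchronously when no blank page is left (the notes describe an asynchronous collector), it reads the surviving pages into memory before erasing, and wear leveling and bad block management are left out.

```python
class SimpleFTL:
    """Toy page-level FTL. A physical page is None (blank),
    ("valid", logical_page_number, data), or ("invalid",)."""

    def __init__(self, num_blocks: int = 2, pages_per_block: int = 2):
        self.pages_per_block = pages_per_block
        self.blocks = [[None] * pages_per_block for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks
        self.mapping = {}  # logical page number -> (block, page)

    def _find_blank(self):
        for b, block in enumerate(self.blocks):
            for p, page in enumerate(block):
                if page is None:
                    return b, p
        return None

    def _invalid_count(self, b: int) -> int:
        return sum(1 for page in self.blocks[b] if page == ("invalid",))

    def garbage_collect(self) -> bool:
        """Erase the block with the most invalid pages, keeping its valid pages."""
        victim = max(range(len(self.blocks)), key=self._invalid_count)
        if self._invalid_count(victim) == 0:
            return False  # nothing to reclaim
        # Read surviving pages into memory, erase, then write them back.
        survivors = [pg for pg in self.blocks[victim]
                     if pg is not None and pg[0] == "valid"]
        self.blocks[victim] = [None] * self.pages_per_block
        self.erase_counts[victim] += 1
        for p, pg in enumerate(survivors):
            self.blocks[victim][p] = pg
            self.mapping[pg[1]] = (victim, p)
        return True

    def write(self, lpn: int, data: str) -> None:
        slot = self._find_blank()
        if slot is None:
            self.garbage_collect()
            slot = self._find_blank()
            if slot is None:
                raise RuntimeError("no reclaimable space")
        old = self.mapping.get(lpn)
        if old is not None:
            # Out-of-place update: the old copy is only marked invalid.
            self.blocks[old[0]][old[1]] = ("invalid",)
        self.blocks[slot[0]][slot[1]] = ("valid", lpn, data)
        self.mapping[lpn] = slot

    def read(self, lpn: int) -> str:
        b, p = self.mapping[lpn]
        return self.blocks[b][p][2]


ftl = SimpleFTL()
for lpn, data in [(0, "a"), (1, "b"), (0, "a2"), (0, "a3"), (1, "b2")]:
    ftl.write(lpn, data)
print(ftl.read(0), ftl.read(1), ftl.erase_counts)
```

The fifth write finds no blank page, so it triggers one collection and one block erase; the logical addresses 0 and 1 stay stable throughout even though their physical locations move.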
Maximum Terabytes Written (TBW)
- As data is written into and read from an SSD, its effectiveness steadily decreases with age. This endurance limit is quantified as the Maximum Terabytes Written (TBW) of the SSD.
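A TBW rating can be turned into a rough lifetime estimate by dividing it by the expected write volume. The sketch below assumes a constant daily write volume and decimal units (1 TB = 1000 GB); the function name and the 600 TBW and 50 GB/day figures are hypothetical values for illustration, not numbers from the notes.

```python
def estimated_lifetime_years(tbw_rating_tb: float, writes_gb_per_day: float) -> float:
    """Years until the TBW rating is reached at a constant write rate."""
    tb_written_per_year = writes_gb_per_day * 365 / 1000  # 1 TB = 1000 GB
    return tbw_rating_tb / tb_written_per_year


# Hypothetical drive rated at 600 TBW, written at 50 GB per day.
years = estimated_lifetime_years(600, 50)
print(f"{years:.1f} years")
```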
Hard Drives vs Solid State Drives
- HDs use spinning magnetic platters, while SSDs use flash memory.
- HD capacities range from 250 GB to 20 TB, versus 128 GB to 8 TB for SSDs, with some SSDs at 30 TB to 100 TB.
- HD speed is much lower, at 500 MB/s, while SSDs reach up to 7000 MB/s. SSDs consume less than 25-33% of the power that HDs do.
- HDs are better for long-term storage, while SSDs are less reliable over the long term. In both cases, external drives are more portable than internal ones.
- HDs are cheaper, starting at $24 for 500 GB versus $55 for an SSD, but SSDs offer better performance, with fast data retrieval under constant use; they are not ideal for long-term storage.
Tables and Directories
- When no indexes are available, the pages of a table can be accessed through a linked list of disk pages.
- A Directory table maps each table name to the head of its linked list of pages.
- Each page has forward and backward pointers that use disk page ids, offering sequential access.
- A two-level hierarchy can be maintained using directory pages.
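The directory table and the doubly-linked list of pages can be sketched as follows. This is a minimal in-memory sketch: the `disk` dictionary stands in for the disk, and the names `DiskPage`, `append_page` and `scan`, along with the sample tuples, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class DiskPage:
    page_id: int
    tuples: List[tuple] = field(default_factory=list)
    prev_id: Optional[int] = None   # backward pointer (disk page id)
    next_id: Optional[int] = None   # forward pointer (disk page id)


disk: Dict[int, DiskPage] = {}      # stands in for the disk: page id -> page
directory: Dict[str, int] = {}      # directory table: table name -> head page id


def append_page(table: str, page: DiskPage) -> None:
    """Link a page at the tail of the table's page list."""
    disk[page.page_id] = page
    if table not in directory:
        directory[table] = page.page_id
        return
    tail = disk[directory[table]]
    while tail.next_id is not None:
        tail = disk[tail.next_id]
    tail.next_id = page.page_id
    page.prev_id = tail.page_id


def scan(table: str) -> Iterator[tuple]:
    """Sequential access: follow forward pointers from the head page."""
    page_id = directory.get(table)
    while page_id is not None:
        page = disk[page_id]
        yield from page.tuples
        page_id = page.next_id


append_page("Employee", DiskPage(7, [("Mary", 5000)]))
append_page("Employee", DiskPage(3, [("Ahmed", 6200), ("Li", 5800)]))
print(list(scan("Employee")))
```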
Row vs Column Stores Intro
- Row stores: table data is stored one tuple (row) at a time, and every row adheres to the same schema. Inserts and updates are easy.
- Relational databases are mainly stored in row-major order, that is, in row order on the storage media.
- Row stores are suited mostly for transactional workloads (OLTP).
- To compute select sum(salary) from Employee, a row store accesses all the data in the Employee table even though only the salary attribute of each record is used, so it accesses more data than needed.
- Column stores keep data in column-major order, with the data of a given attribute, such as salary, stored together. Columns can be compressed, and column stores are suited mostly for read-heavy analytics workloads (OLAP).
- With column stores, queries touch only the relevant data (for example, average sale), but inserting a tuple requires touching many locations.
Row and Column Stores
- Row stores: tuple attributes (rows) are stored together in the same block.
- This is suitable for transactional processing specific to a tuple, such as "increase Mary's salary by 5%".
- The attributes of Mary's record, such as name and salary, will be contiguous in storage.
- Column stores: the values of each column are stored together in the same block. This is suitable for analytical processing on certain columns, such as quickly computing average salaries. All salary attributes of the entire table will be contiguous in storage (fast).
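The trade-off between the two layouts can be made concrete by counting how many attribute values each one touches for an aggregate and for an insert. This is a sketch with hypothetical data; plain Python lists stand in for disk blocks, and the schema and function names are assumptions made for the example.

```python
SCHEMA = ("name", "position", "salary")

rows = [
    ("Mary", "Engineer", 5000),
    ("Ahmed", "Analyst", 6200),
    ("Li", "Manager", 5800),
]

# Row store: each tuple's attributes are kept together.
row_store = list(rows)

# Column store: all values of one attribute are kept together.
column_store = {attr: [row[i] for row in rows] for i, attr in enumerate(SCHEMA)}


def sum_salary_row_store(store):
    """Returns (sum, values touched). Whole tuples are fetched."""
    salary_index = SCHEMA.index("salary")
    touched = sum(len(row) for row in store)
    return sum(row[salary_index] for row in store), touched


def sum_salary_column_store(store):
    """Returns (sum, values touched). Only the salary column is fetched."""
    salaries = store["salary"]
    return sum(salaries), len(salaries)


def insert_row_store(store, new_row):
    store.append(new_row)
    return 1  # one location written


def insert_column_store(store, new_row):
    for attr, value in zip(SCHEMA, new_row):
        store[attr].append(value)
    return len(SCHEMA)  # one location written per column


row_result = sum_salary_row_store(row_store)
column_result = sum_salary_column_store(column_store)
print(row_result, column_result)

new_employee = ("Sara", "Engineer", 5400)
row_writes = insert_row_store(row_store, new_employee)
column_writes = insert_column_store(column_store, new_employee)
print(row_writes, column_writes)
```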
Tuples Types
- Tuples can be of two types: fixed length or variable length, and the choice applies to all tuples in a table.
Tuple Identifiers (TID)
- A tuple in a table is referred to by a Tuple Identifier (TID), also called a Record Identifier.
- A TID is used to physically locate a tuple: it holds the data page id (of the page that contains the tuple) and the index or offset within that page.
- TIDs are used in foreign-key/primary-key references to avoid performing joins.
- TIDs are used in the leaf level of an index such as a B-tree.
- A TID must uniquely identify a tuple and, ideally, should not change when the tuple's data is updated.
Data Page Layout
- Fixed Length: keep track of empty slots, for example with a linked list, so that TIDs do not change; compaction, by contrast, changes the TIDs. This may involve pointers to the previous/next page and a linear search for a slot.
- Variable Length: each page has a directory. If a tuple moves within the page, its TID does not change; only its directory entry changes. The directory, however, takes up space.
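The variable-length layout, in which a per-page directory lets tuples move without changing their identifiers, can be sketched as a slotted page. This is a simplified sketch: the class name `SlottedPage`, the 64-byte page size, and the `(page id, slot number)` form of the TID are assumptions made for the example.

```python
class SlottedPage:
    """Toy page with a slot directory mapping slot number -> (offset, length)."""

    def __init__(self, page_id: int, size: int = 64):
        self.page_id = page_id
        self.data = bytearray(size)
        self.slots = []        # directory; None marks a deleted tuple
        self.free_start = 0    # free-space pointer

    def insert(self, record: bytes):
        if self.free_start + len(record) > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots) - 1)   # the TID

    def read(self, slot_no: int):
        entry = self.slots[slot_no]
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot_no: int) -> None:
        self.slots[slot_no] = None   # leaves a gap in the data area

    def compact(self) -> None:
        """Close gaps by moving tuples; only directory entries change."""
        new_data = bytearray(len(self.data))
        pos = 0
        for i, entry in enumerate(self.slots):
            if entry is None:
                continue
            offset, length = entry
            new_data[pos:pos + length] = self.data[offset:offset + length]
            self.slots[i] = (pos, length)
            pos += length
        self.data = new_data
        self.free_start = pos


page = SlottedPage(page_id=1)
tid_a = page.insert(b"alice")
tid_b = page.insert(b"bob")
tid_c = page.insert(b"carol")

page.delete(tid_a[1])
offset_before = page.slots[tid_c[1]][0]
page.compact()
offset_after = page.slots[tid_c[1]][0]

# The tuple moved inside the page, but its TID still resolves to it.
print(tid_c, offset_before, offset_after, page.read(tid_c[1]))
```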
Problems in Handling Tuple Space
- Where to insert a new tuple? Use a Free Space Pointer and a Free Space Size, updating both whenever the table is updated.
- Handling gaps after tuples are deleted requires either inserting into the gaps or relocating tuples under the same directory.
- What if there is a lot of data with many tuples? Two-level entries can be created for the data, since the alternative is to sequentially check the entire database.
- So what is better for tracking free space? Keep a directory.
Heap Data Structures
- Maintain two doubly-linked lists of pages in the table: one list for pages without data and one for pages with data. To insert tuple data of size N, search for a page that can hold N.
Storing Attribute Values
- Storing a tuple with fixed-length attributes: every attribute must have a fixed size. We then need to know the offset of each attribute, such as Name and Position. Keeping this information in every page is not a good plan.
- Instead, this information is stored in the System Catalog, together with supporting functions and code.
- For variable-length attributes, use an offset array to derive each attribute's length; a length of 0 means the attribute is null.
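The offset-array idea for variable-length attributes can be sketched as below. The encoding is an assumption made for illustration: each entry of the offset array stores the end offset of one attribute inside the payload, so an attribute's length is the difference between consecutive entries. Treating length 0 as null follows the notes, which also means this scheme cannot tell an empty string from a null.

```python
from typing import List, Optional, Tuple


def encode_tuple(values: List[Optional[str]]) -> Tuple[List[int], bytes]:
    """Pack attribute values into one payload plus an array of end offsets."""
    payload = b""
    offsets = []
    for value in values:
        if value is not None:
            payload += value.encode("utf-8")
        offsets.append(len(payload))   # end offset of this attribute
    return offsets, payload


def decode_tuple(offsets: List[int], payload: bytes) -> List[Optional[str]]:
    """Recover the attribute values; a zero-length attribute decodes as None."""
    values = []
    start = 0
    for end in offsets:
        if end == start:
            values.append(None)        # length 0 means null
        else:
            values.append(payload[start:end].decode("utf-8"))
        start = end
    return values


offsets, payload = encode_tuple(["Mary", None, "Engineer"])
print(offsets, payload)
print(decode_tuple(offsets, payload))
```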
Tables and Index
- The system catalog maintains data about each table, such as the directory of its pages and whether it has an index or unique keys.
- What if the tuples are kept in sorted order? An insert changes the physical location of existing data; to avoid that, insert using a linked list.
- Can an entire table be stored inside an index?
- A hash maps a tuple to a location, while a B-tree searches for data by key. If a page splits, the tuples do not change.
- Tables that are joined together are considered a cluster.
System Catalogs
- System catalogs store metadata about the tables in the database. For every table, the catalog stores the directory of the data pages composing it, how it is stored, and its schema, including whether tuples are fixed or variable length and whether there are primary or unique keys. Some of this data consists of offsets.
- The System Catalog tables themselves are stored in the System Catalogs.
- Also stored are index types, table views, user views, encrypted passwords, and access rights to tables.
- Example: Attributes_Catalog_Table(relation_name, attribute_name, attribute_type, attribute_order_in_table).
Buffer Management
- Buffer management in the DBMS differs from the OS because access is predictable, so the DBMS need not rely on LRU as the OS does.
- Most accesses come from relational operators.
- Data must be loaded from the data store into memory for queries to operate on it. The target is to reduce traffic between the data store and memory (the Buffer Manager's job) by increasing buffer hits and reducing buffer misses.
- Each buffered page carries a dirty bit, indicating whether the copy in the buffer differs from the page in the data store, and a pin count.
Functions of the Buffer Manager: if a requested page is not in the buffer, provide an empty frame for it. A buffer replacement policy chooses which page to replace; if that page is dirty, it is cleaned (written back) first.
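The dirty bit, pin count, and hit/miss bookkeeping described above can be sketched with a small buffer manager. This is a minimal sketch under stated assumptions: the names `Frame` and `BufferManager` are hypothetical, a dictionary stands in for the data store, and the replacement policy (evict the unpinned page that was loaded earliest) is a placeholder chosen for brevity rather than a policy taken from the notes.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    data: str
    pin_count: int = 0
    dirty: bool = False


class BufferManager:
    def __init__(self, disk: dict, num_frames: int = 2):
        self.disk = disk              # page id -> contents (stands in for the data store)
        self.num_frames = num_frames
        self.frames = {}              # page id -> Frame, kept in load order
        self.hits = 0
        self.misses = 0

    def pin(self, page_id: int) -> Frame:
        frame = self.frames.get(page_id)
        if frame is not None:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.frames) >= self.num_frames:
                self._evict()
            frame = Frame(data=self.disk[page_id])
            self.frames[page_id] = frame
        frame.pin_count += 1
        return frame

    def unpin(self, page_id: int, dirty: bool = False) -> None:
        frame = self.frames[page_id]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _evict(self) -> None:
        # Placeholder policy: first unpinned frame in load order.
        for page_id, frame in list(self.frames.items()):
            if frame.pin_count == 0:
                if frame.dirty:
                    self.disk[page_id] = frame.data   # write back before reuse
                del self.frames[page_id]
                return
        raise RuntimeError("all frames are pinned")


disk = {1: "p1", 2: "p2", 3: "p3"}
bm = BufferManager(disk, num_frames=2)

frame = bm.pin(1)            # miss: page 1 is loaded
frame.data = "p1-updated"
bm.unpin(1, dirty=True)

bm.pin(2)                    # miss: page 2 is loaded
bm.unpin(2)
bm.pin(1)                    # hit: page 1 is still buffered
bm.unpin(1)
bm.pin(3)                    # miss: page 1 is evicted and written back

print(disk[1], bm.hits, bm.misses, sorted(bm.frames))
```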