Slide 7 - Data Storage - V2

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

What is the primary characteristic that distinguishes volatile storage from non-volatile storage?

  • Volatile storage loses data when power is lost (correct)
  • Volatile storage is cheaper
  • Non-volatile storage is faster
  • Non-volatile storage has smaller capacity

Which of the following is an example of non-volatile memory?

  • DRAM
  • CPU registers
  • Cache memory
  • Solid-State Drive (SSD) (correct)

What unit of retrieval is typically associated with page/block-addressable storage?

  • Bit
  • Page (thousands or millions of bytes) (correct)
  • Byte
  • Kilobyte

Which storage technology typically offers the fastest data access?

<p>Cache memory (A)</p> Signup and view all the answers

What is a key difference between access patterns in volatile versus non-volatile storage?

<p>Sequential access is faster in Non-Volatile Storage (B)</p> Signup and view all the answers

Which factor most significantly contributes to the communication overhead in Network Attached HDDs?

<p>Network latency (D)</p> Signup and view all the answers

Which memory technology is characterized by asymmetric read/write speeds, with reads being significantly faster?

<p>SSD (B)</p> Signup and view all the answers

In the context of memory hierarchy, what is the general trend regarding cost and capacity as you move from CPU registers to Archival Storage?

<p>Cost decreases, capacity increases (A)</p> Signup and view all the answers

What is the primary function of a 'Block Read' operation in the context of disk interfaces?

<p>To transmit a disk page from the disk to the memory buffers. (D)</p> Signup and view all the answers

What components determine the time needed to retrieve data from a spinning disk?

<p>Seek time, rotational delay, and transmission delay. (A)</p> Signup and view all the answers

What is a potential drawback of data prefetching?

<p>It may waste effort if prefetched pages are not subsequently requested (D)</p> Signup and view all the answers

Which term describes the process of logically marking an old page as invalid and subsequently writing the updated page out of place in SSDs?

<p>Out-of-place update (D)</p> Signup and view all the answers

What is the primary role of a Flash Translation Layer (FTL) in SSDs?

<p>To manage bad blocks and perform wear leveling (D)</p> Signup and view all the answers

Why is 'erase-before-write' characteristic a significant consideration for DBMS performance on SSDs?

<p>It requires entire blocks to be erased before writing, affecting write speeds (B)</p> Signup and view all the answers

What does IOPS measure in the context of SSD performance?

<p>The maximum number of reads and writes to non-contiguous storage locations per second (B)</p> Signup and view all the answers

What is the purpose of wear leveling in SSDs?

<p>To evenly distribute write and erase cycles across memory cells. (A)</p> Signup and view all the answers

What is the main advantage of using a two-level hierarchy for accessing pages of a table?

<p>It reduces sequential access overhead for large tables (B)</p> Signup and view all the answers

Which characteristic distinguishes column stores from row stores in database systems?

<p>Column stores store data for certain attributes together. (B)</p> Signup and view all the answers

For what kind of workloads are row stores most suitable?

<p>Mostly transactional workloads (OLTP) (A)</p> Signup and view all the answers

What does a Tuple Identifier (TID) typically consist of?

<p>Data page ID and the index/offset inside the data page (C)</p> Signup and view all the answers

What best describes the main goal of buffer management in a DBMS?

<p>Reducing the amount of data transferred between data store and memory (A)</p> Signup and view all the answers

In data storage, what is the significance of the 'dirty bit' associated with a page in the buffer?

<p>Indicates whether the page has been modified since being read from the data store (A)</p> Signup and view all the answers

In the context of buffer management, why is the Most Recently Used (MRU) replacement policy attractive for certain database operations?

<p>Database operations such as table scan visit the pages not to be revisited soon, thus evicting the most recently used is more efficient (C)</p> Signup and view all the answers

What happens to valid data in a block of an SSD when that block is marked as defective?

<p>Its valid data is copied to another functional block (B)</p> Signup and view all the answers

What is the most significant implication of 'Write Amplification' in SSDs regarding their lifespan and performance?

<p>It causes more data to be written than was initially intended, accelerating wear. (C)</p> Signup and view all the answers

In the context of data retrieval from a disk, consider a scenario where the disk head is already positioned over the correct cylinder. Which delay component will dominate the data access time?

<p>Rotational Delay and Transmission Delay (A)</p> Signup and view all the answers

In the context of in-page directory for variable-length tuples, which action needs to occur if records can be moved around within a page to eliminate empty space?

<p>The entry in the header containing the offset of the tuple must be updated (C)</p> Signup and view all the answers

What is the primary challenge addressed by Flash Translation Layer (FTL) in the context of SSD technology?

<p>Mitigating the limitations of flash memory like write endurance and erase block constraints (B)</p> Signup and view all the answers

In a database system employing variable-length records, how does storing an offset array in the header for each tuple facilitate attribute access?

<p>It allows for direct calculation of attribute positions without scanning the tuple. (C)</p> Signup and view all the answers

In a sorted table, what is the most significant challenge with maintaining physical order if a new tuple must be inserted in the middle of a data page?

<p>Invalidation of existing tuple identifiers (TIDs) due to shifting tuples (A)</p> Signup and view all the answers

What is a key factor influencing the decision between using a heap table versus a sorted table?

<p>Requirement for fast search and ordered retrieval of the tuples. (A)</p> Signup and view all the answers

In buffer management, what happens when the current data page is required, but there are no empty buffer frames available?

<p>A suitable page is removed according to a replacement policy (B)</p> Signup and view all the answers

Considering the functions of a Flash Translation Layer (FTL), which functionality directly contributes to extending the lifespan of an SSD by managing write operations?

<p>Addresses wear leveling (A)</p> Signup and view all the answers

An engineer is designing a database that stores sensor readings, which are collected at various frequencies and must support time-series analysis. Given that the sensor readings vary in the size and the database will be read more often than written, which store is more suitable for optimal performance?

<p>Column store is more suitable (B)</p> Signup and view all the answers

When designing a new database system, what is the most critical trade-off to consider when deciding whether to use fixed-length versus variable-length records?

<p>Space management overhead, performance, and storage efficiency versus schema flexibility and efficient handling of missing values (C)</p> Signup and view all the answers

Suppose a database system crashes. After rebooting, how does the system use system catalogs to ensure transactional consistency and integrity?

<p>By scanning system catalogs, the system can identify which operations were in progress and take corrective actions to avoid data corruption (C)</p> Signup and view all the answers

Which of the following is not typically stored in a system catalog?

<p>The average value for an attribute. (B)</p> Signup and view all the answers

What does sequential performance in SSD relies almost entirely on?

<p>Large file transfers and video tasks, editing and rendering. (B)</p> Signup and view all the answers

What causes garbage in not writing in-place operations?

<p>Old blocks becoming garbage after updates. (A)</p> Signup and view all the answers

Which of the following storage types loses its data when power is removed?

<p>Dynamic Random Access Memory (DRAM) (A)</p> Signup and view all the answers

Which of the following best describes the data access speed of volatile storage compared to non-volatile storage?

<p>Volatile storage is generally faster than non-volatile storage. (A)</p> Signup and view all the answers

Which of the following is the most accurate consideration when designing a database system about cost and capacity?

<p>Higher capacity storage solutions generally have a lower cost per unit of storage, but slower access times. (B)</p> Signup and view all the answers

When the location of the requested data on a spinning disk is known, but the disk head is not positioned over the correct sector, which factor most influences the data access time?

<p>Rotational delay (A)</p> Signup and view all the answers

What is the primary role of 'wear leveling' in Solid State Drives (SSDs)?

<p>To evenly distribute write and erase cycles across all memory cells, extending the lifespan of the SSD. (C)</p> Signup and view all the answers

Which of the following occurs during 'garbage collection' in SSDs?

<p>Valid data from partially filled blocks is moved to new blocks so that the old blocks can be erased. (A)</p> Signup and view all the answers

In the context of data retrieval from a spinning disk, what is represented by 'seek time'?

<p>The time it takes to position the disk head over the correct track (cylinder). (A)</p> Signup and view all the answers

What is a notable advantage of storing data in column stores compared to row stores?

<p>Reduced storage space for certain types of analytical queries (OLAP). (B)</p> Signup and view all the answers

In database systems, what information does the 'dirty bit' associated with a page in the buffer indicate?

<p>Whether the page has been modified since it was loaded from disk. (D)</p> Signup and view all the answers

Which memory technology exhibits asymmetric read/write speeds, with reads generally being faster than writes?

<p>Solid State Drive (SSD) (A)</p> Signup and view all the answers

Which statement is most accurate regarding the characteristics of SSDs?

<p>SSDs have a limitation regarding the number of write/erase cycles a block can endure before wearing out. (C)</p> Signup and view all the answers

What is the primary reason for employing a Flash Translation Layer (FTL) in Solid State Drives (SSDs)?

<p>To manage the mapping between logical page addresses and physical storage locations, enabling wear leveling and garbage collection. (B)</p> Signup and view all the answers

What is the most significant reason memory access patterns in a DBMS are typically more predictable than those in a general-purpose operating system?

<p>DBMS operations often involve well-defined algorithms with known data access patterns such as the Relational Algebra. (B)</p> Signup and view all the answers

In the context of handling variable-length records within a database page, what is the purpose of storing an offset array in the header for each tuple?

<p>To facilitate direct access to individual attributes within the tuple without scanning from the beginning. (A)</p> Signup and view all the answers

Which of the following is the primary performance benefit of using a two-level hierarchy (directory of pages) for accessing data in a large table?

<p>It reduces the number of disk I/O operations to access a specific page. (B)</p> Signup and view all the answers

What is the function of 'pinning' a page in the buffer pool?

<p>It prevents the page from being replaced in the buffer by another page. (A)</p> Signup and view all the answers

What is a significant implication of the 'erase-before-write' characteristic of SSDs for database management systems (DBMS)?

<p>It necessitates the use of a Flash Translation Layer (FTL) to manage writes and erases, impacting performance. (A)</p> Signup and view all the answers

When designing a database system, what is the main trade-off to consider when choosing between fixed-length versus variable-length records?

<p>Fixed-length records simplify storage management but may waste space, while variable-length records are more space-efficient but add complexity. (B)</p> Signup and view all the answers

What is the greatest challenge in maintaining a sorted table if a new tuple must be inserted into the middle of a data page while preserving physical order?

<p>The potential for high overhead due to shifting existing tuples within the page to create space. (A)</p> Signup and view all the answers

How do system catalogs enable a database system to guarantee transactional consistency and integrity after a crash?

<p>By defining the structure and constraints of the database so the system can verify and enforce integrity during recovery. (C)</p> Signup and view all the answers

In the context of SSDs, what is the 'Maximum Terabytes Written' (TBW) rating?

<p>The total amount of data that can be reliably written to the SSD over its lifespan before it may become unusable. (B)</p> Signup and view all the answers

Which factor most significantly contributes to the effect of communication overhead in Network Attached DRAM?

<p>All of the above (D)</p> Signup and view all the answers

An engineer is designing a system to store frequently accessed configuration files given the need for rapid retrieval; which storage option would be most appropriate?

<p>Cache Memory, as it offers faster access speeds than main memory. (A)</p> Signup and view all the answers

In terms of data placement, what is true if multiple databases concurrently access different files on a disk?

<p>Optimal data placement for one file may not benefit the system due to interleaved requests. (C)</p> Signup and view all the answers

Which functionality of a Flash Translation Layer (FTL) contributes most directly to extending the lifespan of an SSD?

<p>Wear leveling to distribute write/erase cycles. (C)</p> Signup and view all the answers

Why is a high number of write operations, more than are needed, negatively affect the lifespan and performance of SSDs?

<p>It accelerates the degradation of flash memory cells, leading to earlier wear-out and reduced performance. (D)</p> Signup and view all the answers

When retrieving data form memory, describe what is meant by byte-addressability with regard to storage technologies.

<p>Each individual byte in memory has a unique address and can be accessed independently. (A)</p> Signup and view all the answers

How does FTL address the blocks in SSDs once a block is recognized to be faulty?

<p>The valid data is copied to a functional block, mark the previous block defective, and never use it again. (B)</p> Signup and view all the answers

What metadata is stored in system catalogs about tables?

<p>Directory for data-pages composing that table (Location of the table). (C)</p> Signup and view all the answers

How does the most recently used (MRU) policies attract for DBMS implementations?

<p>The page is likely not to be visited soon when it has already been processed. (B)</p> Signup and view all the answers

When are column stores more suitable than row stores? What kind of analysis is best?

<p>When individual attributes are being read. (D)</p> Signup and view all the answers

How should the free space be handled if there are resulting deletions of tuples in the memory space?

<p>Maintain directory-level priority queue for all the free space. (C)</p> Signup and view all the answers

In SSDs, there are parallel executions based on different tiers. What are multiple packages per channel, multiple dies per package, and multiple planes per die known as?

<p>Package-level parallelism (D)</p> Signup and view all the answers

Considering the architecture of modern SSDs, if a database system executes a query that requires accessing a significant number of pages scattered across different channels and dies, which form of parallelism would provide the most substantial performance benefits in reducing latency?

<p>Channel-level parallelism, because the SSD is more responsive in interleaving. (A)</p> Signup and view all the answers

A database system must implement a table where the order of tuples should be based on insertion times. What kind of table order should the system use?

<p>Heap table order (D)</p> Signup and view all the answers

You are designing a high-performance database system which frequently does random writes to the database. What is a critical strategy that you should use when selecting SSDs?

<p>Select SSDs with a fast interface such as NVMe, and that have more parallel Dies and Channels. (B)</p> Signup and view all the answers

Of the following options, which type of memory would likely have the lowest latency?

<p>Cache (C)</p> Signup and view all the answers

Describe the data storage attributes for Solid State Drives (SSDs).

<p>Page-Addressable Storage (A)</p> Signup and view all the answers

How are data transfer operations to SSDs performed?

<p>Both Page and Block Oriented (C)</p> Signup and view all the answers

How do solid-state drives improve read speed?

<p>It removes mechanical motion and is read directly. (D)</p> Signup and view all the answers

What is a general component of SSD controllers?

<p>Communicate between NAND Flash and input/output interfaces (A)</p> Signup and view all the answers

You want to build a file storage for archival. What type of disk type is best?

<p>HD (C)</p> Signup and view all the answers

How to FTL uses in-memory buffering?

<p>Delay page updates (C)</p> Signup and view all the answers

Assume that the page to exit the buffer as decided by the buffer replacement policy is Page q in buffer frame c, and suppose q can be discarded to free the buffer page as the data has not been changed since loaded. What is a common term that implies this?

<p>Clean page (D)</p> Signup and view all the answers

What is a fundamental advantage of storing data in a Solid State Drive (SSD) over a traditional Hard Disk Drive (HDD)?

<p>Lower power consumption. (C)</p> Signup and view all the answers

In the context of data storage, what primarily constitutes the seek time in a spinning hard disk?

<p>The time taken by the disk arm to move the head to the correct cylinder. (A)</p> Signup and view all the answers

What is the primary function of the Flash Translation Layer (FTL) when accessing SSDs?

<p>To provide a software driver that manages logical addresses and wear leveling. (D)</p> Signup and view all the answers

What is the main benefit of employing a two-level hierarchy for accessing data pages in a large table?

<p>Reducing the overhead of sequential access by providing a directory of pages. (C)</p> Signup and view all the answers

What determines the suitability of column stores over row stores for specific database workloads?

<p>The frequency of write operations. (B)</p> Signup and view all the answers

In the context of SSDs, what does 'wear leveling' primarily aim to achieve?

<p>Extending the lifespan of the SSD by distributing write and erase cycles evenly. (C)</p> Signup and view all the answers

If a tuple in a database is updated and now has a different size than before, what is the primary action that needs to occur in a data page using variable-length tuples?

<p>The offset value in the in-page directory entry for the tuple must be updated. (A)</p> Signup and view all the answers

In the context of buffer management, what does a high 'pin count' on a page indicate?

<p>The page is currently being used by multiple transactions and should not be removed from the buffer. (B)</p> Signup and view all the answers

When designing a database system, what is the main advantage of using fixed-length records over variable-length records?

<p>Simplification of storage management and quicker access. (B)</p> Signup and view all the answers

In the context of SSD technology and data management, what process is initiated by Flash Translation Layer (FTL) to reclaim space occupied by outdated or invalid pages?

<p>Garbage Collection (C)</p> Signup and view all the answers

What is a key difference between 2.5-inch SATA SSDs and M.2 NVMe SSDs in terms of performance?

<p>M.2 NVMe SSDs utilize PCI Express, offering significantly higher bandwidth compared to the SATA interface used by 2.5-inch SATA SSDs. (C)</p> Signup and view all the answers

Consider a relational database undergoing frequent updates and insertions. Which strategy balances the need for efficient space usage with maintaining the physical order of tuples?

<p>Implement overflow pages and link them to the original page to maintain sorted order during insertions. (A)</p> Signup and view all the answers

What is the primary reason for the recommendation against using Solid State Drives (SSDs) for long-term archival storage, especially for infrequently accessed data?

<p>SSDs are prone to data leaks over time, especially when powered off, making them less reliable for archival purposes than HDDs. (A)</p> Signup and view all the answers

In the context of database storage and retrieval, how does the implementation of clustered tables impact join operations involving two tables?

<p>Join operations are significantly sped-up as related tuples from both tables are stored on the same data pages, reducing I/O overhead. (C)</p> Signup and view all the answers

In the design of database systems, what would be a significant trade off when choosing between row stores vs column stores for a database?

<p>Column stores are best for queries using analytical queries over certain attributes where row stores do not. (D)</p> Signup and view all the answers

Why isn't attempt 1, keeping track of locations of empty slots, for fixed-length tuples good? Refer to the diagram on the uploaded document.

<p>The TIDs will keep changing, which is bad. (D)</p> Signup and view all the answers

In a database system employing heap table organization, how does the system typically manage the insertion of a new tuple when the available space in a given data page is insufficient?

<p>The system searches for a different data page which has sufficient space, potentially using a list of pages with some free space. (A)</p> Signup and view all the answers

Design Challenge: A database system needs to decide between adding metadata field with number of tuples in the system to know if the page is full or to add pointer to the next empty slot in the system. In terms of space of storage, which option performs best?

<p>Adding with the metadata to the number of tuples is better, as there is no space available. (A)</p> Signup and view all the answers

Suppose that a database system is designed to have a table where deletion entries occur very often. Describe what is meant by handling free space and what approaches would perform well given such a pattern.

<p>Heap Approach (maintaining doubly-linked lists of pages). (A)</p> Signup and view all the answers

A database system uses storage disks that are partially failing. You are tasked to re-implement FTL's handling of defective blocks. Describe what measures you need to consider upon re-implementing FTL.

<p>The valid data is copied and marked, then will not be used again. (A)</p> Signup and view all the answers

Flashcards

Volatile Storage

Data is lost when power is not connected.

Non-volatile Storage

Data is retained even when power is off.

Page/block-addressable storage

HDDs, SSDs use it. Unit of retrieval is a page, which is typically a block of 1,000s or millions of bytes

Seek Time

The time it takes for the disk to position the read/write head over the correct track or cylinder.

Signup and view all the flashcards

Rotational Delay

The time it takes for the desired sector to rotate under the read/write head.

Signup and view all the flashcards

Flash Translation Layer (FTL)

A software driver in the SSD controller.

Signup and view all the flashcards

Wear leveling

To evenly distribute the wear out and extends the lifespan of the device.

Signup and view all the flashcards

Main functions of FTL

Address mapping, wear leveling, garbage collection and bad block management.

Signup and view all the flashcards

FTL Out-of-Place Update

Flash cells cannot be updated in-place, old page is marked invalid, updated data written to new location

Signup and view all the flashcards

FTL in-memory buffering

Used to buffer the page updates, results in a reduction of block erasures.

Signup and view all the flashcards

FTL Garbage Collection

Out-of-place updates lead to outdated flash pages. The Garbage collector recycles these pages.

Signup and view all the flashcards

FTL's Wear Leveling Mechanism

The wear leveling mechanism avoids fast or premature aging of SSD.

Signup and view all the flashcards

FTL Handling of Defective Blocks

The action taken when a block is recognized as faulty. It's valid data is copied to another functional block.

Signup and view all the flashcards

FTL's Parallel Execution

SSD is composed of multiple channels, packages per channel, dies per package, and planes per die. Interleaves I/Os in the multiple channels.

Signup and view all the flashcards

Maximum Terabytes Written (TBW)

The total amount of data that can be stored and erased from the device before it reaches an unusable state.

Signup and view all the flashcards

HD Technology

Spinning magnetic platters.

Signup and view all the flashcards

SSD Technology

Flash memory.

Signup and view all the flashcards

Two-level Hierarchy

To reduce the sequential access overhead for large tables.

Signup and view all the flashcards

Row Store

Tuples are stored one tuple at a time (row). Databases are mainly stored in row-major order and each row adheres to the same schema.

Signup and view all the flashcards

Column Store

Data is stored in column major order.

Signup and view all the flashcards

Row Store (+ve)

It is easy to add or delete a record.

Signup and view all the flashcards

Column Store (-ve)

To insert a new tuple, will need to touch many locations.

Signup and view all the flashcards

Column Store (+ve)

Data from the same column can be compressed.

Signup and view all the flashcards

Row Store

Row-based storage, tuple attributes are stored together in same block which suitable for transactional processing on a specific tuple(OLTP) example like; Increase Mary's salary by 5%.

Signup and view all the flashcards

Column Store

Column-based storage, Columns of attributes are stored together in same block which Suitable for analytical processing on certain columns (OLAP) example like; E.g., find average salary

Signup and view all the flashcards

Fixed-length Records

Tuples in the table are all of the same length.

Signup and view all the flashcards

Tuple Identifier

Identifier Used to physically locate a tuple.

Signup and view all the flashcards

Important property of a Tuple Identifier

What uniquely identifies a tuple.

Signup and view all the flashcards

Variable-length Records

Each tuple in the table can be of a different length than the other tuples.

Signup and view all the flashcards

Data Page Layout containing Fixed-length Tuples

Head of linked list/Head of Free Space and need to search for next empty slot.

Signup and view all the flashcards

Data Page Layout Containing Variable-length Tuples :Slot-id

An index to an in-page directory entry that contains offset of the tuple from the beginning of the page.

Signup and view all the flashcards

How to know data exist in a page for Variable length tuple to insert

To know if there is a space in this page to insert a new tuple, Use Free Space Pointer and Free Space Size andNeed to update Free Space Size with every insert/update.

Signup and view all the flashcards

Heap table

Linked list of pages without free space and Linked list of pages with some free space.

Signup and view all the flashcards

How to Store Attribute Values Inside a Tuple? Tuples with Fixed-length Attributes

Schema has all attribute types that are of fixed length and Pre-calculate the start address (offset) of each attribute relative to the begin of each tuple

Signup and view all the flashcards

How to Store Attribute Values Inside a Tuple? Tuples with Variable-length Attributes

The data has attribute types where some attributes are of variable lengths and need to avoid scanning the tuple to access an attribute.

Signup and view all the flashcards

Sorted Table Order

Data is physically sorted

Signup and view all the flashcards

Hash function takes

Hash function takes the key of the tuple as input and produces the bucket (data page) that contains the tuple.

Signup and view all the flashcards

Cluster Tables

Sometimes termed Clustered Tables Combine tuples from multiple tuples into the same data pages

Signup and view all the flashcards

System Catalogs

Stores meta-data about the tables in the database and are tables (relations) themselves.

Signup and view all the flashcards

Why DBMS different in buffer than OS

A DBMS different from the OS's buffer management because it mostly accesses through the Relational Algebra Operators like; select, join, project.

Signup and view all the flashcards

Buffer memory

Data must be loaded from the data store to memory for queries to operate on it and The memory buffer pool is where these pages are stored.

Signup and view all the flashcards

Buffer Replacement Policies MRU

Most Recently Used Page or MRU. If the page was just processed

Signup and view all the flashcards

Data Structure: PinCount

Indicates number of transactions that requested a page is pinned, it can not be removed (PinCount =0 =candidate for replacement).

Signup and view all the flashcards

Data Structure: DirtyBit

If a DirtyBit the page is updated (not like in the data store) Dirty pages must be written again.

Signup and view all the flashcards

Data Structure: Clean Page

If clean Page the copy of that page similar to the data store and Clean pages do not need to be written

Signup and view all the flashcards

Study Notes

Storage, Data Organization, and Buffering: Storage Technologies

  • Key areas include storage technologies and memory hierarchy, non-volatile storage, page data access, row and column stores and buffer management.

Volatile vs. Non-Volatile Storage Media

  • Volatile storage loses data when power is disconnected, while non-volatile retains data.
  • Volatile storage uses byte-addressable access with retrieval of one byte or cache lines. Non-volatile storage uses page/block-addressable access. NVM is byte addressable storage.
  • Volatile storage is faster with nano-second speeds, while non-volatile is slower at micro/milli-seconds.
  • Volatile is more expensive, but smaller in capacity versus non-volatile. Examples include CPU registers, cache memory, and DRAM. Non-volatile examples are spinning hard disks, SSDs, NVM and PM.
  • Volatile access is random and sequential, with the same read/write latency, while non-volatile Sequential access is typically faster.
  • NVM read latency is 4x faster than write. SSD read latency is 12x faster than write, and HD has symmetric latency.

Various Storage Technologies

  • Cache, DRAM, NVM, SSD, and HD.
  • Cache has the fastest read and write latency at 1-15 ns. HD is slowest with 1-10 ms.
  • CPU registers, L1, L2 and L3 also have different read latencies. L1 cache is 1ns, L2 is 5ns and L3 is 15ns.
  • Cache and DRAM are volatile, while NVM, SSD and HD are non-volatile.
  • All are byte addressable except SSD and HD, which use Block/Page addressing.
  • Capacity increases from Cache (10's KB - 10's MB) to HD (TBs). DRAMS range from GBs-TBs and NVM/SSD support TBs.

Communication Overhead on Disk Storage

  • Hard disks and Network Attached HD.
  • Both use Milliseconds for access latency, but the Network Attached HD are 50s (Varies with size).
  • Both are non-volatile and use Block/Page addressing and supports TBs.

Communication Overhead on Memory Storage

  • DRAM and Network Attached DRAM uses nanoseconds and microseconds respectively. DRAM has 50ns and Network Attached DRAM is 10Ms.
  • Volatility and addressability are the same, but DRAM bandwidth caps out in the GBs-TBs range, while Network Attached DRAM is TBs.

Memory Hierarchy

  • Highest capacity is Archival Storage/tapes, then Network Storage, then Disk Storage and Solid-State Disks.
  • Main Memory, Cache Memory and CPU registers are placed at the bottom. Price and Speed also increase from the top to the bottom.
  • Archival, Network and Disk storage use page-oriented access while Solid-State Disks uses Non-Volatile access.
  • Main and Cache memory use mini-block access using cache lines, while CPU registers use byte-oriented access.

Non-Volatile Storage: Spinning Hard Disks

  • Spinning hard and solid-state drives use Disks and non-volatile storage. Volatile buffers are used, and access is page based.
  • Seek time is the time to move heads and is around 10 ms.
  • Rotational delay is the time to rotate around the spindle is around 10 ms.
  • Transmission delay is to transfer a block of data at around 1ms.
  • Each disk page has a disk block id.
  • Block Read(disk block id) transmits the corresponding disk page from the disk to the memory buffers.
  • Write(Block, disk block id) Transmits the data Block from the memory buffers into the disk page corresponding to the given disk block id.
  • Disk Block Identifier gets mapped into Cylinder (or Track) Number, Platter Number, Sector Number.
  • Retrieving disk page is dependent on on the current location of the disk head, and the location of the requested page. Involves mechanical movement and hence is relatively slow
  • Seek Time has the Head move to the corresponding cylinder number, wait until head is on top of the correct Sector Number in Rotational Delay. Once head is at the right cylinder, right sector, direct and give the signal to the head responsible or the block's Platter Number is Transmission Time
  • The Data Placement Problem involves placing data blocks on disk to minimize file read time using p platters, s sectors per track. One consideration is storing d₁ in Cylinder c₁, Platter p₁, Sector S₁
  • The ideal data write pattern is d2→ C1, P2, S1 (Seek Time = 0, Rotational Delay = 0)
  • Page prefetching retrieves pages before being requested, which may save time. It can be a wasted effort if the page is not requested.
  • Good data placement will result in lower retrieval cost of a page.
  • Multi-tenant databases can make data placement work difficult, because requests for pages from multiple files will be interleaved.

Non-Volatile Storage: Flash Memory and Solid-State Drives

  • Examples of flash memory include ROM, PROM, EPROM, EEPROM and Flash Memory.
  • Flash Memory is 1999 (Toshiba): based on EEPROM technology, uses gates to store bits or bubble memory. EEPROM (late 1970s, early 1980's- Hughes and Intel): Electrically Erasable Programmable Read-only Memory and is User-modifiable ROM

Solid-State Drive (SSD)

  • NV storage that persists data on solid-state flash memory. Replace traditional hard disk drives (HDDs) in computers performing the same basic functions as a hard disk drives.
  • SSDs do not have mechanical moving parts like HDDs. They are faster than HDDs.
  • SSDs use NAND Flash memory to store data with non-volatile NAND transistors. They are arrayed and stacked on circuit boards called 3D NAND. Stacks provide higher storage capacities as memory cells are stacked.
  • A Single-level cell or SLC are the most expensive, but most durable. Adding levels of cells increase the storage capacity but reduces reliability.
  • SSD Controller communicate between NAND Flash and input/output interface.
  • Manages all the flash memory cells by telling them what memory to access or manipulate, and guarantees data distribution and garbage collection.

Solid-State Drive Controller Behavior

  • There is garbage generated due to not writing in-place when the old block becomes garbage and needs to be collected. The blocks read into memory and are updated.
  • 2.5-Inch SATA SSD (Internal SSD) uses Serial Advanced Technology for Old interface for SSDs andHDDs also use the SATA standard. These are compatible with HDDs of the same size.
  • M.2 SATA SSD (External SSD) is similar to the 2.5-Inch HDDs that also use the SATA standard and are built into USB Thumb drives.
  • M.2 NVMe SSD (mainly, internal) uses Non-Volatile Memory Express using PCI Express bandwidth (PCIe) and not SATA bandwidth. The PCle is a common high-speed connection interface on the motherboard which achieves huge performance improvement.

Solid-State Drive Sequential Performance

  • SATA SSDs provide up to 0.5 GB/s for sequential reads and writes and is the slowest of the three types
  • NVMe Gen3 SSDs range between 1.5GB/s - 3.5 GB/s.
  • With At this speed, CPU begin to bottleneck not the storage
  • NVMe Gen4 SSDs achieve 5.5GB/s - 7GB/s for sequential reads and writes.

Solid-State Drive Durability

  • The standard unit of measurement for the maximum number of reads and writes to non-contiguous storage locations or IOPS.
  • Data is stored in blocks and blocks are composed of pages. Pages or blocks must be erased first. This erasure is performed at the Block level rather than the Page level.
  • If a block is erased, we can write one page at a time as long as we are writing on a blank page that has not been written into before after the most recent block erase, meaning pages can be read independently.
  • Block erase time is slow and to improve erase time, when you update a page, the old page in a block is marked outdated in order for the updated page to be written into a new page out of place in another block.
  • Blocks are not durable because writting/erasing causes the SSD to wear. They get damaged over time after a certain number of erases or writes.
  • Most SSDs come with integrated "wear-leveling" technology, to Evenly distribute the wear out extends lifespan of device with Error Correction Code page.
  • In Solid-state drives, the Erasure is before each data write, and they are subject to wearing out because reads are faster than writes.

Solid-State Drive Access Types

  • Implementations can be either Raw Flash which directly handle the NAND flash drive using in-memory write buffers to postpone batch write operations with Apply logging techniques to decrease number of random writes.
  • Another is is Flash Translation Layer (FTL-based), a software driver in the SSD controller Enforces an out-of-place policy with use Logical address mapping that Addresses wear leveling, Reclaims free space with Managed bad blocks.

Solid-State Drive FTL Management

  • Main functions of FTL or address mapping, wear leveling, garbage collection and bad block management,.
  • FTL implements an out-of-place update policy because Flash cells cannot be updated in-place so old pages are marked as invalid while the updated version is written to a new location. There are three mapping methods of doing this. Page-level, block-level, and hybrid.
  • FTL uses in-memory buffering to buffer page updates reducing the number of page writes which reduces block erases. The in-memory approach could result in loss of data due to volatility.
  • Out-of-place updates also results in out-of-date/invalid flash pages which triggers Garbage Collection.
  • FTL runs asynchronous garbage collector to recycle out-of-date pages.
  • FTL's wear leveling avoids premature aging by distributing block erasures uniformly across all of SSD
  • the FTL handles defective blocks marking them as defective with the data copied to other blocks.
  • Flash supports multiple packages per channel, multiple dies per package, and multiple planes per die enabling Channel, Package, Die level and Plane Parallelism.

Maximum Terabytes Written (TBW)

  • As data is written into SSDs and read, its effectiveness steadily decreases with age, which is quantified as Maximum Terabytes Written (TBW) of an SSD.

Hard Drives vs Solid State Drives

  • HD uses spinning magnetic platters, while SSD uses flash memory.
  • HD capacities are 250GB to 20TB, vs. SSD's 128GB to 8TB & 30TB to 100TB SSDs.
  • Speed for HDs are way down at 500 MB/s, while SSDs are up to 7000 MB/s. SSD<25-33% HD in terms of power consumption.
  • HDs have better long term storage, with external HDs are more portable than internal HDs, while SSD are less reliable for the long term with external SSDs are more portable than internal SSDs.
  • HDds are cheaper, starting at 24 for 500GB, vs. 55 dollars for SSD, but SSD offer better perofmrance with fast data retrieval and constant use and not not ideal for long term storage.

Tables and Directories

  • Pages of a table can be accessed if No indexes are available or There is a linked list of disk pages.
  • A Directory table contains the Table name that corresponds with the Head of linked list of pages.
  • Each page has forward and backward pointers using the disk page ids, offering Sequential access
  • Maintaining a two-level hierarchy uses directory pages.

Row vs Column Stores Intro

  • Table data is stored one tuple at a time or row adheres to the same schema. Adds and rewrites are easy.
  • Relational databases are mainly stored in row-major storage, storing data in row order on the storage media, easy
  • Suited for mostly transactional Workloads (OLTP)
  • To compute select sum(salary) from Employee, will access all the data in the Employee table but will access only the salary attribute from each record and will Need to access more data than needed.
  • Column stores are in column major order with dat of a certain attribute like salary stored together. They can be compressed and are Suited for mostly reads analytics workloads(OLAP).
  • With Column stores, queries will only touch the relevant data like average sale, but when inserting a tuple, will need to touch many locations.

Row and Column Stores

  • Tuple attributes (rows) are stored together in same block.

  • Uses suitable in transactional processing specific to a tuple like increase Mary's salary by 5% .

  • Attributes of Mary's name and salary will be contiguous in storage.

  • Column stores can have columns of attributes are stored together in the same block, and suitable for analytical processing on certain columns can be quickly compute average salaries. All salary attributes of the entire table will be contiguous in storage (fast).

Tuples Types

  • Tuples can be of 2 types: fixed or variable length for all tuples in a table.

Tuple Identifiers (TID)

  • How to Refer to a Tuple in a Table as Tuple Identifier or a Record Identifier.
  • Used to physically locate tuple Data page id (of the page that contains the tuple), or the index or offset.
  • Used in Foreign-key/ primary-key to avoid performing joins.
  • TIDs are used in the leaf level of an index like B-tree.
  • It must uniquely identif and potentially not change if Tuple data is updated.

Data Page Layout

  • Fixed Length: There is keeping track of empty slots or using a linked list so it should not change TID and do compaction which change the TID. This may involved having pointers to previous/next page with a linear slot search.

  • Variable Length: There is directory of the pages, if tuple moves in the page, no change but in directory changes. But this will take space in memory.

Problems in Handling Tuple Space

  • For inserting a new tuple? Use Free Space Pointer and Free Space Size with constant Update of pointer with Update of Table.
  • Handling Gaps after tuples are deleted, requires gaps be inserted or re-relocation inside same directory.
  • What to do if there is a lot of data with many tuples. Can create two level entries for data and what happens if to sequential check entire database.
  • So what is better for Free Space? Keep directory for tracking.

Heap Data Structures

  • Maintain two doubly-linked lists of pages in the table. A separate List for without Data and with Data. Then if Tuple N Data is to insert search page of N.

Storing Attribute Values

  • Having a Store Tuple is related to have Fixed Length in which case attribute must be fixed. Then we need to locate the offset at all attribute like Name, Position and etc. Or if is page it is not a good plan.
  • In this case will have to store Data inside catalog and do some other functions and code to help. This will be stored into System Catalog.
  • Now a Variable we use offset array for length, if = 0 is null

Tables and Index

  • The system catalog maintains data about table such as the directory of the page as well as if index or unique.
  • What to do if Tuple in a sort? Physical in data changes of existing. And then to avoid change then insert with Linked List.
  • Table can be Entire is inside an index?
  • Hash take tuple to a location then is B data search for data there for keys. If page splits tuples do not change.
  • Tables with join are considered as a cluster.

Organizing, and buffering: System catalogs stores meta data about the tables in the database.

  • System catalogs stores meta-data about the tables in the database. For every table it stores the directory for data-pages composing, How it's stored and its Schema including fixed/variables, plus whether there are primary or unique keys. Some data will exist in the offset

System Catalog Tables themselves are stored in the System Catalogs.

  • Stored are Index type, table views, user view, encrypted passwords for rights the rights to which tables.

Attributes_Catalog_Table( ) with the relation_name, attribute_name, attribute_type, attribute_order_in_table.

Buffer Management

  • Buffer management in the DBMS different because access is pre predictive instead of LRU in OS.

  • Most accesses goes to relational operators.

  • Data must loaded into Data Store for memory queries to operate. Target to reduce between Data store memory(Buffer Manager) increasing buffer hist reducing buffer misses.

  • Dirty bit, copy in the page or that data store and Pincount of data.

Functions of the Buffer Manager are in the buffer not in the Provide a empty store. Is is needed not if the and that and has 2 page or more. Buffer Replacement can be set not. If or is dirty clean up clean.

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

More Like This

Data and File Storage Options on Android
10 questions
Magnetic Storage Overview
30 questions

Magnetic Storage Overview

EndearingArithmetic avatar
EndearingArithmetic
Types of Computer Storage and Evolution
10 questions
Use Quizgecko on...
Browser
Browser