Computer Hardware: Data Storage and Memory Hierarchy
42 Questions
0 Views

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to lesson

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

What is one fundamental feature of the hard drive technology established in 1956?

  • It utilizes multiple disks spinning together. (correct)
  • It was primarily designed for real-time data processing.
  • It incorporates optical storage methods.
  • It uses solid-state memory.
  • Which of the following best describes the memory hierarchy in data storage?

  • Registers are slower than main memory.
  • Main memory is faster but more volatile than hard disks. (correct)
  • Tape storage is the fastest among the storage options.
  • Hard disks have faster access times than CPU caches.
  • How do new DB technologies respond to the evolution of data storage?

  • They restrict themselves to traditional relational databases.
  • They support real-time data workflows and varying data types. (correct)
  • They emphasize the use of only in-memory databases.
  • They exclusively focus on storage capacity without performance considerations.
  • What role do CPU frequencies and memory access play in the context of hardware evolution?

    <p>CPU frequencies became much faster than memory access on early computers.</p> Signup and view all the answers

    What is one of the key challenges related to trust management as mentioned in the context of data storage?

    <p>It has become a significant issue mainly for transactions, rather than storage.</p> Signup and view all the answers

    Which architecture does NoSQL databases typically use?

    <p>Shared nothing massively parallel architecture.</p> Signup and view all the answers

    What is the primary function of ACID in distributed data processing?

    <p>To ensure reliable transactions in systems.</p> Signup and view all the answers

    Which of the following trends is emphasized in the evolution of data storage and processing?

    <p>Integration of diverse data types like text and video.</p> Signup and view all the answers

    Which storage technology is the cheapest per gigabyte?

    <p>Flash</p> Signup and view all the answers

    What has a lower latency than reading 1 MB sequentially from SSD?

    <p>SSD random read</p> Signup and view all the answers

    Which memory level has the highest access speed?

    <p>L1 cache</p> Signup and view all the answers

    What type of data access structure is a B+ tree classified as?

    <p>Logical index</p> Signup and view all the answers

    What is the approximate latency for sending a packet from California to the Netherlands?

    <p>150 ms</p> Signup and view all the answers

    Which memory component is volatile?

    <p>SRAM</p> Signup and view all the answers

    What is one of the primary challenges in hardware regarding CPU speed versus memory speed?

    <p>CPU has become significantly faster than memory.</p> Signup and view all the answers

    Which type of storage has the highest price per gigabyte?

    <p>DRAM</p> Signup and view all the answers

    What is a primary advantage of using LSM-trees over B-trees?

    <p>Faster sequential writes</p> Signup and view all the answers

    What is the latency for a mutex lock/unlock operation?

    <p>25 ns</p> Signup and view all the answers

    In the context of memory hierarchy, which memory type is used for faster access by the CPU?

    <p>SRAM</p> Signup and view all the answers

    What is the purpose of ROWID in a database management system?

    <p>It indicates the physical address of a record.</p> Signup and view all the answers

    In the context of LSM-trees, what happens when level Ci is full?

    <p>It merges with level Ci+1.</p> Signup and view all the answers

    How do database management systems (SGBDs) improve efficiency in loading pages?

    <p>By maximizing the number of records on each page.</p> Signup and view all the answers

    What characteristic of LSM-trees makes access to a value potentially slower?

    <p>The requirement to scan multiple levels</p> Signup and view all the answers

    What is a defining feature of the storage strategy used by LSM-trees?

    <p>Batch processing of delayed writes</p> Signup and view all the answers

    What is the size of an Oracle data block as indicated in the content?

    <p>8 KB</p> Signup and view all the answers

    Which component in PostgreSQL pages contains information about free space?

    <p>Page Header</p> Signup and view all the answers

    What does the recommendation of geometric progression for |Ci+1| imply in LSM-trees?

    <p>The structure relies on exponential growth to manage data.</p> Signup and view all the answers

    What aspect does the 'Item Ids' array in PostgreSQL pages represent?

    <p>Offset and length pairs of tuples.</p> Signup and view all the answers

    What is a disadvantage of using LSM-trees compared to B-trees?

    <p>Slower access to stored values</p> Signup and view all the answers

    What is the primary storage structure for the first level (C0) of an LSM-tree?

    <p>A hash table or a tree</p> Signup and view all the answers

    What are tuples in the context of a PostgreSQL data page?

    <p>Row entries for tables and indices.</p> Signup and view all the answers

    How does the log-structured storage approach benefit the writing process?

    <p>It allows updates to be processed in batches.</p> Signup and view all the answers

    In the context of physical storage, what is a key factor affecting the number of pages a system can load?

    <p>Size of the data block.</p> Signup and view all the answers

    What does the term 'MVCC' refer to in database architectures?

    <p>Multi-Version Concurrency Control</p> Signup and view all the answers

    What information does the Special space in a PostgreSQL data page provide?

    <p>It can contain unused space in ordinary tables.</p> Signup and view all the answers

    Which method of data writing is characterized by writing in real-time?

    <p>Sequential writing</p> Signup and view all the answers

    What is a limitation of update-in-place storage in relational databases?

    <p>Fragmentation due to historical records</p> Signup and view all the answers

    What is the primary function of logs in a database system?

    <p>To track changes and ensure data recovery</p> Signup and view all the answers

    In what context is the term 'O2N' used?

    <p>Oldest-to-Newest data retrieval method</p> Signup and view all the answers

    What is the role of a buffer in database management?

    <p>To temporarily hold data for processing</p> Signup and view all the answers

    Which of the following distinguishes between sequential and random writing?

    <p>Order of data processing</p> Signup and view all the answers

    Which type of storage architecture primarily uses update-in-place methods?

    <p>Most relational databases</p> Signup and view all the answers

    Study Notes

    Hardware: Data Storage

    • The hard drive was initially created in 1956 by IBM.
    • It has multiple disks (platters) that spin rapidly together (thousands of revolutions per minute) and are ferromagnetic.
    • Each platter has multiple tracks and each platter has one head on an air cushion.
    • It costs roughly €100 and has a storage capacity of a few terabytes.
    • Since the 1960s, it has been the primary storage (secondary) for computers.
    • DBMSs were designed to be supported by this type of storage.

    Memory Hierarchy

    • Faster access is associated with higher levels of memory.
    • Storage devices, from fastest to slowest access time are: CPU registers, CPU caches, main memory, NVRAM, Flash, hard disk, tape.
    • CPU registers and CPU caches are volatile, meaning they lose their data when the power is turned off.
    • On early computers, the speed of the CPU was limited by the speed of the memory bus and memory access. However, CPU speed has increased much faster than memory speed.
    • SRAM is relatively expensive with a cost of 500$/GB, while DRAM is much cheaper at 5$/GB.
    • NVRAM and Flash are even cheaper, at 1$/GB and 0.4$/GB, respectively.

    Latency Numbers

    • In 2012, the latency for a L1 cache reference was 0.5 ns, while a branch mispredict took 5 ns.
    • L2 cache reference took 7 ns, while a mutex lock/unlock took 25 ns.
    • A main memory reference took 100 ns, while compressing 1 KB of data with Zippy took 3,000 ns (3 µs).
    • Sending 2 KB data over a 1 Gbps network took 20,000 ns (20 µs).
    • SSD random read took 150,000 ns (150 µs), while reading 1 MB sequentially from memory took 250,000 ns (250 µs).
    • Round trip within the same data center took 500,000 ns (0.5 ms), while sequentially reading 1 MB from an SSD took 1,000,000 ns (1 ms).
    • A disk seek took 10,000,000 ns (10 ms), while reading 1 MB sequentially from disk took 20,000,000 ns (20 ms).
    • Sending a packet from California to the Netherlands and back to California took 150,000,000 ns (150 ms).

    Latency Numbers (2019)

    • Latency for a 4 KB block in 2019: Fast NVMe (Optane) took 7 µs, Fast NVMe (Z-SSD) took 12 µs, round trip TCP packet on 10Gb Ethernet took 20-50 µs, and NVMe Flash SSD took 80 µs.

    Database Design Levels

    • There are three levels of database design: Conceptual, Logical, and Physical.
    • The conceptual design provides a user-oriented description of the database, independent of implementation, and can be represented using E-R diagrams or UML.
    • The logical design defines the structure of the database independently of the DBMS, using the relational model to define the schema of the tables.
    • The physical design specifies how the database is physically implemented, including materialized views, partitions, and indexes.

    Data Access

    • Physical and logical access paths are used to retrieve data from the database.
    • Physical access paths involve B+ tree, R-tree, k-d tree, bitmap, and Z-curve.
    • Logical access paths include materialized views and partitions.
    • Materialized views are pre-computed results of queries that are stored in the database, while partitions are divisions of data into smaller, more manageable units.

    Physical Storage in DBMS (Recap)

    • Each record has a physical address, known as the ROWID.
    • Records are grouped into pages, also called blocks.
    • Reading or writing data requires loading the corresponding page into memory.
    • Efficient DBMSs aim to minimize page loads.
    • Increasing block size allows access to more records within a single page but reduces the number of pages that can be loaded at once.

    Physical Storage: Oracle (1)

    • The physical storage in Oracle is based on the logical storage structures, which are defined by the E-R diagram.
    • Physical storage is implemented using blocks of data.

    Physical Storage: Oracle (2)

    • Oracle uses blocks of data that are 8 KB in size.
    • Each block contains a data block and system blocks.
    • Each data block contains multiple 1 KB blocks.
    • The data blocks are stored on disk, while system blocks are stored in memory.

    Physical Storage: PostgreSQL Blocks

    • PostgreSQL blocks are 8 KB in size.
    • Each block contains:
      • Page Header: Contains general information about the page and pointers to free space.
      • Item Ids: Array of pairs (offset, length), pointing to the actual items.
      • Free space: Unallocated space for new items.
      • Tuples: The actual items themselves (rows for tables, entries for indexes).
      • Special space: Empty in ordinary tables.

    Managing Writes in DBMS

    - Writes are managed by the DBMS using a buffer pool and a log.
    - When a record is updated, the new version is initially written to the buffer pool.
    - The log file records all changes, creating a journal of updates.
    - The updates are then transferred from the buffer pool to the database file.
    - Writes to the database or log can be synchronous (in real-time) or asynchronous (delayed).
    - The log file ensures data recovery even if the database is corrupted.
    

    Types of DB Architectures (MVCC)

    • MVCC stands for Multi-Version Concurrency Control.
    • Two common approaches are oldest-to-newest (O2N) and newest-to-oldest (N2O).
    • In O2N, older versions of data are stored at the beginning of the file, while newer versions are stored at the end.
    • In N2O, newer versions are stored at the beginning, and older versions are moved toward the end.
    • Both approaches have different advantages and disadvantages, depending on the specific workload and needs of the application.

    Types of Storage Architectures

    • Update-in-place storage is commonly used by relational databases, which update data directly in the main storage area. Updating might lead to fragmentation, requiring extra efforts for accessing data and managing the index.
    • Log-structured storage (LSM) focuses on fast writes by appending changes to a log file instead of updating the main storage directly. This approach benefits write performance but may slow down reads because the system needs to scan the log file for the latest data. This architecture is popularized in LSM-tree (Log-Structured Merge tree).
    • Delta with main store combines the best of both worlds. The main store is optimized for reads, while updates are made to a write-optimized buffer (similar to SAP Hana, Sans Souci, etc. ). The buffer is periodically merged with the main store, combining the benefits of update-in-place and log-structured storage. This approach allows for high write throughput and efficient read performance.

    LSM-Tree

    • The LSM-tree is a family of structures optimized for write throughput, characteristic of log-structured storage systems.
    • It consists of multiple levels (C0, C1, C2, ..., Ck) with C0 being an in-memory tree or hash table.
    • Subsequent levels (C1, C2, etc.) are B-trees stored on disk or slower memory.
    • New updates are initially written only to C0.
    • When a level (Ci) is full, it is merged with the next level (Ci+1), emptying Ci.
    • This means that every level (Ci) contains versions older than the versions in level (Ci + 1).
    • This results in up to 'r' versions of tuples coexisting.
    • A read operation traverses the levels from C0, C1... until the desired tuple is found.
    • The original implementation recommended using a geometric progression for level sizes (|Ci+1| = r * |Ci|), resulting in an overall number of unique keys (N) of O(r^k).
    • More recent solutions offer potentially more efficient strategies.

    LSM-Tree: Benefits and Disadvantages

    • By delaying writes to subsequent levels, the LSM-tree effectively accelerates write operations compared to traditional B-trees.
    • Advantages:
      • Faster sequential writes
      • Reduced fragmentation, leading to more efficient range queries
    • Disadvantages:
      • Retrieving a specific value might take longer.

    References

    • Database System Fundamentals (by Ramakrishnan & Gehrke)
    • Transaction Processing (by Jim Gray and Andreas Reuter)
    • Database Systems: A Practical Approach (by Thomas Connolly and Carolyn Begg)
    • Database Systems: Design, Implementation, and Management (by Carlos Coronel and Stephen Morris)
    • The Transaction Processing Performance Council (TPC)
    • The International Organization for Standardization (ISO)
    • The American National Standards Institute (ANSI)
    • The VLDB Endowment
    • The ACM SIGMOD

    In addition to the notes, I recommend reviewing the references listed in the text for a more comprehensive understanding of the topic.

    Studying That Suits You

    Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

    Quiz Team

    Related Documents

    slides-stockage-physique-fr.pdf

    Description

    This quiz explores the fundamentals of data storage, specifically focusing on hard drives and their history, as well as the memory hierarchy in computing systems. You'll learn about different types of storage and the speed and efficiency of data access across various components. Test your knowledge of these critical computer hardware concepts.

    More Like This

    Data Storage Quiz
    5 questions

    Data Storage Quiz

    FreshestRabbit avatar
    FreshestRabbit
    Jerarquía de Memoria en Computación
    40 questions
    Data Storage Technology Quiz
    10 questions
    Компьютер жадысы
    5 questions
    Use Quizgecko on...
    Browser
    Browser