DBMS 10M PDF
Document Details
Uploaded by SeamlessStonehenge1724
Sri Venkateswara College of Engineering
Tags
Summary
This document provides an overview of database management systems, covering RAID (Redundant Arrays of Independent Disks) and various indexing and hashing techniques. It details how performance characteristics are improved by these methods.
Full Transcript
# Redundant Arrays of Independent Disks [RAID] Disk Organization techniques that manage a large number of disks, providing a view of a single disk of: - High Capacity & high speed by using disks in parallel. - High reliability by storing data redundantly, so data can be recovered if the disk fails....
# Redundant Arrays of Independent Disks [RAID] Disk Organization techniques that manage a large number of disks, providing a view of a single disk of: - High Capacity & high speed by using disks in parallel. - High reliability by storing data redundantly, so data can be recovered if the disk fails. ## Improvement of Reliability via Redundancy - Redundancy: Store extra information. - Examples: Mirroring, - duplicates every disk. - if one disk fails, its available in the other. - Mean time to data loss depends on mean time to failure and mean time to repair. ## Improvement of Performance via Parallelism - Two main goals: 1. Load balance multiple small accesses to increase throughput. 2. Parallelize large accesses to reduce response time. ## RAID Levels: ### RAID 0: Block Striping - non-redundant. - Unique data. - Used in high performance applications where data lost is not critical. ![Diagram of data in RAID 0. Letters represent disk addresses, all data is striped across the four disks]( ) ### RAID 1: Mirrored Disks - Copy all data into another for retrieving. - Offers best write performance. ![Diagram of data in RAID 1 Each of four disks contains the full set of data]( ) ### RAID 2: Error Detection & Correction - Hamming Code. - Error Correcting Codes [ECC] ### RAID 3: Bit-Interleaved Parity - Error Detection Parity → XOR gate. - Byte wise parity → Faster data transfer ![Diagram of data in RAID 3. Each disk is a block and contains bytes Ao - C0. The parity block contains the data: A(001), B(0), C(01)]( ) ### RAID 4: Block-Interleaved Parity ![Diagram of data in RAID 4. Each disk is a block and contains bytes A - C. The parity block contains the data: the parity of all the data on all disks.]( ) ### RAID 5: Block-Interleaved Distributed Parity - 1 parity for 1 block. ![Diagram of data in RAID 5. Each disk is a block and contains bytes A - C. The parity block contains the data: the parity of all the data on all disks.]( ) ### RAID 6: P+Q Redundancy ![Diagram of data in RAID 6. Each disk is a block and contains bytes A - C. The parity block contains the data: the parity of all the data on all disks. The Q block contains additional data, like the parity of the parity data.]( ) ## Hardware Issues - Software RAID: done entirely in software. - Hardware RAID: Implementations with special hardware. # Indexing and Hashing Essential techniques in DBMS for optimizing data retrieval and managing large datasets efficiently. ## Indexing - Improves speed of data retrieval operations on a database table. ### Types: - **Primary Index:** Created on the primary key field, that uniquely identifies the table in each record. - **Secondary Index:** Created on a non-primary key field to allow fast access for non-unique records. - **Clustered Index:** Sorts data rows in a table based on index values, affecting physical order. - **Non-Clustered Index:** Does not alter physical order but creates a separate structure. ### Working - When a query executes, the DBMS uses the index to find the data's location instead of scanning each record. - Example: B-tree index enables binary search, making retrieval much faster. ### Advantages: - Speeds up search queries for large datasets. - Reduces disk I/O minimizing data scans. ### Disadvantages: - Requires additional storage for the index - They must be maintained and updated after any operations leading to slow down of them. ## Hashing - Helps locate data quickly in DBMS by mapping data to a specific location, using a hash function. - Converts a given key (field value) into a fixed-size number (hash code) that indicates the bucket. ### Types - **Static Hashing:** Number of buckets (storage slots) is fixed, so here is an increase in data, multiplying records causing overflow. - **Dynamic Hashing:** Number of buckets can grow/shrink dynamically based on data, reducing overflow. ### Working - The hash function takes the search key and produces a hash code, directly mapping to a bucket making it faster than a linear search. ### Advantages - Fast data retrieval. - Efficient for operations requiring exact-match searches. ### Disadvantages: - Not ideal for range queries, requires an exact key match. - Hash function and overflow handling can increase complexity. ## Key Differences: | Feature | Indexing | Hashing | |---|---|---| | Data Structure | Tree based (B-tree) | Hash Table | | Access Type | Efficient for range and exact queries | Efficient for exact matches only | | Storage Efficiency | Requires additional Storage | Generally uses fixed storage | | Maintenance | Needs update with data changes | Can be complex, with overflow | # Deadlock A deadlock occurs when two or more transactions (Txs) are waiting indefinitely for each other to release locks on resources, causing a situation where none of the Txs can proceed. ## Causes of Deadlock - **Mutual Exclusion:** Each resource can be held by only one Tx at a time. - **Hold and Wait:** Txs holding resources can request additional resources. - **No Preemption:** A transaction holding a resource cannot be forced to release it. - **Circular Wait:** A circular chain of transactions exists, where each Tx holds a resource that the next one needs. ## Handling Deadlock 1. **Deadlock Prevention:** - **Wait-Die Schema:** Tx waits for a resource only if it is a higher priority than current; else it's aborted. - **Wound-Wait Schema:** A Tx with a higher priority can preempt a lower-priority holding the resource. 2. **Deadlock Detection:** Periodically checks for deadlock by building a wait-for graph, where Txs are represented as nodes. If the graph contains cycles, the deadlock is present. ## Deadlock Recovery - **Transaction Rollback:** The low-cost Tx is rolled back. - **Cascading Rollback:** One's rollback leads to other. - **Time Outs:** Automatically rolls back when wait time is exceeded. ## Example: The diagram below illustrates a deadlock situation. Please try to interpret it, using the text above. ![Diagram demonstrating the deadlock situation. The diagram shows four Txs (T1 - T4) a circular chain with four arrows representing that they are waiting for each other]( ) # B-Tree: B-Tree is a self-balancing tree data structure, commonly used for indexing. They are optimized for systems that read and write large blocks of data. ## Properties A B-tree of order m: - Every node contains most m-1 keys. - Every internal node (except root) has m children. - All leaves are at the same level. - Keys within each node are sorted. ## Operations in a B-Tree: - Search - Insert - Delete ## Advantages: - Balanced structure. - Reduced disk access. - Scalability. ## Example: **[10, 20, 30, 40, 50, 60, 70, 80, 90]** **Order 5:** 1. Insert upto 40: - **[10, 20, 30, 40]** 2. Add 50, leading to a split: - **[10, 20, 40, 50]** - 4 keys - 5 children node - **split at m/2 → 2** 3. Add 60, 70, 80: - **[10, 20, 40, 50, 60, 70, 80]** 4. Add 90: - **[10, 20, 40, 50, 70, 80, 90]** - **[10, 20, 40, 50, 60, 70, 80, 90]** - **[10, 20, 40, 50, 60, 70, 80, 90]** - **[10, 20, 40, 50, 60, 70, 80, 90]**