DBMS 10M PDF

# Redundant Arrays of Independent Disks [RAID] Disk Organization techniques that manage a large number of disks, providing a view of a single disk of: - High Capacity & high speed by using disks in parallel. - High reliability by storing data redundantly, so data can be recovered if the disk fails. ## Improvement of Reliability via Redundancy - Redundancy: Store extra information. - Examples: Mirroring, - duplicates every disk. - if one disk fails, its available in the other. - Mean time to data loss depends on mean time to failure and mean time to repair. ## Improvement of Performance via Parallelism - Two main goals: 1. Load balance multiple small accesses to increase throughput. 2. Parallelize large accesses to reduce response time. ## RAID Levels: ### RAID 0: Block Striping - non-redundant. - Unique data. - Used in high performance applications where data lost is not critical. ![Diagram of data in RAID 0. Letters represent disk addresses, all data is striped across the four disks]( ) ### RAID 1: Mirrored Disks - Copy all data into another for retrieving. - Offers best write performance. ![Diagram of data in RAID 1 Each of four disks contains the full set of data]( ) ### RAID 2: Error Detection & Correction - Hamming Code. - Error Correcting Codes [ECC] ### RAID 3: Bit-Interleaved Parity - Error Detection Parity → XOR gate. - Byte wise parity → Faster data transfer ![Diagram of data in RAID 3. Each disk is a block and contains bytes Ao - C0. The parity block contains the data: A(001), B(0), C(01)]( ) ### RAID 4: Block-Interleaved Parity ![Diagram of data in RAID 4. Each disk is a block and contains bytes A - C. The parity block contains the data: the parity of all the data on all disks.]( ) ### RAID 5: Block-Interleaved Distributed Parity - 1 parity for 1 block. ![Diagram of data in RAID 5. Each disk is a block and contains bytes A - C. The parity block contains the data: the parity of all the data on all disks.]( ) ### RAID 6: P+Q Redundancy ![Diagram of data in RAID 6. Each disk is a block and contains bytes A - C. The parity block contains the data: the parity of all the data on all disks. The Q block contains additional data, like the parity of the parity data.]( ) ## Hardware Issues - Software RAID: done entirely in software. - Hardware RAID: Implementations with special hardware. # Indexing and Hashing Essential techniques in DBMS for optimizing data retrieval and managing large datasets efficiently. ## Indexing - Improves speed of data retrieval operations on a database table. ### Types: - **Primary Index:** Created on the primary key field, that uniquely identifies the table in each record. - **Secondary Index:** Created on a non-primary key field to allow fast access for non-unique records. - **Clustered Index:** Sorts data rows in a table based on index values, affecting physical order. - **Non-Clustered Index:** Does not alter physical order but creates a separate structure. ### Working - When a query executes, the DBMS uses the index to find the data's location instead of scanning each record. - Example: B-tree index enables binary search, making retrieval much faster. ### Advantages: - Speeds up search queries for large datasets. - Reduces disk I/O minimizing data scans. ### Disadvantages: - Requires additional storage for the index - They must be maintained and updated after any operations leading to slow down of them. ## Hashing - Helps locate data quickly in DBMS by mapping data to a specific location, using a hash function. - Converts a given key (field value) into a fixed-size number (hash code) that indicates the bucket. ### Types - **Static Hashing:** Number of buckets (storage slots) is fixed, so here is an increase in data, multiplying records causing overflow. - **Dynamic Hashing:** Number of buckets can grow/shrink dynamically based on data, reducing overflow. ### Working - The hash function takes the search key and produces a hash code, directly mapping to a bucket making it faster than a linear search. ### Advantages - Fast data retrieval. - Efficient for operations requiring exact-match searches. ### Disadvantages: - Not ideal for range queries, requires an exact key match. - Hash function and overflow handling can increase complexity. ## Key Differences: | Feature | Indexing | Hashing | |---|---|---| | Data Structure | Tree based (B-tree) | Hash Table | | Access Type | Efficient for range and exact queries | Efficient for exact matches only | | Storage Efficiency | Requires additional Storage | Generally uses fixed storage | | Maintenance | Needs update with data changes | Can be complex, with overflow | # Deadlock A deadlock occurs when two or more transactions (Txs) are waiting indefinitely for each other to release locks on resources, causing a situation where none of the Txs can proceed. ## Causes of Deadlock - **Mutual Exclusion:** Each resource can be held by only one Tx at a time. - **Hold and Wait:** Txs holding resources can request additional resources. - **No Preemption:** A transaction holding a resource cannot be forced to release it. - **Circular Wait:** A circular chain of transactions exists, where each Tx holds a resource that the next one needs. ## Handling Deadlock 1. **Deadlock Prevention:** - **Wait-Die Schema:** Tx waits for a resource only if it is a higher priority than current; else it's aborted. - **Wound-Wait Schema:** A Tx with a higher priority can preempt a lower-priority holding the resource. 2. **Deadlock Detection:** Periodically checks for deadlock by building a wait-for graph, where Txs are represented as nodes. If the graph contains cycles, the deadlock is present. ## Deadlock Recovery - **Transaction Rollback:** The low-cost Tx is rolled back. - **Cascading Rollback:** One's rollback leads to other. - **Time Outs:** Automatically rolls back when wait time is exceeded. ## Example: The diagram below illustrates a deadlock situation. Please try to interpret it, using the text above. ![Diagram demonstrating the deadlock situation. The diagram shows four Txs (T1 - T4) a circular chain with four arrows representing that they are waiting for each other]( ) # B-Tree: B-Tree is a self-balancing tree data structure, commonly used for indexing. They are optimized for systems that read and write large blocks of data. ## Properties A B-tree of order m: - Every node contains most m-1 keys. - Every internal node (except root) has m children. - All leaves are at the same level. - Keys within each node are sorted. ## Operations in a B-Tree: - Search - Insert - Delete ## Advantages: - Balanced structure. - Reduced disk access. - Scalability. ## Example: **[10, 20, 30, 40, 50, 60, 70, 80, 90]** **Order 5:** 1. Insert upto 40: - **[10, 20, 30, 40]** 2. Add 50, leading to a split: - **[10, 20, 40, 50]** - 4 keys - 5 children node - **split at m/2 → 2** 3. Add 60, 70, 80: - **[10, 20, 40, 50, 60, 70, 80]** 4. Add 90: - **[10, 20, 40, 50, 70, 80, 90]** - **[10, 20, 40, 50, 60, 70, 80, 90]** - **[10, 20, 40, 50, 60, 70, 80, 90]** - **[10, 20, 40, 50, 60, 70, 80, 90]**

Document Details

Tags

Related

Summary

Full Transcript