F.Y. BSc. Information Technology DBMS Practical

SYLLABUS - SEMESTER 1

UNIT-1: Database System, Its Applications, Purpose of Database Systems, View of Data, Models, Database Languages, Database Users and Administrator, Database Architecture, ER Diagrams, EER Diagrams, Introduction to the Relational Model, Database Schema, Keys, Relational Algebra - Relational Operations, Tuple Relational Calculus, Domain Relational Calculus, Normalization.

UNIT-2: SQL: Data Definition, Data Types, Types of Constraints, SELECT Operation, WHERE Search Condition, Aggregate Functions, Null Values, Set Theory, Joined Relations, Subqueries. Views: Introduction to Views, Types of Views, Updates on Views, Comparison between Tables and Views.

UNIT-3: Transaction Concept - Transaction State - Implementation of Atomicity and Durability - Concurrency Control - Executions - Serializability - Recoverability - Implementation of Isolation - Testing for Serializability - Lock-Based Protocols - Timestamp-Based Protocols - Validation-Based Protocols - Multiple Granularity. Recovery and Atomicity - Log-Based Recovery - Recovery with Concurrent Transactions - Buffer Management - Failure with Loss of Nonvolatile Storage - Advanced Recovery Systems - Remote Backup Systems.

UNIT-3

3.1 Transaction Concept

In database management systems (DBMS), a transaction is a fundamental concept used to ensure the integrity and consistency of data. A transaction is a sequence of one or more operations (such as read and write operations) performed on a database that is treated as a single, indivisible unit. Here is a breakdown of the key aspects of transactions.

Key Properties: ACID

Transactions are characterized by the ACID properties, which stand for:

1. Atomicity: A transaction is an all-or-nothing proposition. Either all operations within the transaction complete successfully, or none do. If any part of the transaction fails, the entire transaction is rolled back, ensuring that the database remains in a consistent state.

2. Consistency: A transaction must move the database from one consistent state to another. Every transaction must respect the database's rules, constraints, and integrity checks.

3. Isolation: Transactions should operate independently of one another. The intermediate state of a transaction should not be visible to other transactions. This ensures that transactions do not interfere with each other, even when they execute concurrently.

4. Durability: Once a transaction has been committed, its changes are permanent and will survive any system failure. The database system must ensure that the effects of a committed transaction are preserved even in the event of a crash.

Transaction Lifecycle

1. Start: The transaction begins with an initial operation, such as a user initiating a change to the database.

2. Execution: The operations of the transaction are executed. These may include queries, updates, deletions, or inserts.

3. Commit: If all operations in the transaction succeed and the transaction satisfies all consistency rules, it is committed: all changes made during the transaction are permanently applied to the database.

4. Rollback: If an error occurs or the transaction cannot be completed as intended, the transaction is rolled back. This reverts the database to its state before the transaction began, so no partial or erroneous changes are applied.

Example Scenario

Consider a banking application where you transfer money from one account to another:

1. Begin Transaction: A transaction starts when you initiate the transfer.
2. Operations:
   - Deduct the amount from the sender's account.
   - Add the amount to the recipient's account.
3. Commit: If both operations succeed, the transaction is committed, making the transfer permanent.
4. Rollback: If an error occurs during the transaction (e.g., insufficient funds or a system failure), the transaction is rolled back and no changes are made to either account.
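The transfer scenario can be expressed directly in code. The following is a minimal sketch using Python's built-in sqlite3 module; the account table, the account ids 'A' and 'B', and the transfer helper are illustrative choices, not part of any particular DBMS:

    import sqlite3

    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN/COMMIT/ROLLBACK statements below control the transaction explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('A', 500), ('B', 100)")

    def transfer(conn, sender, recipient, amount):
        """Move `amount` between accounts as one atomic transaction."""
        conn.execute("BEGIN")
        try:
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (sender,)).fetchone()
            if balance < amount:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, sender))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, recipient))
            conn.execute("COMMIT")    # both updates become permanent together
        except Exception:
            conn.execute("ROLLBACK")  # neither update is applied
            raise

    transfer(conn, "A", "B", 200)     # succeeds: A = 300, B = 300

If anything fails between the two updates, the ROLLBACK leaves both balances exactly as they were - atomicity in action.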
Concurrency Control

When multiple transactions occur simultaneously, DBMSs use concurrency control mechanisms to ensure isolation. Common methods include:

- Locking: preventing multiple transactions from accessing the same data simultaneously.
- Timestamp Ordering: assigning timestamps to transactions to control their execution order.
- Optimistic Concurrency Control: allowing transactions to execute without locking, and then checking for conflicts before committing.

Transaction State Diagram

To illustrate the transaction states in a DBMS along with their transitions, a state diagram is helpful. The states are:

1. Active (or Running): the initial state when a transaction begins; the transaction is executing its operations.
2. Partially Committed: after executing its last operation, the transaction moves to this state while it waits for the commit.
3. Committed: once the transaction is successfully committed, this state indicates that all changes are permanently applied to the database.
4. Failed: if an error occurs during execution, the transaction moves to the failed state.
5. Aborted (or Rolled Back): if the transaction fails or is explicitly rolled back, it moves to this state, where all its operations are undone.

State Transitions

1. Active to Partially Committed: the transaction transitions to the partially committed state after executing all of its operations.
2. Partially Committed to Committed: if no errors occur, the transaction transitions from partially committed to committed, making all changes permanent.
3. Active to Failed: if an error occurs during execution, the transaction moves directly to the failed state.
4. Failed to Aborted: a transaction in the failed state is rolled back, moving to the aborted state.
5. Partially Committed to Aborted: if an issue occurs after partial execution but before commitment, the transaction moves from partially committed to aborted.

Diagram Illustration

    [Active] ----> [Partially Committed] --(Commit)--> [Committed]
        |                    |
        |                    +---------> [Aborted]
        v                                    ^
    [Failed] ---------(Rollback)-------------+

Explanation of Transitions:

- Active to Partially Committed: occurs when the transaction finishes its operations but has not yet been committed.
- Partially Committed to Committed: happens when the transaction is confirmed to be successful and all changes are applied.
- Active to Failed: occurs when an error happens during transaction execution.
- Failed to Aborted: the transaction is rolled back to undo any changes.
- Partially Committed to Aborted: happens if an error or issue is detected after partial completion but before final commitment.

This diagram helps in understanding how a transaction moves through different states and how it handles errors or successful completion.
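Since the diagram fully specifies which transitions are legal, it can be encoded as a small state machine. This is a sketch for illustration only; the state names and the Transaction class are hypothetical:

    # The five legal transitions from the diagram above.
    ALLOWED = {
        ("active", "partially committed"),
        ("partially committed", "committed"),
        ("active", "failed"),
        ("failed", "aborted"),
        ("partially committed", "aborted"),
    }

    class Transaction:
        def __init__(self):
            self.state = "active"          # every transaction starts here

        def move(self, new_state):
            if (self.state, new_state) not in ALLOWED:
                raise ValueError(f"illegal transition: {self.state} -> {new_state}")
            self.state = new_state

    t = Transaction()
    t.move("partially committed")
    t.move("committed")                    # t.move("failed") here would raise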
In the context of Database Management Systems (DBMS), serializability and recoverability are two crucial concepts related to transaction management and concurrency control.

Serializability

Serializability is the property of a schedule (sequence of operations) ensuring that transactions executed concurrently are equivalent to some serial execution of those transactions. Even though transactions may run in parallel, the end result must be as if they had executed one after the other in some order. Serializability thus preserves the consistency of the database. There are two primary types:

1. Conflict Serializability: a schedule is conflict-serializable if it can be transformed into a serial schedule by swapping non-conflicting operations. Two operations conflict if they belong to different transactions, access the same data item, and at least one of them is a write.

2. View Serializability: a schedule is view-serializable if it is view-equivalent to a serial schedule. View equivalence means that each transaction reads the same values as it would in the serial schedule, and the final writes are the same as if the transactions were executed serially.

Recoverability

Recoverability is the ability of a database system to ensure that, in the event of a transaction failure, the database can be restored to a consistent state. It concerns whether a schedule allows transactions to be rolled back without violating consistency.

1. Recoverable Schedules (Commit Dependency): a schedule is recoverable if, whenever a transaction Ti reads data written by a transaction Tj, Ti commits only after Tj has committed. This ensures that a transaction which based its work on another's writes is never made permanent while the transaction it depends on could still be rolled back.

2. Cascadeless Schedules: a stronger condition in which transactions never read data written by uncommitted transactions at all. This prevents a transaction from basing its operations on data that may be rolled back, and it eliminates cascading rollbacks, which simplifies recovery. Every cascadeless schedule is recoverable, but not every recoverable schedule is cascadeless; cascadeless schedules are preferred for easier recovery.

Relationship between Serializability and Recoverability

Serializability ensures that the concurrent execution of transactions preserves database consistency as if they were executed in some serial order. Recoverability ensures that if a transaction fails, the database can be restored to a consistent state. While the two are related, they address different aspects of transaction management: serializability concerns the correctness of concurrent execution, whereas recoverability concerns maintaining database integrity in the face of transaction failures.
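The commit-dependency condition is mechanical enough to check in code. Here is a minimal sketch, assuming a schedule is given as a list of (transaction, action, item) steps where the action is "r" (read), "w" (write), or "c" (commit, item ignored); this encoding is an illustrative choice, not a standard API:

    def is_recoverable(schedule):
        """True if every reader commits only after the writer it read from."""
        last_writer = {}      # item -> transaction that last wrote it
        reads_from = set()    # (reader, writer) pairs observed so far
        committed = set()
        for txn, action, item in schedule:
            if action == "w":
                last_writer[item] = txn
            elif action == "r":
                writer = last_writer.get(item)
                if writer is not None and writer != txn:
                    reads_from.add((txn, writer))
            elif action == "c":
                # every transaction txn read from must already be committed
                for reader, writer in reads_from:
                    if reader == txn and writer not in committed:
                        return False
                committed.add(txn)
        return True

    # T2 reads X written by the still-uncommitted T1, then commits first:
    s = [("T1", "w", "X"), ("T2", "r", "X"), ("T2", "c", None), ("T1", "c", None)]
    print(is_recoverable(s))   # False - not a recoverable schedule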
Implementation of Isolation

Isolation is one of the key properties in transaction management, ensuring that transactions execute independently of one another. To implement isolation effectively, DBMSs use various techniques to manage concurrency. Common approaches:

1. Two-Phase Locking (2PL): ensures isolation by using locks, in two phases:
   - Growing Phase: the transaction acquires locks but cannot release any.
   - Shrinking Phase: the transaction releases locks but cannot acquire new ones.
   This prevents conflicts and ensures serializability.

2. Serializable Snapshot Isolation (SSI): provides isolation by giving each transaction a snapshot of the database taken when it begins. The transaction works against this snapshot and commits its changes without interfering with others. This prevents phenomena such as dirty reads and non-repeatable reads.

3. Multi-Version Concurrency Control (MVCC): allows multiple versions of a data item to exist, giving each transaction a consistent snapshot of the database. This minimizes locking conflicts and lets concurrent transactions work with different versions of the data.

4. Lock-Based Protocols: in addition to 2PL, locking mechanisms such as shared and exclusive locks can be used. Shared locks allow read access, while exclusive locks are required for writes, ensuring transactions do not interfere with each other.

5. Optimistic Concurrency Control (OCC): allows transactions to execute without locks, on the assumption that conflicts are rare. Before committing, a transaction checks whether its changes conflict with other transactions; if a conflict is detected, it rolls back and retries.

Each of these methods trades off performance, complexity, and the strictness of the isolation level it provides.

Testing for Serializability

Testing for serializability means verifying whether a given schedule of transactions is equivalent to some serial execution of those transactions. This is crucial for ensuring that concurrent transactions do not violate database consistency. There are several methods:

1. Conflict Serializable Schedules

Construct the Precedence Graph (also known as the serialization graph or conflict graph), which represents transactions as nodes and conflicts as edges:
- Nodes: one per transaction in the schedule.
- Edges: a directed edge from Ti to Tj whenever an operation of Ti conflicts with and precedes an operation of Tj - that is, Ti reads or writes a data item before Tj writes it, or Ti writes a data item before Tj reads or writes it.

Check for Cycles: a schedule is conflict-serializable if and only if its precedence graph is acyclic. If the graph has no cycles, the schedule is equivalent to some serial schedule; if there is a cycle, the schedule is not conflict-serializable.

Example: Suppose transactions T1 and T2 each read and write the same item X. In the interleaving r1(X) r2(X) w1(X) w2(X), T2 reads X before T1 writes it, giving the edge T2 → T1, and T1 writes X before T2 writes it, giving the edge T1 → T2. The graph contains the cycle T1 → T2 → T1, so this schedule is not conflict-serializable. The sketch below automates exactly this check.
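A minimal sketch of the precedence-graph test, using the same (transaction, action, item) schedule encoding as before (an illustrative representation, not a standard API):

    from collections import defaultdict

    def precedence_graph(schedule):
        """Edge Ti -> Tj for each pair of conflicting operations (same item,
        different transactions, at least one write, Ti's operation first)."""
        edges = defaultdict(set)
        for i, (ti, ai, xi) in enumerate(schedule):
            for tj, aj, xj in schedule[i + 1:]:
                if ti != tj and xi == xj and "w" in (ai, aj):
                    edges[ti].add(tj)
        return edges

    def has_cycle(edges):
        """Depth-first search for a cycle in the precedence graph."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = defaultdict(int)
        def visit(node):
            color[node] = GREY
            for nxt in edges[node]:
                if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                    return True
            color[node] = BLACK
            return False
        return any(color[n] == WHITE and visit(n) for n in list(edges))

    # r1(X) r2(X) w1(X) w2(X): edges T1 -> T2 and T2 -> T1 form a cycle,
    # so the schedule is not conflict-serializable.
    s = [("T1", "r", "X"), ("T2", "r", "X"), ("T1", "w", "X"), ("T2", "w", "X")]
    print(has_cycle(precedence_graph(s)))   # True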
2. View Serializable Schedules

View serializability is a broader notion and more complex to test:

- Construct a View Comparison: ensure that each transaction in the schedule reads the same values and that the schedule produces the same final output as some serial schedule.
  - Initial Reads: each transaction reads the same initial values as in the serial schedule.
  - Final Writes: the final value of each data item is written by the same transaction as in the serial schedule.
- Compare with Serial Schedules: determine whether there exists a serial schedule that produces the same reads and writes as the given schedule. This is generally more complex and computationally intensive than testing conflict serializability.

Example: for a schedule to be view-serializable, every transaction must see the same values as it would in some serial order. Establishing view serializability typically requires detailed analysis, often relying on specific algorithms or heuristics.

3. Using Serialization Algorithms

Database systems also use algorithms that determine serializability by examining a schedule's execution and searching for an equivalent serial order.

Summary

- Conflict Serializability: construct the precedence graph and check for cycles.
- View Serializability: verify whether the schedule is equivalent to some serial execution, which generally requires deeper analysis.

Lock-Based Protocols

Lock-based protocols are fundamental mechanisms used in DBMSs to control concurrent access to data and to ensure the consistency and isolation of transactions. These protocols use locks to manage access to database resources and to prevent problems such as lost updates and dirty reads. The most common lock-based protocols:

1. Two-Phase Locking (2PL)

Two-Phase Locking is one of the most widely used locking protocols. It divides a transaction's execution into two distinct phases:

- Growing Phase: the transaction can acquire locks but cannot release any.
- Shrinking Phase: the transaction can release locks but cannot acquire new ones.

Properties:
- Ensures conflict-serializability: any schedule produced by 2PL is conflict-serializable.
- Basic 2PL does not by itself prevent cascading rollbacks: a transaction may release a lock and later abort, forcing transactions that read its data to abort as well.

Drawbacks:
- Deadlocks: two or more transactions may wait indefinitely for each other to release locks.
- Blocking: a transaction may be blocked waiting for a lock held by another transaction.

2. Strict Two-Phase Locking (Strict 2PL)

Strict Two-Phase Locking is a more restrictive version of 2PL:

- Growing Phase: as in 2PL, locks are acquired.
- Release of Write Locks: all exclusive (write) locks are held until the transaction commits or aborts.

Properties: guarantees serializability and prevents cascading rollbacks, because no transaction can read data written by an uncommitted transaction; the cost is potentially increased blocking.
3. Conservative Locking

Conservative locking requires a transaction to lock all the resources it might need before it begins executing. This can avoid deadlocks, because the system never grants a transaction a partial set of locks.

Properties:
- Once a transaction starts, it already holds all the locks it needs, so it never has to acquire additional locks mid-execution.
- Can reduce concurrency and increase resource usage, because locks are held conservatively even for resources that may never be used.

4. Two-Phase Locking Variants

- Basic 2PL: as described above, with strict adherence to the growing and shrinking phases.
- Strict 2PL: holds all exclusive locks until commit or abort, preventing cascading rollbacks at the cost of potential increased blocking.
- Rigorous 2PL: holds all locks, both shared and exclusive, until commit or abort; slightly stricter than strict 2PL, and transactions become serializable in their commit order.

5. Lock Types

Different lock types manage different levels of access:

- Shared Lock (S-Lock): allows multiple transactions to read a data item but not modify it. Transactions holding a shared lock can only read the item.
- Exclusive Lock (X-Lock): allows a transaction to both read and modify a data item. While an exclusive lock is held, no other transaction can access the item in any way.
- Intention Locks: used in hierarchical locking schemes to signal a transaction's intention to acquire locks at lower levels (see Multiple Granularity below).

6. Deadlock Handling

Deadlock handling mechanisms manage situations where transactions are stuck in a cycle of waiting for each other's locks:

- Deadlock Prevention: ensure transactions never enter a state where deadlock can occur, e.g. via the wait-die or wound-wait schemes.
- Deadlock Detection: periodically check for cycles in the waits-for graph and take corrective action, e.g. rolling back a transaction involved in the cycle.
- Deadlock Recovery: once a deadlock is detected, resolve it by aborting one or more of the transactions involved and rolling them back.

Summary: lock-based protocols are crucial for ensuring consistency and isolation in concurrent transaction processing. Two-Phase Locking and its variants are the most common approaches, balancing serializability against lock contention; they also introduce challenges such as deadlocks and blocking, which require effective handling strategies.
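To make the shared/exclusive rules concrete, here is a toy lock-table sketch in Python. The LockManager class and its methods are hypothetical, for illustration only; real lock managers also implement queuing policies and deadlock detection:

    import threading

    class LockManager:
        """Toy lock table with shared (S) and exclusive (X) locks per item.
        Under 2PL a transaction acquires locks in its growing phase and
        releases them all at once in its shrinking phase (release_all)."""

        def __init__(self):
            self.holders = {}              # item -> {txn: "S" or "X"}
            self.cond = threading.Condition()

        def _compatible(self, item, txn, mode):
            others = {t: m for t, m in self.holders.get(item, {}).items()
                      if t != txn}
            if mode == "S":
                return all(m == "S" for m in others.values())
            return not others              # X is compatible with nothing

        def acquire(self, txn, item, mode):
            with self.cond:
                while not self._compatible(item, txn, mode):
                    self.cond.wait()       # block; deadlock is possible here
                self.holders.setdefault(item, {})[txn] = mode

        def release_all(self, txn):
            """Shrinking phase: drop every lock the transaction holds."""
            with self.cond:
                for held in self.holders.values():
                    held.pop(txn, None)
                self.cond.notify_all()

Two readers can hold S on the same item concurrently; a writer requesting X on that item blocks in acquire() until both call release_all().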
Timestamp-Based Protocols

Timestamp-based protocols are used in DBMSs for concurrency control, maintaining consistency when multiple transactions execute simultaneously. They determine how transactions are scheduled and how conflicts between transactions are resolved.

Key Concepts of Timestamp-Based Protocols

1. Timestamp: a unique identifier, usually based on the transaction's arrival or creation time, used to determine the order of transactions. Each transaction receives a timestamp when it starts.
2. Transaction Ordering: transactions are ordered by their timestamps; this order decides which transaction takes precedence when operations conflict.
3. Conflict Resolution: the outcome of conflicting operations is determined from the transactions' timestamps so that database consistency is maintained.

Types of Timestamp-Based Protocols

1. Basic Timestamp Ordering Protocol
   - Timestamp Assignment: each transaction Ti is assigned a unique timestamp TS(Ti) when it starts. For each data item x, the system records W-timestamp(x), the largest timestamp of any transaction that has written x, and R-timestamp(x), the largest timestamp of any transaction that has read x.
   - Read Rule: when Ti issues read(x), if TS(Ti) < W-timestamp(x) the read has arrived too late (a younger transaction has already written x) and Ti is rolled back; otherwise the read proceeds and R-timestamp(x) is set to max(R-timestamp(x), TS(Ti)).
   - Write Rule: when Ti issues write(x), if TS(Ti) < R-timestamp(x) or TS(Ti) < W-timestamp(x) the write is rejected and Ti is rolled back; otherwise the write proceeds and W-timestamp(x) is set to TS(Ti).
   A transaction that violates these rules is rolled back (aborted and restarted with a new timestamp).

2. Thomas's Write Rule
   An enhancement to basic timestamp ordering: if TS(Ti) < W-timestamp(x) but TS(Ti) >= R-timestamp(x), then Ti's write is obsolete (a younger transaction has already overwritten x) and can simply be ignored rather than rolling Ti back. This avoids unnecessary rollbacks.

3. Commitment Protocol
   Ensures that once a transaction is committed, its changes are permanent and visible to other transactions. This is often handled in conjunction with timestamp-based protocols to ensure consistency.

Advantages of Timestamp-Based Protocols
- Deadlock-Free: transactions never wait for one another; conflicts are resolved by rolling a transaction back, so deadlocks cannot occur.
- Simple to Understand: the rules are straightforward and based on ordering transactions by timestamp.
- Prevention of Anomalies: ensures serializability - the outcome of executing transactions concurrently is the same as some serial execution.

Disadvantages of Timestamp-Based Protocols
- Rollback Overhead: transactions that violate the timestamp rules must be rolled back and restarted, which can be costly in terms of performance.
- Starvation of Long Transactions: a long transaction may be repeatedly rolled back by younger conflicting transactions, which can lead to performance problems.

Implementation in DBMS: in practice, timestamp-based protocols are implemented in various database systems to manage transaction concurrency, often in combination with other techniques such as locking, to balance the trade-offs between performance and consistency.
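The read and write rules above translate directly into code. A minimal sketch, assuming transactions are identified by their integer timestamps (the class and method names are illustrative):

    class TimestampOrdering:
        """Basic timestamp-ordering rules. Each item tracks the largest
        read and write timestamps seen so far; an operation that arrives
        too late forces its transaction to abort (returns False)."""

        def __init__(self):
            self.rts = {}   # item -> largest timestamp that read it
            self.wts = {}   # item -> largest timestamp that wrote it

        def read(self, ts, item):
            if ts < self.wts.get(item, 0):
                return False                 # a younger txn already wrote: abort
            self.rts[item] = max(self.rts.get(item, 0), ts)
            return True

        def write(self, ts, item):
            # Thomas's write rule would silently ignore the write when
            # ts < wts but ts >= rts, instead of aborting.
            if ts < self.rts.get(item, 0) or ts < self.wts.get(item, 0):
                return False                 # too late to write: abort
            self.wts[item] = ts
            return True

    to = TimestampOrdering()
    print(to.read(2, "X"))    # True: T2 reads X
    print(to.write(1, "X"))   # False: older T1 writes after T2's read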
Validation-Based Protocols

Validation-based protocols, also known as optimistic concurrency control, are used in DBMSs to manage concurrent transactions while ensuring consistency and isolation. Unlike locking protocols, which prevent conflicts through locks, validation-based protocols let transactions execute freely and then validate their changes before committing.

Key Concepts of Validation-Based Protocols

1. Transaction Phases:
   - Read Phase: the transaction reads data items and makes its updates in a private workspace or local copy; during this phase it does not affect the database directly.
   - Validation Phase: before the transaction commits, it is validated to ensure that its changes do not violate the consistency of the database relative to other transactions.
   - Write Phase: if the transaction passes validation, its changes are written to the database; otherwise it is aborted and must be restarted.

2. Validation Rules: the goal of validation is to ensure that a transaction can commit without violating the serializability of the database. This is typically done by checking whether the transaction conflicts with other transactions that have already committed.

Types of Validation-Based Protocols

1. Basic Validation-Based Protocol
   - Read Phase: the transaction executes and makes changes in a local workspace.
   - Validation Phase: the transaction is checked for conflicts with transactions that committed while it was running - usually by checking whether any data it read was modified by a transaction that committed after it began.
   - Write Phase: if validation succeeds, the changes are applied to the database; if it fails, the transaction is aborted and restarted.

2. Certification-Based Protocol
   - Certification Phase: transactions are validated ("certified") to ensure serializability; this phase checks whether the transaction's actions would lead to a serializable schedule if committed.
   - Conflict Detection: typically involves checking for write-read, read-write, and write-write conflicts between transactions.

3. View-Serializability Protocol
   - Ensures that transactions preserve view serializability, checking that the result of the transactions is equivalent to some serial order in which the view of the database remains consistent.

Advantages of Validation-Based Protocols
- Deadlock-Free: no locks are taken, so deadlock situations cannot arise.
- Increased Concurrency: transactions execute concurrently without blocking one another, potentially improving throughput and system performance.
- Simpler Deadlock Resolution: with no locks involved, deadlock detection and resolution are unnecessary.

Disadvantages of Validation-Based Protocols
- Validation Overhead: the validation phase can be computationally expensive, especially with many transactions or complex validation checks.
- Rollback Cost: a transaction that fails validation must be aborted and potentially restarted, which is costly in performance and resource utilization.
- Implementation Complexity: effective validation strategies can be complex to implement, especially in systems with high transaction volumes or complex dependencies.

Implementation in DBMS: validation-based protocols suit environments that require high concurrency and where the overhead of locking is too high. They are found in some distributed databases and certain high-performance systems where transactional throughput is critical. Overall, they offer an alternative to traditional locking that is beneficial in specific scenarios; the choice of concurrency control method depends on the particular needs of the DBMS and the workload characteristics.
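A minimal sketch of backward validation, in which a committing transaction is checked against the transactions that committed while it was running. The Txn record and validate function are hypothetical, for illustration:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Txn:
        start_time: int
        read_set: set = field(default_factory=set)    # items read
        write_set: set = field(default_factory=set)   # items written locally
        commit_time: Optional[int] = None

    def validate(txn, committed):
        """txn fails validation if any transaction that committed after
        txn started wrote an item that txn has read."""
        for other in committed:
            if other.commit_time > txn.start_time and \
                    (other.write_set & txn.read_set):
                return False
        return True

    t1 = Txn(start_time=1, read_set={"X"}, write_set={"X"}, commit_time=3)
    t2 = Txn(start_time=2, read_set={"X"}, write_set={"Y"})
    print(validate(t2, [t1]))   # False: t1 wrote X after t2 started reading it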
Multiple Granularity

Multiple granularity locking is a sophisticated locking mechanism used in DBMSs to handle concurrency control more efficiently. It allows transactions to lock data at different levels of granularity, from individual data items up to the entire database, balancing the trade-off between locking overhead and the level of concurrency.

Key Concepts of Multiple Granularity Locking

1. Granularity Levels: locking is available at several levels, typically:
   - Tuple Level: individual rows in a table.
   - Page Level: a page (a fixed-size block of storage) containing multiple rows.
   - Table Level: an entire table.
   - Database Level: the entire database.

2. Locking Protocol: different levels of granularity can be locked independently, giving finer control over data access. For example, a transaction that needs many rows can lock at the page level rather than locking each row individually.

3. Lock Modes: multiple granularity locking uses several lock modes:
   - Shared Lock (S): allows read access but not write access; multiple transactions can hold shared locks on the same item.
   - Exclusive Lock (X): allows both read and write access; only one transaction can hold an exclusive lock on an item.
   - Intention Locks (IS, IX, SIX): special modes indicating the intention to acquire locks at a finer granularity.

Lock Modes and Their Purpose

- Intention Shared (IS): the transaction intends to acquire shared locks on some lower-level granules (e.g., rows or pages) within this higher-level granule (e.g., a table).
- Intention Exclusive (IX): the transaction intends to acquire exclusive locks on some lower-level granules.
- Shared Intention Exclusive (SIX): the transaction holds a shared lock on this granule and intends to acquire exclusive locks on some lower-level granules.

Example of Multiple Granularity Locking

Consider a database with the hierarchy: Database (DB) → Table (T) → Page (P) → Tuple (R). A transaction might acquire locks as follows:

1. Database-Level Lock: lock the entire database for operations that span multiple tables.
2. Table-Level Lock: lock an entire table when operations involve many rows or pages.
3. Page-Level Lock: lock a specific page when operations are confined to a subset of rows.
4. Tuple-Level Lock: lock individual rows for precise operations.

In practice, a transaction locking at a fine granularity first places intention locks on all ancestors of the target: for example, to take an S lock on a single tuple it takes IS locks on the database, the table, and the page, then S on the tuple itself. The compatibility between these modes is summarized in the sketch below.
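A minimal encoding of the standard lock-mode compatibility matrix for multiple granularity locking (the can_grant helper is an illustrative name):

    # True means the two modes can be held on the same node by different
    # transactions at the same time.
    COMPATIBLE = {
        "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
        "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
        "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
        "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
        "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
    }

    def can_grant(requested, held):
        """A lock is granted only if compatible with every mode already held
        on that node by other transactions."""
        return all(COMPATIBLE[requested][m] for m in held)

    print(can_grant("IS", ["IX"]))   # True: a reader's intent coexists with a writer's
    print(can_grant("X",  ["IS"]))   # False: someone intends to read below this node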
Advantages of Multiple Granularity Locking
- Increased Concurrency: allowing different levels of granularity reduces contention. For instance, transactions can concurrently access different rows of the same page when no conflicting page-level lock is held.
- Reduced Locking Overhead: locking at a higher level (a page or table) requires far fewer locks than locking every individual row, decreasing overhead.

Disadvantages of Multiple Granularity Locking
- Complexity: implementing and managing multiple granularity locking is more involved than simpler locking mechanisms.
- Lock Compatibility: ensuring compatibility between lock modes and managing intention locks is intricate, leading to potential implementation challenges.
- Potential for Deadlocks: as with other locking protocols, deadlocks remain possible, though the structure of locks at various granularities can help mitigate some issues.

Implementation in DBMS: multiple granularity locking is implemented in many modern DBMSs to handle concurrent access efficiently. It optimizes performance by letting transactions operate at the level of granularity their work requires; large-scale databases and systems with high transaction volumes benefit from the reduced contention and increased concurrency it provides.

In summary, multiple granularity locking is an advanced concurrency control technique that lets transactions lock data at various levels of granularity, optimizing the balance between concurrency and locking overhead. It is a valuable tool for improving performance and managing large databases effectively.

Recovery and Atomicity

In DBMSs, recovery and atomicity are crucial for ensuring data integrity and consistency in the face of system failures and concurrent transactions. The key concepts are log-based recovery, recovery with concurrent transactions, buffer management, and advanced recovery systems.

1. Recovery and Atomicity

Atomicity ensures that a transaction either completes fully or has no effect at all: partial updates are not allowed, so even if a transaction fails, the database remains consistent. Recovery is the process of restoring the database to a consistent state after a failure; it uses logs and other mechanisms to undo or redo operations so that atomicity and durability are preserved.

2. Log-Based Recovery

Log-based recovery is the most common approach to recovery. It relies on a log (transaction log) that records every change made to the database. Key components:

- Write-Ahead Logging (WAL): changes are recorded in the log before they are applied to the database. This guarantees that the log contains a record of every change, so recovery is always possible.
- Log Records: each log record typically contains:
  - Transaction ID: identifies the transaction.
  - Operation Type: the kind of operation (update, insert, delete).
  - Data Item: the item affected.
  - Old Value: the value before the operation (used for undo).
  - New Value: the value after the operation (used for redo).
- Recovery Techniques:
  - Redo: reapply changes from the log so that all committed transactions are reflected in the database.
  - Undo: roll back changes made by transactions that did not commit.

3. Recovery with Concurrent Transactions

Handling recovery with concurrent transactions adds complexity. Techniques and protocols that help include:

- Checkpointing: periodically saving the state of the database and the log, so recovery can start from a recent consistent point rather than the beginning of the log.
- Log Sequence Numbers (LSNs): track the position of log entries and identify which parts of the log must be processed during recovery.
- Recovery Algorithms: protocols such as ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) combine write-ahead logging, LSNs, and checkpointing to recover correctly in the presence of concurrent transactions.
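The redo/undo idea from log-based recovery can be shown in a few lines. A minimal sketch, assuming the log is a list of ("write", txn, item, old, new) and ("commit", txn) records and the database is a simple dict - both illustrative simplifications of real log and storage formats:

    def recover(log, db):
        """Restore consistency after a crash: redo committed transactions,
        undo uncommitted ones."""
        committed = {rec[1] for rec in log if rec[0] == "commit"}
        # Redo pass: reapply every write of a committed transaction, in order.
        for rec in log:
            if rec[0] == "write" and rec[1] in committed:
                _, txn, item, old, new = rec
                db[item] = new
        # Undo pass: roll back uncommitted writes, newest first.
        for rec in reversed(log):
            if rec[0] == "write" and rec[1] not in committed:
                _, txn, item, old, new = rec
                db[item] = old
        return db

    log = [("write", "T1", "A", 500, 400),
           ("write", "T2", "B", 100, 50),
           ("commit", "T1")]               # T2 never committed
    print(recover(log, {"A": 500, "B": 50}))   # {'A': 400, 'B': 100}

T1's write is redone because its commit record is in the log; T2's write is undone using the old value, which is exactly why WAL records both old and new values.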
4. Buffer Management

Buffer management is the efficient management of the database cache (buffer pool), where data pages are temporarily held in memory. Key concepts:

- Buffer Pool: the area of memory where data pages are cached to reduce disk I/O operations.
- Page Replacement Policies: strategies for choosing which page to evict when the pool is full. Common policies include:
  - Least Recently Used (LRU): evict the page that has gone unused the longest.
  - First In, First Out (FIFO): evict the oldest page in the buffer pool.
- Dirty Pages: pages in the buffer pool that have been modified but not yet written to disk. Dirty-page management ensures that all changes eventually reach disk.

5. Failure with Loss of Nonvolatile Storage

When a failure destroys nonvolatile storage (e.g., a disk failure), recovery relies on:

- Backup Systems: regular backups (full, incremental, differential) used to restore the database to a known good state.
- Remote Backup Systems: replication of database backups to a remote location, providing additional protection against data loss from local failures.

6. Advanced Recovery Systems

Advanced recovery systems incorporate sophisticated techniques to enhance data recovery and integrity:

- Distributed Transactions: transactions spanning multiple databases or sites use protocols such as Two-Phase Commit (2PC) to ensure atomicity and consistency across distributed systems.
- Point-in-Time Recovery: allows restoration of the database to a specific moment in time, useful for recovering from logical errors or data corruption.
- Shadow Paging: maintains a shadow copy of the database; changes are made to the shadow copy, which becomes the current database once the transaction completes.

7. Remote Backup Systems

Remote backup systems maintain backup copies of the database at geographically separated locations, protecting against catastrophic events (e.g., natural disasters) that could affect the primary site. Key aspects:

- Data Replication: synchronous or asynchronous replication propagates database changes to the remote backup site.
- Disaster Recovery Planning: procedures and plans for quickly restoring database services from remote backups in the event of a failure.

Summary

- Recovery and Atomicity ensure transactions complete fully or not at all, preserving database integrity.
- Log-Based Recovery uses logs to track changes and drive recovery.
- Recovery with Concurrent Transactions relies on checkpointing, LSNs, and algorithms such as ARIES.
- Buffer Management optimizes the use of memory for caching data pages.
- Failure with Loss of Nonvolatile Storage is handled through backups and remote systems.
- Advanced Recovery Systems include distributed transactions and point-in-time recovery.
- Remote Backup Systems keep data safe by maintaining backups at remote locations.

Together, these concepts form a robust framework for maintaining the reliability and consistency of database systems across failure scenarios and operational conditions.