Transaction in DBMS PDF

TRANSACTION IN DBMS What is a Transaction in DBMS ? A Transaction is a logical unit of work. Itis the set of operations( basically read and write) to perform unit of work,( include small units of work). Itcan be viewed as a program whose execution preserves the consistency of the database. The primary goal of concurrency control and recovery scheme is to ensure that the execution of a transaction be atomic. What it means is, each transaction should access shared data without interfering with the other transactions. Transaction which successfully completes its execution is said to have been committed, otherwise the transaction is aborted or rollback. Transaction processing in a single and multi user environment Single user : DBMS is a single user at most one user using the system. Multiuser : DBMS is a multi-user if many users are using the system concurrently/simultaneously. For example, banking system or an Airline reservations Processing in multi user environment Multipleusers can access the database using the concept of multiprogramming, which allows OS to execute multiple program or processes at the same time. Ifa process waiting for IO transfer there is another program heading to utilize the CPU. So it is possible to share the time of the CPU for the several jobs. Theprocess which is suspended is resumed from the point where it was suspended. Hence, concurrent execution of processes is actually interleaved. Boundaries of a transaction Begin Transaction/ Start End Transaction / End We can specify explicitly the boundaries of a transaction by begin transaction and end transaction statements in an application program. A single application program may contain more than one transaction if it contains several transaction boundaries. Types of transaction : Readonly transaction : If the database operations in a transaction do not update the database but only retrieve data. Read write transaction : If the database operation in a transaction retrieves as well as update the database Why Recovery is Needed and The Operations of a Transaction Whenever a transaction is executing, either it must complete all the operations in the transaction and their changes must be permanently saved or the transaction does not have any effect on the database or any other transactions. So, for recovery purpose, the recovery manager of the DBMS needs to keep track of the following operations : Begin Transaction / Start : It marks the beginning of transaction execution. Read (A) : Reads a data item named A into a program variable. Write(A) : Updation of data item (A) into database. Commit : Transaction completed successfully. End Transaction / End : It marks the end of transaction execution. Rollback : It signals that the transaction has ended unsuccessfully, and so, any changes that the transaction may have applied to the database must be undone Steps involves in Execution of Transaction : Database is represented as a collection of named data items. The data item can be a single database record or multiple database record or a field in a database record. The size of the data item is called its granularity. States of a transaction A Transaction can be in one of the following states : Active: After the transaction has issued its start or say the transaction starts execution. Partially committed : When the last statement is reached i.e. when the transaction ends. Aborted or rollback : If it is found that normal execution can no longer be performed. Committed : After successful completion. Failed : If any failure occurs. Terminated : Either the transaction is successfully completed or it is aborted Problems of Transaction and How to Solve them ?? Transaction is associated with the following 3 problems : It may create an inconsistent results. It may create problems in concurrent execution. It may create an uncertainty to decide when to make changes permanent How to solve the above 3 problems ? To solve the above 3 problems, we have defined some properties for the transactions and they are called ACID properties. 1. A - Atomicity 2. C - Consistency 3. I - Isolation 4. D - Durability If any transaction satisfies these properties, then we can say it will be free from about problems ACID Properties Atomicity: Atomicity refers all or none.i.e., execute all operations of transaction or none of them. So, the transactions must be atomic. If the transaction fails due to any of the failure reason, i.e. for example, in transfer amount from account 1 to account 2, if amount is deducted from account 1 and failure occurs before the updation of account 2, then the transaction should be undo or a rollback is performed. i.e. transaction is kept back to previous state. This is done in order to avoid inconsistency. This roll back is performed by recovery management component. For rollback, we need to know the previous state(original value) and for this, the log file is created when the transaction begins. This file maintain the initial records until the transaction is committed and is stored in the disc. As soon as the transaction is committed or rolled back, log file is deleted. Rollback involves undo operation , not the redo operation. Consistency : Programmers responsibility to take care the logical failures, not the system failures. For example, the operations of the transactions must using Database from one consistent state to another consistent state such that it should be consistent before and after the transaction. The consistency cannot be maintained automatically all the time. For example, if the disc crushers, then the log file gets lost. In this case we have to manually maintain the consistency. To ensure data consistency we will use integrity constraints and they are depending on policies of organization. For example, the consistency criteria changes from scenario to scenario. If we have two accounts A and B, then the transaction operations involves transferring of balance between A and B. Then the consistency criteria is : Consistency is said to be failed if atomicity fails. Isolation Isolationis the property which guarantees that the execution of one transaction should not interfere with the execution of another transaction. To ensure the isolation, concurrency control protocols must be implemented. Durability : Oncea transaction commit, its updates survived, even if there is a subsequent system crash. Ifwe perform the commit statement, but during the transfer of records(updated) from main memory to secondary memory, failure occurs (such as power failure), and the user may feel that the transaction is committed successfully but actually it is not. In this case, the DBMS will redo the transaction. Sothe database should be able to recovery under any type of crash. Durability is hardware dependent. To achieve durability- RAID( Redundant Array of Independent Discs) component is used. Transaction management deals with Recovery( From Failure) Concurrency( From Concurrent Execution Serial and Concurrent Schedule What is a Schedule ? Definition 1: The order of the execution of the transactions is known as schedule. Definition 2 : The time order sequence of operations among transactions. Two types of schedules Serial schedule Non serial schedule or concurrent schedule. Serial Schedule Concurrent Schedule The transactions are completed one after Interleaved or simultaneous execution of two or another. For example, if there are 3 more transaction is called concurrent schedule. transactions T1, T2, T3, then, Firstly, T1 In Concurrent Schedule, CPU time is being complete its execution. Then after the commit shared by different transactions. of T1 transaction, transaction T2 completes, and in the last T3 completes. Advantage : Every schedule is a serial Advantage : 1. If T1 needs DMA processor, schedule, So no problems of Isolation occurs. CPU can execute T2. Hence Throughput increases. 2. Better Resource Utilization.3. Less Response Time. Drawback : 1. THROUGHPUT of Transaction is Drawback : Problems of Isolation must be less, since the CPU is idle if transaction needs managed. Questions on Serial Schedule and Concurrent Schedule : Concurrency Problems in Transaction Several problems can occur when concurrent transactions execute in an uncontrolled manner. Some Concurrency Problems in transaction are : The Lost Update Problem Theuncommitted dependency problem/The Temporary Update (or Dirty Read) Problem The inconsistent analysis problem/The Incorrect Summary Problem The Unrepeatable Read Problem The Lost update problem Transaction X retrieves some tuple a at time T1. Transaction Y retrieves same tuple a at time T2. Transaction X update the tuple a at time T3. Transaction Y update the same tuple at time T4 on the base of the values available at time T2. TransactionX's update is lost at time T4, since transaction Y overwrites it. The uncommitted dependency problem Transaction Y update the tuple at time T1. TransactionX retrieves the same tuple a and use the updated results at time T2. Update (made by transaction be at time T1) is then update at time T3. The inconsistent analysis problem In this example, transaction X is used to find the sum of account balances and transaction Y is used to transfer an amount 20 from account 2 to account 1. At time T1, transaction X retrieves AC 1 and sum becomes 35. Transaction Y retrives AC 2 at time T2. And update it at time T3 as a result balance of AC 2 becomes 40. Transaction Y retrives AC 1 at time T4 and update balance of AC 1 becomes 60. Transaction Y commits at time T6. At time T7, transaction X retrieve AC 2 and sum becomes 75 but not 100. The result produced by X is wrong. Hence we say that transaction X has seen an inconsistent state. The Unrepeatable Read Problem Another problem that may occur is called unrepeatable read, where a transaction T reads the same item twice and the item is changed by another transaction T between the two reads. Hence, T receives different values for its two reads of the same item. Another Example : Describing all the Concurrency Problems (Including Phantom Row and Unstable Errors) : Phantom Row and Unstable Errors : Concurrency Problems in Transaction Classification of schedule based on recovery The schedules can be classified based on recovery as : Irrecoverable schedule Recoverable schedule Cascading Schedule Cascadeless Schedule Strict Schedule : More restrictive than Cascadeless Irrecoverable schedule An uncommitted read only is known as dependency. Uncommitted read means that the value updated by one transaction is read by another transaction before its commit or rollback. Dependency occurs in a schedule in which a transaction reads a data item previously updated by another transaction, such that the read is done before commit of another one. i.e. in the case 1, T2 depends on T1 as the updated value of A is read by transaction T2 before any commit in transaction T1. Case 1 is an irrecoverable Schedule because the wrong value(i.e. updated value) of A is read by T2 and cannot be rollback due to commit statement. or Case 1 is an irrecoverable Schedule because T2 reads a updated value before the commit operation of T1. Case 2 and Case 3 are recoverable Schedules and no dependency occurs(WR) between T1 and T2 because the transaction is not reading any updated value.(It is updating always.) However, RW and WW dependency occurs. Case 4 is also a recoverable schedules because commit or rollback of transaction T2 (which is reading the updated value) is to be delayed until the transaction T1 is committed or aborted Question : What are the precautions for a schedule to be recoverable? or How to remove dependency if an uncommitted read occurs ? Solution : 1. Find out the dependent transaction by finding an uncommitted read. 2. Make sure the dependent transaction should be committed later Recoverable schedule A Schedule in which for every pair of transaction Ti and Tj such that Tj reads a data item previously written by Ti, the commmit operation of Tj should appear after commit or abort operation of Ti. Case 5 is an irrecoverable Schedule because T2 reads a updated value before the commit operation of T1. Case 6 is a recoverable schedule because the transaction is not reading any updated value. Case 7 is also a recoverable schedules because commit of transaction T2 (which is reading the updated value) is to be delayed until the transaction T1 is committed or aborted. Case 8 is a recoverable schedule because the commit of transaction T2, T3 and T4 (which are reading the updated value one by one) is to be delayed until the transaction T1 is aborted. Cascading Schedule or What is Cascading Rollback ?? The failure of single transaction leads to rollback of several transactions [i.e. rollback of dependent transaction]. It leads to wastage of CPU time. For example T2 depends on T1, T3 depends on T2, and T4 depends on T3. Hence transitively T4 on T1. If T1 fails, we have to rollback T1, then rollback T2, then T3 and finally T4. Such a rollback is known as cascading rollback ( failure of transaction causes rollback of other dependent transaction). The Schedule is irrecoverable T2, T3, T4 commit before T1. Cascadeless Schedule : Definition 1 : A Schedule that avoids cascading rollback, is called as cascadeless schedule. Definition2 : A Schedule in which for every pair of transaction Ti and Tj such that read data item previously written by Ti, the commit / abort operation of Ti should appear before read operation of Tj. Cascadeless schedule guarantees only committed read operations. Cascadeless rollback is not free from inconsistency. It may contain RW problem WW problem But the WR problem is not possible because we ensured committed read only Strict Schedule A Schedule, in which a transaction neither Relation between the Schedules : read or write a data item X until the last transaction that has written X is committed or aborted. Problems in Strict Schedule : Inconsistency is possible because of RW problem. No inconsistency exist because of WR, WW problems. Serializability in Database A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the 'n' transactions. Every serializable schedule is consistent i.e. it is not suffering from RW, WR, WW etc. The concept of serializability of schedules is used to identify which schedules are correct when transaction executions have interleaving of their operations in the schedules. Serializableschedules are always considered to be correct when concurrent transactions are executing. Difference between serial schedule and a serializable schedule Themain difference between the serial schedule and the serializable schedule is that in serial schedule, no concurrency is allowed whereas in serializable schedule, concurrency is allowed. In serial schedule, if there are two transaction executing at the same time and if no interleaving of operations is permitted, then there are only two possible outcomes : Execute all the operations of transaction T1 (in sequence) followed by all the operations of transaction T2 (in sequence). Execute all the operations of transaction T2 (in sequence) followed by all the operations of transaction T1 (in sequence). In Serializable Schedule, if there are two transaction executing at the same time and if interleaving of operations is allowed, there will be many possible orders in which the system can execute the individual operations of the transactions. Inserializable schedule, the concurrent execution of schedule should be equal to any serial schedule so that schedules are always considered to be correct, when transaction executions have interleaving of their operations in the schedules. Example of Serializable Schedule Let us consider 3 schedules S1, S2, and S3. We have to check whether they are serializable with S or not ? Example of Serial Schedule When are 2 schedules equivalent? There are three types of equivalence of schedules : Result equivalence Conflict equivalence View equivalence Basedon the types of equivalence, we define the types of serializability. There are accordingly three types of serializability which are: Results serializable Conflict serializable View serializable Result Equivalence and Result Serializable : Inresults equivalence, the end result of schedules heavily depend on input of schedules. The final values are calculated from both schedules (given and serial) and check whether they are equal. Result Serializable are not generally used because of lengthy process. How to check for Conflict Serializable Schedule ? Conflict Equivalence and Conflict Serializable Schedule Beforewe discuss conflict equivalence and conflict serializable schedule, you must know about conflicts. So, what is a conflict is described in the following section. What is a conflict? A pair of Operations in a schedule such that if their order is interchanged then the behavior of atleast one of the transactions may change. Operations are conflict, if they satisfy all three of the following conditions : They belong to different transactions They access the same data item At least one of the operation is a write operation. Conflict equivalence Schedules are conflict equivalent if they can be transformed one into other by a sequence of non conflicting interchanges adjacent actions. For example : Question 1 : Check whether the schedules S1 and S2 are conflict equivalent or not ? Question 2 : Check whether the schedules S1, S2 and S3 are conflict equivalent or not ? Check for S2 and S3 are conflict equivalent or not? Conflict Serializable Schedule A Schedule is conflict serializable if it is conflict equivalent to any of serial schedule. Testing for conflict serializability Method 1 : First write the given schedule in a linear way. Find the conflict pairs (RW, WR, WW) on same variable by different transactions. Whenever conflict pairs are find, write the dependency relation like Ti → Tj, if conflict pair is from Ti to Tj. For example, (W1(A), R2(A)) ⇒ T1 → T2 Check to see if there is a cycle formed, If yes= not conflict serializable No= we get a sequence and hence are conflict serializable. Questions on Conflict Serializable As Cycle formed, therefore Schedule is not a conflict serializable schedule. How to check for View Serializable Schedule ? View equivalent schedule and View Serializable Schedule View Equivalent Schedule : Consider two schedules S1 and S2, they are said to be view equivalent if following conditions are true : Initial read must be same. S1 : T1 reads A from Database. S2 : T1 reads A from T2. ∴ S1 ≠ S2. Thereare two transactions say Ti and Tj, The schedule S1 and S2 are view equivalent if in schedule S1, Ti reads A that has been updated by Tj, and in schedule S2, Ti must read A from Tj. i.e. write-read(WR) sequence must be same between S1 and S2. S1 : T3 reads value of A from T2. S2 : T3 reads value of A from T1. ∴ S1 ≠ S2. i.e. write- read sequence is not same between S1 and S2. Final write operations should be same between S1 and S2. S1 : A is finally updated by T3. S2 : A is finally updated by T1. ∴ S1 ≠ S2. View serializable schedule A Schedule is view serializable if it is view equivalent to any serial schedule. The following two examples will illustrate how to find view equivalent schedule A Shortcut for Finding View Serializable : Funda of Blind Write : What is a Blind Write ? Question 2 : Check whether the schedule is view serializable or not ? Questions on View Serializable Question 3 : Check whether the schedules is serializable or not ? Difference between conflict and view equivalent Conflict Equivalence and Conflict Serializable Schedule Conflict Equivalence : Schedules are conflict equivalent if they can be transformed one into other by a sequence of non conflicting interchanges adjacent actions. Conflict Serializable Schedule : A Schedule is conflict serializable if it is conflict equivalent to any of serial schedule. View Equivalent Schedule and View serializable schedule View Equivalent Schedule : Consider two schedules S1 and S2, they are said to be view equivalent if following conditions are true : Initial read must be same. There are two transactions say Ti and Tj, The schedule S1 and S2 are view equivalent if in schedule S1, Ti reads A that has been updated by Tj, and in schedule S2, Ti must read A from Tj. i.e. write-read(WR) sequence must be same between S1 and S2. Final write operations should be same between S1 and S2. View serializable schedule : A Schedule is view serializable if it is view equivalent to any serial schedule. The following two examples will illustrate how to find view equivalent schedule. Difference Between Conflict Equivalent Schedule and View Equivalent Schedule Conflict serializability is easy to achieve but view serializability is difficult to achieve Every conflict serializable is view serializable but the reverse is not true. Itis easy to test conflict serializability but expensive to test view serializability. Mostof the concurrency control schemes used in practice are based on conflict serializability. Concurrency Control Protocol Need of Concurrency Control Protocol Tomaintain the consistency of shared data access during concurrent execution. Attempting to have maximum possible concurrency without inconsistency. Mechanisms To Control the Concurrency: There are two mechanisms by which we can control concurrency according to our needs. Thefirst mechanism is that in which the user is responsible to write the consistent concurrent transactions and they are called Lock Based Protocols. And the second mechanism is that in which system itself tries to detect possible inconsistency during concurrent execution and either the inconsistency recovered or avoided. They are called Time Stamping Protocols. Classification of concurrency control protocol Lock based protocols Binary locks Shared / exclusive locks or read / write locks 2 phase locking protocol Basic 2pl Conservative 2pl or static 2pl Strict 2pl Rigorous 2 PL 7 Graph Based Protocol Timestamp based protocol Timestamp ordering protocol Thomas write rule Multiple granularity protocol Multi version protocols What is locking? The concurrency problems can be solved by means of concurrency control technique called locking. A LOCK variable ( equivalent to semaphores used in OS) is associated with each data item which is used to identify the status of the data item ( whether the data is in use or not). When a transaction T intends to access the data item, it must first examines the associated LOCK. If no other transaction holds the LOCK, the scheduler lock the data item for T. If another transaction T1 wants to access same data item then the transaction T1 has to wait until T releases the lock. Thus, at a time only one transaction can access the data item. Using locks, we ensure Serializability and Recoverability but sometimes they may lead to Deadlock. Deadlock occurs when each transaction T in set of two or more transactions is waiting for some item that is locked by some other transaction T' in the set. Transaction Responsibility : Legal Transaction To request lock before use of data item To release lock after use of data item. Timestamp ordering A different approach that guarantees serializability involves using transaction timestamps in order transaction execution for an equivalent serial schedule. Timestamp ordering is an alternative to locking in which the problem of concurrency control can be handled by assigning ordered timestamps to transactions. Transaction timestamp TS (T) is a unique identifier which tells the order in which the transaction are started. Every transaction is assigned a timestamp by Database Administrator (DBA) such that Timestamp is unique. Time stamps are in increasing order. Hence, if transaction T1 starts before transaction T2, then TS(T1) < TS(Tj). A timestamp based scheduler orders conflicting operations according to their timestamp values. Thus if Pi(X) and Qi(X) are two conflicting operations on data item X requested by transaction Ti and Tj, Pi(X) will be scheduled before Qi(X) if and only if TS(Ti) < TS(Tj). Binary Locks A binary lock can have 2 States or values Locked (or 1) and Unlocked (or 0) We represent the current state( or value) of the lock associated with data item X as LOCK(X). Operations used with Binary Locking lock_item: A transaction request access to an item by first issuing a lock_item(X) operation. If LOCK(X) =1 or L(X) : the transaction is forced to wait. If LOCK(X) = 0 or U(x) : it is set to 1( the transaction locks the item) and the transaction is a load to access item X. unlock_item: After using the data item the transaction issues an operation unlock(X), which sets the operation LOCK(X) to 0 i.e. unlocks the data item so that X may be accessed by another transactions. Transaction Rules for Binary Locks Every transaction must obey the following rules : A transaction T must issue the lock(X) operation before any read(X) or write(X) operations in T. A transaction T must issue the unlock(X) operation after all read(X) and write(X) operations in T. If a transaction T already holds the lock on item X, then T will not issue a lock(X) operation. If a transaction does not holds the lock on item X, then T will not issue an unlock(X) operation. Transaction Rules for Binary Locks Every transaction must obey the following rules : A transaction T must issue the lock(X) operation before any read(X) or write(X) operations in T. A transaction T must issue the unlock(X) operation after all read(X) and write(X) operations in T. If a transaction T already holds the lock on item X, then T will not issue a lock(X) operation. If a transaction does not holds the lock on item X, then T will not issue an unlock(X) operation. Points About Binary Locking : A binary lock enforces mutual exclusion on the data item. Serializability is not ensure i.e. may not eliminate non serializable schedule. The binary locks are simple but restrictive and so are not used in practice. Implementation of Binary Locks : Binary lock is implemented using 3 fields plus a queue for transactions data waiting to access item. The 3 fields are : 1. Data_item_name 2. LOCK 3. Locking_transaction To keep track of and control access to locks, DBMS has a lock manager subsystem. Items that are not in the lock table are considered to be unlocked. The system maintains only those records for the items that are currently lock in Shared and Exclusive Locks or Read/Write Locks Need of Shared/Read and Exclusive/Write Lock The binary lock is too restrictive for data items because at most one transaction can hold on a given item whether the transaction is reading or writing. To improve it we have shared and exclusive locks in which more than one transaction can access the same item for reading purposes.i.e. the read operations on the same item by different transactions are not conflicting. What are Shared and Exclusive Locks Exclusive(or Write) Locks and Shared(or Read) Locks. Shared Locks If a transaction Ti has locked the data item A in shared mode, then a request from another transaction Tj on A for : Write operation on A : Denied. Transaction Tj has to wait until transaction Ti unlock A. Read operation on A : Allowed. Exclusive Locks Exclusive Locks If a transaction Ti has locked a data item a in exclusive mode then request from some another transaction Tj for Read operation on A : Denied Write operation on A : Denied What does the Compatibility Table means ? Ifa transaction has lock a data item in shared mode, then another transaction can lock the same data item in shared mode. Ifa transaction has lock a data item in shared mode, then another transaction cannot lock the same data item in exclusive mode. Ifa transaction has lock a data item in exclusive mode, then another transaction cannot lock the same data item in shared mode as well as exclusive mode. Operations Used with Shared and Exclusive Locks 1. Read_lock(A) or s(A) 2. Write_lock(A) or X(A) 3. Unlock(X) or U(A) Implementation of Shared and Exclusive Locks Shared and exclusive locks are implemented using 4 fields : 1. Data_item_name 2. LOCK 3. Number of Records and 4. Locking_transaction(s) Again to save space, items that are not in the lock table are considered to be unlocked. The system maintains only those records for the items that are currently locked in the lock table. Value of LOCK(A) : Read Locked or Write Locked If LOCK(A) = write-locked - The value of locking transaction is a single transaction that holds the exclusive(write) Lock on A. If LOCK(A) = read-locked - The value of locking transaction is a list of one or more transactions that hold the Shared(read) on A. Transaction Rules for Shared and Exclusive Locks Every transaction must obey the following rules : A transaction T must issue the operation s(A) or read_lock(A) or x(A) or write_lock(A) before any read(A) operation is performed in T. A transaction T must issue the operation x(A) or write_lock(A) before any write(A) operation is performed in T.. After completion of all read(A) and write(A) operations in T, a transaction T must issue an unlock(A) operation. If a transaction already holds a read (shared) lock or a write (exclusive) lock on item A, then T will not issue an unlock(A) operation. A transaction that already holds a lock on item A, is allowed to convert the lock from one locked state to another under certain conditions. Upgrading the Lock by Issuing a write_lock(A) Operation or Conversion of read_lock() to write_lock() : Case 1 - When Conversion Not Possible : A transaction T will not issue a write_lock(A) operation if it already holds a read (shared) lock or write (exclusive) lock on item A. Case 2 - When Conversion Possible : If T is the only transaction holding a read lock on A at the time it issues the write_lock(A) operation, the lock can be upgraded; Downgrading the Lock by Issuing a read_lock(A) or Conversion of write_lock() to read_lock() : A transaction T downgrade from the write lock to a read lock by acquiring the write_lock(A) or x(A), then the read_lock(A) or s(A) and then releasing the write_lock(A) or x(A). Points About Shared and Exclusive Locking : Shared and Exclusive Lock results more concurrency than Binary Locking. Shared and Exclusive Lock may not ensure serializability i.e. Non Serializable Schedule is possible to execute using shared and exclusive locking. 2 Phase Locking Need of 2 Phase Locking Binary locks or shared and exclusive locks in transaction does not guarantee serializability of schedules on its own. For example, consider a banking transaction where the read write locking rules are used, but we get a non serializable schedule which give incorrect result. To ensure the serializability, we use two phase locking. 2 Phase Locking In this scheme, each transaction makes lock and unlock request in 2 phases : 1.A Growing Phase( or An Expanding Phase or First Phase) : In this phase, new locks on the desired data item can be acquired but none can be released. 2. A Shrinking Phase (or Second Phase) : In this phase, existing locks can be released but no new locks can be acquired. A 2 phase locking always results serializable schedule but it does not permit all possible serializable schedules i.e. some serializable schedules will be prohibited by the protocol. Strategy : A transaction T does not allowed to request any lock if T has performed already some unlock operation and every equivalent serial schedule is based on the order of the LOCK point. Question : Consider the scenario and find out weather it is in 2 phase locking or not? Solution : This technique is also called as Basic 2PL. Points About 2 Phase Locking Every non serializable schedule failed to execute using two phase locking. 2 phase locking always results serializable schedule. Equivalent serial schedule is based on the order of the lock point. If schedule is allowed to execute bye 2pl then schedule is conflict serializable but not vice versa. If schedule is not conflict serializable, then schedule is not allowed to execute by 2pl. A schedule is allowed to execute by 2 PL , then it is always serializable. Combining Two Phase Locking and Binary Locking The first problem in binary locking + 2 phase locking is that it leads to deadlock. In the given example, T1 is waiting for T2 to get B and T2 is waiting for T1 to get A. Hence deadlock occurred. The second problem in binary locking + 2 phase locking is that it provides less concurrency because in binary locking, 2 or more transactions simultaneously not allowed to perform read operation. This can easily be shown in the below figure 9. The third problem(shown in fig 10) in binary locking + 2 phase locking is that the same lock is used for read as well as write, so we are unable to differentiate whether the transaction locks a data item for reading purpose or for writing purpose. Combining 2 Phase Locking + Shared and Exclusive Locking Onlyshared and exclusive locks may not ensure serializability i.e. non serializable schedule is possible to execute using shared and exclusive locking. But when we combine both 2 phase locking and shared and exclusive locking, then it always results serializable schedule and the equivalent serial schedule is based on the order of LOCKING. Problems occurred in (2 Phase Locking + Shared and Exclusive Locking) The first problem in 2PL + S/X Locking is that however, we get a serializable schedule but it may not free from irrecoverability. Example : The schedule in figure 11 is not recoverable schedule. To avoid irrecoverability, the Exclusive lock is called until commit/rollback. Hence other transaction is not allowed either to read or write.( no dependency so no problem). Thesecond problem in 2PL + S/X Locking is that it is not free from deadlock. For example To recover from deadlock, deadlock prevention algo is used. But the deadlock prevention algorithm takes more CPU time than the actual work and 99.9 % times the algorithm returns safe. Therefore it is a wastage of precious CPU time. So Unix/Windows/Solaris don't use deadlock prevention algorithm. The third problem in 2PL + S/X Locking is that the schedule is not free from starvation. Variations of 2 Phase Locking : There are number of variations of 2 Phase Locking : Basic 2PL Conservative 2pl ( or static 2PL) Strict 2PL Rigorous 2PL Basic 2PL The technique of two phase locking describe above is known as Basic 2PL. Conservative 2PL( or Static 2PL) InConservative 2PL protocol, a transaction has to lock all the items it access before the transaction begins execution. To avoid deadlocks, we can use Conservative 2PL. Basic 2pl + all locks required to execute the transaction should be hold before starting its execution Advantages of Conservative 2PL : 1. No possibility of deadlock. 2. Ensure serializability. Drawbacks of Conservative 2PL : 1. Less throughput. 2.Less resource utilization because it holds the resources before the transaction begins execution. 3. Less concurrency 4.Prediction of all the required resources before execution is also too complex. 5. Irrecoverability possible since no restriction on unlock operation. 6.However conservative 2pl is a deadlock free protocol but it is difficult to use in practice. 7. Starvation is possible. How Starvation is possible ?? Question : What if some required resource is not available at the time of holding i.e. before the transaction begins execution? Solution : It will unlock all other resources hold by it because they were free. It may cause starvation. Because if at a time A is not available, another point of time B is not available and in another point of time C is not available Strict 2PL A transaction T does not release any of its exclusive(write) locks until after it commits or aborts. Basic 2pl + all the exclusive locks should we hold until commit / rollback. Points about Strict 2PL 1. Strict 2PL ensures serializability and the equivalent serial schedule is based on the lock point. 2. Strict 2PL also ensures strict recoverable schedule. 3. Strict 2PL may not free from deadlock and starvation. Rigorous 2PL Protocol Transactiondoes not release any of its write locks and read locks until after it commits or aborts. Basic 2PL + All S/X locks should be hold until commit / rollback. Rigorous 2pl protocol ensures serializability and the equivalent serial schedule is based on the order of COMMIT. Difference Between 2PL and Rigorous 2PL Protocol Relation Between Conflict Serializable, View Serializable and 2 Phase Locking Schedules There is no comparison between conservative 2PL and ( Strict 2PL, Basic 2PL and Rigorous 2PL) Timestamp Ordering Protocols In timestamp based protocols, the system itself tries to detect possible inconsistency during concurrent execution and recovers from it or avoids it. In it, for every transaction the system executes, the system gives the timestamp to that transaction i.e. it provides a set of timestamps to every transaction Ti by unique timestamp( any integer value) which is denoted by TS(Ti) where TS is timestamp of Ti. What is TS(Ti) ?? Whenever a transaction begins to execute that is just prior to its execution it is provided a timestamp. This timestamp may be a system related actual real time stamp which is based on the system time or it may just be a counter. Example : For example, Say a particular transaction T4 starts - counter = 1 When a transaction T7 starts - counter = 2 When a transaction T3 starts - counter = 3. Whenever we want a transaction to execute the counter is incremented by 1. It is just a logical timing value which indicates that this transaction started before or another transaction. Therefore if we have, TS(Ti) < TS(Tj) then it means that transaction Ti starts its execution before Tj in the system. Soa timestamp indicates when this transaction will starts. The time stamp based protocols will try to check whether the concurrent execution may be consistent or not, or the schedule that occurs must be atleast conflict equivalent to Ti followed by Tj to the serial schedule Ti followed by Tj. ConflictOperations are allowed in the Timestamp ordering schedule, but they must have the order given by the timestamp. If they do not satisfy the order, then Rollback of transaction may or may not be performed. What are R-Timestamp(RTS) and W-Timestamp(WTS)? For Every data item which will be shared, we have got two timestamps : W-Timestamp(Q) or WTS(Q) : Write-timestamp of data item Q R-Timestamp(Q) or RTS(Q) : Read-timestamp of data item Q W-Timestamp(Q) W-Timestamp(Q) denotes the largest TS or TimeStamp value of any transaction that successfully executes Write(Q). R-Timestamp(Q) R-Timestamp(Q) denotes the largest TS or TimeStamp value of any transaction that successfully executes Read(Q). Meaning of R-Timestamp and W-Timestamp : Suppose there is a data item Q which is shared by transactions T1,T2,T3 and T4 with the below TimeStamps executing some read and write operations. T1 : TS(1) = 10 T2 : TS(2) = 20 T3 : TS(3) = 30 T4 : TS(4) = 40 Let initially Q have no value i.e. 0. i.e. R-Timestamp =0 or RTS(Q) = 0 and W-Timestamp = 0 or WTS(Q) = 0 RTS(Q) = 0 and WTS(Q) = 0 means that no transaction has read or write the data item Q. When Transaction issues Read(Q) : Suppose T1 executes Read(Q) whose TS(T1) = 10, then we will have RTS(Q) = 0 10. Then, T3 executes Read(Q) whose TS(T3) = 30, then the value of RTS(Q) = 10 30. Now, if T2 executes Read(Q) whose TS(T2) = 20, then the value of RTS(Q) will remain 30. When Transaction issues Write(Q) : Suppose T1 executes Write(Q) whose TS(T1) = 10, then we will have WTS(Q) = 0 10 Then, T4 executes Write(Q) whose TS(T4) = 40, then the value of WTS(Q) = 10 40. How will we Utilize the Idea of R-Timestamp and W-Timestamp ? Suppose transaction Ti is executed which is using shared data. Whenever a read or write is executed by transaction Ti, the concurrency control mechanisms immediately check certain things. "Whenever transaction Ti starts, the concurrency control mechanism sets the new TS(Ti) value and it stores all the TS(Ti) values and all the W-Timestamp values and the R-Timestamp values of the required data." Based on this, it does some analysis (called the Timestamping Protocol) and based on this analysis it tries to check whether there can be any inconsistency in the behavior or whether the schedule which it executing is equivalent to the serial schedule to the required serial schedule or not. And if it is not, then some corrective actions will takes place by the system itself. Analysis or TSO Protocol : When Ti issues a Read(Q) : Suppose a transaction Ti issues Read(Q) where Q is a shared data item. Now, the TS(Ti) value will be compared with W-Timestamp(WTS) and R-Timestamp(RTS) of Q. Let W-Timestamp(Q) and R-Timestamp(Q) be the start timestamp of a transaction Tj i.e. Case 1 : If TS(Ti) < WTS(Q) ? What does the statement means ? It means that there is a transaction Tj which has written on data item Q before Ti reads it and the transaction Tj started after transaction Ti. According to the condition, transaction T2 is started after transaction T1 and T2 is going to write on the data item Q which is read by T1. According to Timestamps defined - the serial schedule will be T1 → T2. But there is a Conflict Pair : W2(Q); R1(Q) Dependency : T2 → T1 Therefore, the schedule will never ever be equivalent( atleast conflict equivalent) to the serial schedule T1 → T2.The possible situation is described as with the help of an example. What happens when concurrency control protocol mechanism detects the situation? The concurrency control protocol mechanism will detect the situation and it will reject the read. This read cannot be allowed because T1 is trying to read a data item which has been written by transaction T2 and T2 is started after T1. Theschedule maybe consistent but it will not be conflict serializable at all to T1 followed by T2 (T1 → T2). So in general, Ti will be Rolled Back and so this read will be rejected. How do we proceed with the transaction Ti or T1? ROLLBACK Ti or Say T1. RESTART. i.e. Ti or T1 is rolled back and restarted. Case 2 : If TS(Ti) ≥ WTS(Q) ? What does the statement means ? It means that there is a transaction Tj which has written on data item Q after Ti reads it and the transaction Tj started before transaction Ti. The possible situation is described as with the help of an example. According to the condition, transaction T2 is started before transaction T1 and T2 is going to write on the data item Q which is already read by T1. According to Timestamps defined - the serial schedule will be T1 → T2. The Conflict Pair will be : Conflict Pair : R1(Q); W2(Q) Dependency : T1 → T2 as shown. Therefore, the schedule is conflict equivalent to the serial schedule T1 → T2. What happens when concurrency control protocol mechanism detects the situation? The concurrency control protocol mechanism will detect the situation and it will allow the read by transaction Ti or say T1. and It sets the R-Timestamp(Q) as : RTS(Q) = max (RTS(Q),TS(T1)) Why there is an equal to condition in TS(Ti) ≥ WTS(Q) ? The reason of the equal to condition is that Ti or T1 itself may have written Case 3 & 4 : If TS(Ti) < RTS(Q) ? or If TS(Ti) < RTS(Q) ? Case 3 : What does the statement TS(Ti) < RTS(Q) means? It means that there is a transaction Tj which reads a data item Q before Ti reads it and the transaction Tj started after transaction Ti. Case 4 : What does the statement TS(Ti) > RTS(Q) means? It means that there is a transaction Tj which reads a data item Q after Ti reads it and the transaction Tj started before transaction Ti. In both cases 3 & 4, transactions Ti and Tj - both are reading. So, there will be no conflict pair exists. The possible situation is described as with the help of an example. According to Timestamps defined - the serial schedule will be T1 → T2. Since no Conflict Pair will exists as both transactions(T1 and T2) are reading in both the cases 3 & 4. Therefore, the schedule is conflict equivalent to the serial schedule T1 → T2. What happens when concurrency control protocol mechanism detects the situation? The concurrency control protocol mechanism will detect the situation and it will allow the read by transaction Ti or say T1. and It sets the R-Timestamp(Q) as : RTS(Q) = max (RTS(Q),TS(T1)) Example for Case 4 : When Ti issues a Write(Q) : Suppose a transaction Ti issues Write(Q) where Q is a shared data item. Now, the TS(Ti) value will be compared with W-Timestamp(WTS) and R-Timestamp(RTS) of Q. Let W-Timestamp(Q) and R-Timestamp(Q) be the start timestamp of a transaction Tj i.e. TS(Tj) = WTS(Q) and TS(Tj) = RTS(Q) Case 1 : If TS(Ti) < RTS(Q) ? What does the statement means ? It means that there is a transaction Tj which reads a data item Q before Ti writes it and the transaction Tj started after transaction Ti. The possible situation is described as with the help of an example. According to the condition, transaction T2 is started after transaction T1 and T2 is going to read the data item Q which is write by T1. According to Timestamps defined - the serial schedule will be T1 → T2. But there is a Conflict Pair : R2(Q); W1(Q) Dependency : T2 → T1 as shown. Therefore, the schedule will never ever be equivalent ( atleast conflict equivalent) to the serial schedule T1 → T2. What happens when concurrency control protocol mechanism detects the situation? The concurrency control protocol mechanism will detect the situation and it will reject the write. Theschedule maybe consistent but it will not be conflict serializable at all to T1 followed by T2 (T1 → T2). So in general, Ti will be Rolled Back and so this write will be rejected. How do we proceed with the transaction Ti or T1? ROLLBACK Ti or Say T1. RESTART. i.e. Ti or T1 is rolled back and restarted. Case 2 : If TS(Ti) < WTS(Q) ? What does the statement means ? It means that there is a transaction Tj which has written on data item Q before Ti writes it and the transaction Tj started after transaction Ti The possible situation is described as with the help of an example. According to the condition, transaction T2 is started after transaction T1 and T2 writes on the data item Q which is again write by T1. According to Timestamps defined - the serial schedule will be T1 → T2. The Conflict Pair will be : Conflict Pair : W2(Q); W1(Q) Dependency : T2 → T1 as shown. Therefore, the schedule will never ever be equivalent( atleast conflict equivalent) to the serial schedule T1 → T2 What happens when concurrency control protocol mechanism detects the situation? The concurrency control protocol mechanism will detect the situation and it will reject the write. Theschedule maybe consistent but it will not be conflict serializable at all to T1 followed by T2 (T1 → T2). So in general, Ti will be Rolled Back and so this write will be rejected. Case 3 : If TS(Ti) > RTS(Q) ? Case 3 : What does the statement TS(Ti) > RTS(Q) means? It means that there is a transaction Tj which reads a data item Q after Ti writes it and the transaction Tj started before transaction Ti. The possible situation is described as with the help of an example. According to the condition, transaction T2 is started before transaction T1 and T2 reads the data item Q written by T1. According to Timestamps defined - the serial schedule will be T1 → T2. The Conflict Pair will be : Conflict Pair : W1(Q); R2(Q) Dependency : T1 → T2 as shown. Therefore, the schedule is conflict equivalent to the serial schedule T1 → T2. What happens when concurrency control protocol mechanism detects the situation? The concurrency control protocol mechanism will detect the situation and it will allow the write by transaction Ti or say T1. and It sets the W-Timestamp(Q) as : WTS(Q) = TS(T1) Case 4 : If TS(Ti) > WTS(Q) ? Case 4 : What does the statement TS(Ti) > WTS(Q) means? It means that there is a transaction Tj which writes a data item Q after Ti writes it and the transaction Tj started before transaction Ti. The possible situation is described as with the help of an example. According to the condition, transaction T2 is started before transaction T1 and T2 writes the data item Q which is already written by T1. According to Timestamps defined - the serial schedule will be T1 → T2. The Conflict Pair will be : Conflict Pair : W1(Q); W2(Q) Dependency : T1 → T2 as shown. Therefore, the schedule is conflict equivalent to the serial schedule T1 → T2. What happens when concurrency control protocol mechanism detects the situation? The concurrency control protocol mechanism will detect the situation and it will allow the write by transaction Ti or say T1. and It sets the W-Timestamp(Q) as : WTS(Q) = TS(T1) Points About Timestamp Ordering : Timestamp Based Ordering Protocol is : Conflict serializable i.e. it is conflict equivalent to some serial schedule. It is deadlock free because timestamp ordering protocol allows rollback and restart it after detecting a mismatched situation. Additional Overheads as Compared to Lock Based Protocol Overheads of maintaining the timestamps - RTS and WTS Checking each transaction whenever a read or write operation is executed. Updating each transaction after execution of read and write operation. Advantages over overheads : However, there are a lot of overhead but the advantage is the user need not to be bother at all about what is going on. But in Lock Based Protocols - the overheads are of waiting and the responsibility is of the user to write the consistent concurrent transaction. Most widely used protocol is timestamp ordering protocol. Timestamp ordering protocol is a system automated protocol for concurrency control. Questions on Timestamp Ordering Thomas write rule Criteria For Modification : Before improving the BTSO protocol, it must follow some criteria : The First Criteria used to maintain the consistency of the system. After maintaining the consistency of the system, the next criteria is to trying to see whether we can enhance the concurrency of the mechanism. Modification in BTSO Protocol When Ti issues a READ (R(Q)) : Deadlock in Transaction What is Deadlock and How the Deadlock is occurred? Deadlock occurs when each transaction T in a set of 2 or more transactions is waiting for some item that is locked by some other transaction T' in the set. Hence, each transaction in the set is on a waiting queue, waiting for one of the other transaction in the set to release the lock on an item. Where the Deadlock Occurs ? The concurrency control technique called locking may lead to deadlock. Locking 2 Phase Locking (2PL) 2PL + Binary Locking 2PL + Shared and Exclusive Locking However, 2 phase locking ensures serializability but it may lead to a deadlock state. The combination of (binary lock + 2 PL) and the combination of (shared and exclusive lock + 2 PL) also may lead to a deadlock state. What is Deadlock Detection and Deadlock Prevention ? Deadlock Detection deals with deadlocks in which the system check if state of deadlock actually exists or not whereas Deadlock Prevention Protocols are used to prevent from deadlock. When to Use Deadlock Detection and When to Use Deadlock Prevention? If the transaction load is light or if the transactions are short and each transaction lock only a few items, then we should use a deadlock detection scheme. However, if the transaction load is quite heavy or if the transaction are long and each transaction uses many data items, then we should use a deadlock prevention scheme. Deadlock Detection Schemes Wait-For-Graph (WFG) To detect the state of deadlock, directed graph is to be constructed which is called wait-for-graph(WFG). The nodes of WFG are labelled with active transaction names. and an edge exists from Ti → Tj from node Ti to node Tj iff transaction Ti is waiting for transaction Tj to release some lock. For a deadlock to be occur, the WFG must have a cycle and the scheduler can detect deadlocks by checking for cycles in the WFG. After the detection of the deadlock, it can be prevent by aborting a transaction. The scheduler should select the transaction among the transactions involved in the cycle of WFG whose absorption involves minimum cost i.e. it tries to select younger transactions( which do not made many changes). The chosen transaction for abort is called victim transaction. Point : The algorithm for victim selection generally avoid those transactions that have been for a long time and that have performed many updates. Timeout Method Timeout Method is more practical scheme for deadlock. In timeout method if a transaction waits for a period longer than a system defined timeout period, the system assumes that the transaction may be in deadlock state and aborts it regardless of whether a deadlock actually exist or not. Deadlock Prevention Protocols Deadlock prevention protocols are used to prevent from deadlock. There are various deadlock prevention protocols which are : Conservative Two Phase Locking (Conservative 2PL) Since each transaction lock all the items it needs in advance, therefore no deadlock occurs. If any of the items cannot be obtained then none of the items are locked. The conservative 2pl is not a practical method as it provides less concurrency and it can leads to starvation also. Ordering All the Items Protocol This protocol also limits concurrency because it involves ordering all the items in the database and making sure that the transaction that need several items will look them according to that order. It is also not a practical method as the programmer order system is aware of the chosen order of the items. No Waiting Algorithm In No Waiting Algorithm scheme if a transaction is unable to obtain a lock it is immediately abort it and then restarted after a certain time delay without seeing whether a deadlock will actually occur or not. This Algorithm can cause transactions to abort and restart needlessly Cautious Waiting Algorithm To reduce the number of needless aborts and restarts, the cautious algorithm is used. Suppose there are two transaction Ti and Tj. Ti is waiting for a data item X which is locked by Tj. If Tj : Blocked( waiting for some other locked data item) - Abort Ti. If Tj : Not Blocked( not waiting for some other locked data item) - Ti is blocked and allowed to wait. The Cautious Waiting is Deadlock Free because no transaction will never wait for another blocked transaction. Deadlock Prevention Using the Concept of Timestamp Ordering [TS(T)] Transaction timestamp TS(T) is a unique identifier assigned to each transaction based on the order in which transaction are started. Transaction timestamp TS(T) Timestamp TS(T) is a unique identifier but not any kind of time. TS(T) can be understand as a priority identifier in which the lesser number identifies the older transaction and the greater number identifies the younger transaction i.e. if T1 starts before transaction T2, then, TS(T1) < TS(T2). (Lesser priority) (Higher priority) (Older Transaction) (Younger Transaction) There are two methods for preventing deadlock using the concept of timestamp ordering : 1. Wait Die 2. Wound Wait Suppose there are two transaction Ti and Tj where Ti is older than Tj i.e. TS(Ti) < TS(Tj), The following rules are followed by these schemes Wait die : Iftransaction Ti is waiting for a data item X which is locked by Tj, then Ti is allowed to wait. If transaction Tj is waiting for a data item X which is locked by Ti abort Tj or rollback Tj and restart it later with the same timestamp. Wound Wait : If transaction Ti is waiting for a data item X which is locked by Tj then abort Tj or rollback Tj and restart it later with the same timestamp. If transaction Tj is waiting for a data item X which is locked by Ti, then Tj is allowed to wait. Starvation in Deadlock (Livelock) What is Starvation? The indefinite waiting of a transaction is called as starvation. When does Starvation Occurs? Starvation occurs when a transaction cannot proceed for an indefinite period of time while other transactions in the system continue normally. Example : Suppose 2 transactions T1 and T2 are trying to acquire a lock on a data item X, the scheduler grants the lock to T1. Before the execution of T1 is over, suppose another transaction T3 also request unlock on data item X and when T1 unlock data item X, both T2 and T3 are trying to acquire a lock on a data item X. If the scheduler now grants this lock to T3, then T2 has to wait again. If another 4th transaction T4 may ask for data item X and the scheduler may deprive T2 again when T3 unlocks data item X, then T2 may have to wait again. Thus T2 may have to wait indefinitely although there is no deadlock. Such a situation is called Livelock or Starvation. Starvation generally happens when we use a Priority Queue in which a lower priority transaction will starve due to frequently coming of higher priority transaction. It can also occurred because of victim selection if the algorithm select the same transaction as victim repeatedly, thus causing it to abort and never finish execution. Solutions of Starvation Solution 1 : Using a FCFS Queue For avoiding the Livelock is to follow a fair scheduling policy such as First Come First Serve (FCFS) queue in which transactions are unable to lock an item in the order in which they originally requested the lock. Solution 2 : Increasing Priority Allow some transaction to have priority over others. But increasing the priority of a transaction, it has to wait longer until a highest priority transaction comes and proceeds. Solutions 3 : Modifying the Victim Selection Algorithm The algorithm can use higher priority for transactions that have been aborted multiple time to avoid this problem. Solution 4 : Using wait-die and wound-wait schemes The wait-die and wound-wait schemes can also be used avoid the problem of starvation because they restart a transaction that has been aborted with its same original timestamp, so the possibility that the same transaction is aborted repeatedly is slim. Multiple Granularity Granularity is the size of data item allowed to lock. Multiple Granularity is the hierarchically breaking up the database into portions which are lockable and maintaining the track of what to be lock and how much to be lock so that it can be decided very quickly either to lock a data item or to unlock a data item

Transaction in DBMS PDF

Document Details

Tags

Related

Summary

Full Transcript

Upgrade to continue