CIS 9340 Database transactions.pdf
CIS 9340 Database Transactions

Multi-User Databases
• A multi-user database is one in which more than one user processes the database at the same time.
• Several general issues arise:
1. How can we prevent users from interfering with each other's work?
2. How can we safely process transactions on the database without corrupting or losing data?
3. If there is a problem (e.g., a power failure or system crash), how can we recover without losing all of our data?

Transaction Processing
• We need the ability to control how transactions are run in a multi-user database.
• A transaction is a set of read and write operations that must either commit or abort.
• Consider a transaction that reserves a seat on an airplane flight and then charges the customer. Suppose that after the second step the database crashes, or for some reason the payment charge cannot be written.
• Transactions can either reach a commit point, where all actions are permanently saved in the database, or they can abort, in which case none of the actions are saved.
• Another way to say this is that transactions are Atomic: all operations in a transaction must be executed as a single unit – a Logical Unit of Work.

ACID Properties of Transactions
• Atomic – A transaction is an indivisible unit of work that either commits or aborts.
• Consistency Preserving – A transaction will take the database from one consistent state to another consistent state.
• Isolated – The work a transaction does should not be visible to other transactions until it commits.
• Durable – Once a transaction commits, its work cannot be lost, even in a future failure.

The Lost Update (Concurrent Update) Problem
• Suppose two transactions each read and update the same salary value at the same time. First, what should the final value of the salary really be?
• The DBMS must find a way to execute these two transactions concurrently and ensure the result is what the users (and designers) intended.
• These are examples of the Lost Update, or Concurrent Update, problem: some changes to the database can be overwritten.

Four General Classes of Problems with Transaction Processing
1. The Lost Update Problem
2. The Dirty Read Problem (also called uncommitted dependency)
3. The Incorrect Analysis Problem
4. The Non-Repeatable Read Problem

The Dirty Read Problem
• The reason we get the wrong final result is that transaction TA had not committed before transaction TB was allowed to read a value TA had written. This is called the uncommitted dependency, or dirty read, problem.

The Non-Repeatable (Fuzzy) Read Problem
• Suppose we have a transaction that says: if P300 is greater than P200, then decrement P300. At the same time, another transaction is trying to update P300 (in the example, (c) means clean and (d) means dirty).

Serial Schedules and Serializability
• If we insist that only one transaction can execute at a time, in serial order, then performance will be quite poor.
• Transaction throughput is the number of transactions we can perform in a given time period, often reported as transactions per second (TPS).
• A group of two or more concurrent transactions is serializable if we can order their operations so that the final result is the same as if we had run them in serial order (one after another).
• Consider transactions A, B, and C, each with 3 operations. If executing
  A1, B1, A2, C1, C2, B2, A3, B3, C3
  has the same result as executing
  A1, A2, A3, B1, B2, B3, C1, C2, C3
  then the above schedule of transactions and operations is serializable.
• Concurrency Control is a method for controlling or scheduling the operations in such a way that concurrent transactions can be executed.
• If we do concurrency control properly, then we can maximize transaction throughput while avoiding any of these problems. More on concurrency control in the next set of notes.
• Below is a simple example of creating both a Customer and an Appointment record within the same transaction. If either statement fails, the transaction is rolled back.
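The original example is not preserved in this transcript, so what follows is only a minimal sketch of what such a transaction could look like in SQL. The table names Customer and Appointment come from the notes; the column names and sample values are hypothetical, and the exact syntax for starting a transaction (BEGIN TRANSACTION vs. START TRANSACTION) varies by DBMS.

    -- Sketch only: create a Customer and an Appointment as one atomic unit of work.
    -- Column names and values are hypothetical.
    BEGIN TRANSACTION;

    INSERT INTO Customer (CustomerID, CustomerName)
    VALUES (1001, 'Pat Smith');

    INSERT INTO Appointment (ApptID, CustomerID, ApptDate)
    VALUES (5001, 1001, '2025-03-15');

    -- If both statements succeed, make the work permanent.
    COMMIT;

    -- If either statement had failed, the application (or the DBMS) would issue
    -- ROLLBACK instead, and neither row would be saved: the transaction is atomic.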
Concurrency Control
• Multi-user databases need to control the order in which data are read and updated by multiple users.
• Concurrency Control provides a mechanism to ensure that the work of multiple users does not interfere with, and potentially corrupt, the data in the database.

Concurrency Control and Locking
• We need a way to guarantee that our concurrent transactions can be serialized. Locking is one such means.
• Locking is applied to data items in order to reserve them for future operations.
• A lock is a logical flag set by a transaction to alert other transactions that the data item is in use.

Characteristics of Locks
• Locks may be applied to data items in two ways:
  Implicit locks are applied by the DBMS.
  Explicit locks are applied by application programs.
• Locks may be applied to:
1. a single data item (value)
2. an entire row of a table
3. a page (a memory segment holding many rows' worth of data – a "block" on disk)
4. an entire table
5. an entire database
  This is referred to as the lock granularity.
• Locks may be of two types, depending on the requirements of the transaction:
1. An Exclusive Lock prevents any other transaction from reading or modifying the locked item.
2. A Shared Lock allows another transaction to read an item but prevents another transaction from writing the item.

Two-Phase Locking
• The most commonly implemented locking mechanism is called Two-Phase Locking, or 2PL. 2PL is a concurrency control mechanism that ensures serializability.
• 2PL has two phases: growing and shrinking.
1. A transaction acquires locks on the data items it will need to complete the transaction. This is called the growing phase.
2. Once one lock is released, no other lock may be acquired. This is called the shrinking phase.
• Consider our prior example, this time using locks. Note that at time 5, Transaction B is denied a lock, so Transaction B will "spin" in place until the lock on P200 is released.

Another Two-Phase Locking Example
• Amy and Bill are two employees who will get a 5% raise this year. Carl will get a 3% raise this year.
• Transaction A needs to read the data item raise_rate (labeled R_R in the example), then read the item Amy_Salary, and finally write a new value for Amy_Salary.
• Transaction B needs to read the data item raise_rate, then read the item Bill_Salary, and finally write a new value for Bill_Salary.
• Transaction C needs to read the data item raise_rate, then write a new raise_rate (3%), then read the item Carl_Salary, and finally write a new value for Carl_Salary.
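To show how the growing and shrinking phases might look from an application's point of view, here is a minimal sketch of Transaction A using explicit SQL row locks (SELECT ... FOR SHARE / FOR UPDATE, as supported by PostgreSQL and MySQL 8, for example). The tables Config and Payroll and their columns are hypothetical; the notes describe the data items raise_rate and Amy_Salary only abstractly.

    -- Sketch of Transaction A under two-phase locking.
    BEGIN;

    -- Growing phase: acquire locks before doing the work.
    -- Shared lock on the raise_rate item, since it is only read.
    SELECT param_value FROM Config WHERE param_name = 'raise_rate' FOR SHARE;

    -- Exclusive lock on Amy's row, since it will be written.
    SELECT salary FROM Payroll WHERE emp_name = 'Amy' FOR UPDATE;

    -- Write the new value for Amy_Salary (a 5% raise).
    UPDATE Payroll
    SET salary = salary * 1.05
    WHERE emp_name = 'Amy';

    -- Shrinking phase: COMMIT releases all of the locks at once; after this
    -- point the transaction acquires no further locks.
    COMMIT;

If Transaction C had already taken an exclusive lock on raise_rate in order to change it to 3%, the first SELECT above would wait ("spin") until Transaction C commits and releases that lock.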
Database Recovery
• From Transaction Processing, we know that every transaction should reach a commit point (or abort). What happens when the database or DBMS fails before commit (or abort)?
• The Database Recovery subsystem supports recovery and the Durability (the "D" in ACID) property to ensure data can never be lost.
• There are many situations in which a transaction may not reach a commit or abort point:
1. An operating system crash can terminate the DBMS processes.
2. The DBMS itself can crash.
3. The system might lose power.
4. A disk may fail, or other hardware may fail.
5. Human error can result in the deletion of critical data.
• In any of these situations, data in the database may become inconsistent or lost. For example, if a transaction has completed 30 of 40 scheduled writes to the database when the DBMS crashes, then the database may be in an inconsistent state, as only part of the transaction's work was completed.
• Database Recovery is the process of restoring the database and the data to a consistent state. This may include restoring lost data up to the point of the event (e.g., a system crash).
• Two approaches are discussed here: Manual Reprocessing and Automated Recovery.

Manual Reprocessing
• In a Manual Reprocessing recovery approach, the database is periodically backed up (a database save), and all transactions applied since the last save are recorded.
• If the system crashes, the latest database backup set is restored, and all of the transactions are re-applied (by users) to bring the database back up to the point just before the crash.
• There are several shortcomings to the Manual Reprocessing approach:
1. The time required to re-apply transactions.
2. Transactions might have other (physical) consequences.
3. Re-applying concurrent transactions in the same original sequence may not be possible.

Automated Recovery with Rollback / Rollforward
• As with the manual recovery approach, we also make periodic backups of the database (a time-consuming operation).
• In the Automated Recovery approach, we introduce a log file – a file, separate from the data, that records all of the changes made to the database by transactions. It is also referred to as a journal.
• The transaction log includes information helpful to the recovery process, such as: a transaction identifier, the date and time, the user running the transaction, before images, and after images.
• Before Image: a copy of the table record (or data item) before it was changed by the transaction.
• After Image: a copy of the table record (or data item) after it was changed by the transaction.
• Rollback: undo any partially completed transactions (ones in progress when the crash occurred) by applying the before images to the database.
• Rollforward: redo transactions by applying the after images to the database. This is done for transactions that were committed before the crash.
• The Automated Recovery process uses both rollback and rollforward to restore the database. In the worst case, we would need to roll back to the last database backup point and then roll forward to the point just before the crash.
• Checkpoints can also be taken (less time-consuming) in between database saves: the DBMS flushes all pending transactions and writes all data to disk and to the transaction log.
• The database can then be recovered from the last checkpoint in much less time.
• Suppose the log file (journal) at the time of a system crash shows that Transaction B had committed, while Transactions A and C had not yet committed.
• If the system crash was due to a process failure, the DBMS recovery mechanism would ROLL BACK all non-committed transactions (Transactions A and C in this case).
• If the system crash was due to a media failure, the DBMS recovery mechanism would first restore the most recent data backup and then ROLL FORWARD all of the committed transactions it sees in the log (Transaction B only in this case).
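To make the log contents concrete, here is a hypothetical sketch of a transaction log record modeled as a SQL table. Real DBMS logs (journals) are internal structures maintained by the DBMS rather than user tables; the table and column names below are invented and simply mirror the fields the notes list (transaction identifier, date and time, user, before image, after image).

    -- Hypothetical illustration only: one row per change made by a transaction.
    CREATE TABLE TransactionLog (
        log_id       INTEGER PRIMARY KEY,  -- position of the record in the log
        txn_id       INTEGER NOT NULL,     -- transaction identifier
        logged_at    TIMESTAMP NOT NULL,   -- date and time of the change
        db_user      VARCHAR(30) NOT NULL, -- user running the transaction
        record_key   VARCHAR(50) NOT NULL, -- which record or data item was changed
        before_image VARCHAR(200),         -- value before the change (used by rollback)
        after_image  VARCHAR(200)          -- value after the change (used by rollforward)
    );

Rollback scans such records from newest to oldest and applies the before images of uncommitted transactions; rollforward scans from oldest to newest and applies the after images of committed transactions.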