CIS 9340 Chapter 22 - Transaction Management
CIS 9340 CHAPTER 22: TRANSACTION MANAGEMENT

There are three closely related functions that are intended to ensure that the database is reliable and remains in a consistent state: transaction support, concurrency control services, and recovery services. This reliability and consistency must be maintained in the presence of failures of both hardware and software components, and when multiple users are accessing the database.

To overcome the problems that concurrent access can cause, the DBMS implements a concurrency control protocol that prevents database accesses from interfering with one another. Database recovery is the process of restoring the database to a correct state following a failure. The failure may be the result of a system crash due to hardware or software errors; a media failure, such as a head crash; or a software error in the application, such as a logical error in the program that is accessing the database. It may also be the result of unintentional or intentional corruption or destruction of data or facilities by system administrators or users. Whatever the underlying cause of the failure, the DBMS must be able to recover from it and restore the database to a consistent state.

A transaction is an action, or series of actions, carried out by a single user or application program that reads or updates the contents of the database. It is treated as a logical unit of work on the database. It may be an entire program, a part of a program, or a single statement (for example, the SQL statement INSERT or UPDATE). A transaction should always transform the database from one consistent state to another, although we accept that consistency may be violated while the transaction is in progress. A transaction can have one of two outcomes. If it completes successfully, the transaction is said to have been committed, and the database reaches a new consistent state. On the other hand, if the transaction does not execute successfully, the transaction is aborted. If a transaction is aborted, the database must be restored to the consistent state it was in before the transaction started. Such a transaction is rolled back or undone.

A committed transaction cannot be aborted. If we decide that the committed transaction was a mistake, we must perform another, compensating, transaction to reverse its effects. However, an aborted transaction that is rolled back can be restarted later and, depending on the cause of the failure, may successfully execute and commit at that time. The DBMS does not know which updates are grouped to form a logical transaction. It must therefore provide a method to allow the user to indicate the boundaries of a transaction. The keywords BEGIN TRANSACTION, COMMIT, and ROLLBACK (or their equivalents) are available in many data manipulation languages to delimit transactions. If these delimiters are not used, the entire program is usually regarded as a single transaction, with the DBMS automatically performing a COMMIT when the program terminates correctly and a ROLLBACK if it does not.

Besides the ACTIVE, COMMITTED, and ABORTED states, there are two other transaction states:
• PARTIALLY COMMITTED, which occurs after the final statement has been executed. At this point, it may be found that the transaction has violated serializability or has violated an integrity constraint, and the transaction has to be aborted. Alternatively, the system may fail and any data the transaction has updated may not have been safely recorded on secondary storage.
In such cases, the transaction would go into the FAILED state and would have to be aborted. If the transaction has been successful, any updates can be safely recorded and the transaction can go to the COMMITTED state.
• FAILED, which occurs if the transaction cannot be committed or the transaction is aborted while in the ACTIVE state, perhaps because the user aborted it or because the concurrency control protocol aborted it to ensure serializability.

Properties of Transactions

There are properties that all transactions should possess. The four basic, or so-called ACID, properties that define a transaction are (Haerder and Reuter, 1983):
• Atomicity. The "all or nothing" property. A transaction is an indivisible unit that is either performed in its entirety or not performed at all. It is the responsibility of the recovery subsystem of the DBMS to ensure atomicity.
• Consistency. A transaction must transform the database from one consistent state to another consistent state. Both the DBMS and the application developers are responsible for ensuring consistency. The DBMS can ensure consistency by enforcing all the constraints that have been specified on the database schema, such as integrity constraints. However, in itself this is insufficient to ensure consistency. For example, suppose that we have a transaction that is intended to transfer money from one bank account to another, and the programmer makes an error in the transaction logic, debiting one account but crediting the wrong account. The database is then in an inconsistent state; however, the DBMS would not have been responsible for introducing this inconsistency and would have had no ability to detect the error.
• Isolation. Transactions execute independently of one another. In other words, the partial effects of incomplete transactions should not be visible to other transactions. It is the responsibility of the concurrency control subsystem to ensure isolation.
• Durability. The effects of a successfully completed (committed) transaction are permanently recorded in the database and must not be lost because of a subsequent failure. It is the responsibility of the recovery subsystem to ensure durability.

Database Architecture

There are four high-level database modules that handle transactions, concurrency control, and recovery. The transaction manager coordinates transactions on behalf of application programs. It communicates with the scheduler, the module responsible for implementing a particular strategy for concurrency control. The scheduler is sometimes referred to as the lock manager if the concurrency control protocol is locking-based. The objective of the scheduler is to maximize concurrency without allowing concurrently executing transactions to interfere with one another and so compromise the integrity or consistency of the database.

If a failure occurs during a transaction, the database could be left inconsistent. It is the task of the recovery manager to ensure that the database is restored to the state it was in before the start of the transaction, and therefore to a consistent state. Finally, the buffer manager is responsible for the efficient transfer of data between disk storage and main memory.

Concurrency Control

Concurrency control: The process of managing simultaneous operations on the database without having them interfere with one another.
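Before looking at the problems that concurrency introduces, here is a minimal sketch of the transaction boundaries and the atomicity and consistency properties described above, using Python's sqlite3 module. The accounts table, the balances, and the CHECK constraint are illustrative assumptions, not taken from the chapter.

    # A minimal sketch of atomicity and consistency, using Python's
    # sqlite3 module. Table, names, and balances are illustrative.
    import sqlite3

    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we delimit transactions explicitly
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
    conn.execute("INSERT INTO accounts VALUES ('x', 100), ('y', 50)")

    def transfer(amount, src, dst):
        """Move money between accounts as one logical unit of work."""
        conn.execute("BEGIN")       # BEGIN TRANSACTION: start the unit of work
        try:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("COMMIT")      # both updates become permanent together
        except sqlite3.Error:
            conn.execute("ROLLBACK")    # atomicity: neither update survives
            raise

    transfer(10, "x", "y")       # succeeds: balances become 90 and 60
    try:
        transfer(500, "x", "y")  # violates the CHECK constraint, so it is rolled back
    except sqlite3.IntegrityError:
        pass
    print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
    # [('x', 90), ('y', 60)] -- the failed transfer left no partial effect

The failed transfer illustrates the earlier point: the DBMS-enforced constraint catches one kind of inconsistency, but a logic error (crediting the wrong account) would commit cleanly, which is why consistency is also the application developer's responsibility.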
The Need for Concurrency Control

A major objective in developing a database is to enable many users to access shared data concurrently. Concurrent access is relatively easy if all users are only reading data, as there is no way that they can interfere with one another. However, when two or more users are accessing the database simultaneously and at least one is updating data, there may be interference that can result in inconsistencies. This objective is similar to that of multi-user computer systems, which allow two or more programs (or transactions) to execute at the same time.

Such systems can allow two or more transactions to execute simultaneously. The system begins executing the first transaction until it reaches an I/O operation. While the I/O is being performed, the CPU suspends the first transaction and executes commands from the second transaction. When the second transaction reaches an I/O operation, control returns to the first transaction, and its operations are resumed from the point at which it was suspended. The first transaction continues until it again reaches another I/O operation. In this way, the operations of the two transactions are interleaved to achieve concurrent execution. In addition, throughput (the amount of work accomplished in a given time interval) is improved, as the CPU executes other transactions instead of sitting idle waiting for I/O operations to complete.

However, although two transactions may be perfectly correct in themselves, the interleaving of operations in this way may produce an incorrect result, thus compromising the integrity and consistency of the database. We examine three examples of potential problems caused by concurrency: the lost update problem, the uncommitted dependency problem, and the inconsistent analysis problem. To illustrate these problems, we use a simple bank account relation that contains the DreamHome staff account balances. In this context, we are using the transaction as the unit of concurrency control.

The lost update problem

An apparently successfully completed update by one user can be overridden by another user. This is known as the lost update problem, illustrated by a scenario in which transaction T1 executes concurrently with transaction T2. T1 is withdrawing £10 from an account with balance balx, initially £100, and T2 is depositing £100 into the same account. If these transactions are executed serially, one after the other with no interleaving of operations, the final balance would be £190 no matter which transaction is performed first.
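The following sketch, in plain Python with the balance held in an ordinary variable, simulates one interleaving that loses an update; the variable names and the particular interleaving are our own illustration.

    # A sketch of the lost update problem: T1 withdraws 10 and T2
    # deposits 100 against the same balance (initially 100). Each
    # transaction reads the balance into a private copy before writing
    # it back, so this interleaving loses one of the updates.
    balx = 100

    # t1: T1 reads the balance       t2: T2 reads the same balance
    t1_local = balx
    t2_local = balx

    # t3: T2 writes its deposit back
    balx = t2_local + 100    # balx is now 200

    # t4: T1 writes its withdrawal back, overwriting T2's write
    balx = t1_local - 10     # balx is now 90, not the correct 190

    print(balx)  # 90 -- T2's deposit has been lost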
The inconsistent analysis problem

The problem of inconsistent analysis occurs when a transaction reads several values from the database but a second transaction updates some of them during the execution of the first. For example, a transaction that is summarizing data in a database (for example, totaling balances) will obtain inaccurate results if, while it is executing, other transactions are updating the database. In one such scenario, summary transaction T6 is executing concurrently with transaction T5. Transaction T6 is totaling the balances of account x (£100), account y (£50), and account z (£25). However, in the meantime, transaction T5 has transferred £10 from balx to balz, so that T6 now has the wrong result (£10 too high). This problem is avoided by preventing transaction T6 from reading balx and balz until after T5 has completed its updates.

Another problem can occur when a transaction T rereads a data item it has previously read but, in between, another transaction has modified it. Thus, T receives two different values for the same data item. This is sometimes referred to as a nonrepeatable (or fuzzy) read. A similar problem can occur if transaction T executes a query that retrieves a set of tuples from a relation satisfying a certain predicate and re-executes the query at a later time, but finds that the retrieved set contains an additional (phantom) tuple that another transaction has inserted in the meantime. This is sometimes referred to as a phantom read. Before we discuss the main concurrency control techniques, we discuss some key concepts associated with the serializability and recoverability of transactions.

Serializability and Recoverability

The objective of a concurrency control protocol is to schedule transactions in such a way as to avoid any interference between them and, hence, prevent the types of problems described in the previous section. One obvious solution is to allow only one transaction to execute at a time: one transaction is committed before the next transaction is allowed to begin. However, the aim of a multi-user DBMS is also to maximize the degree of concurrency or parallelism in the system, so that transactions that can execute without interfering with one another can run in parallel. For example, transactions that access different parts of the database can be scheduled together without interference. In this section, we examine serializability as a means of helping to identify those executions of transactions that are guaranteed to ensure consistency (Papadimitriou, 1979).

Schedule: A sequence of the operations by a set of concurrent transactions that preserves the order of the operations in each individual transaction.

A transaction comprises a sequence of operations consisting of read and/or write actions to the database, followed by a commit or abort action.
A schedule S consists of a sequence of the operations from a set of n transactions T1, T2, ..., Tn, subject to the constraint that the order of operations for each transaction is preserved in the schedule.

Serial Schedule

Serial schedule: A schedule where the operations of each transaction are executed consecutively, without any interleaved operations from other transactions.

In a serial schedule, the transactions are performed in serial order. For example, if we have two transactions T1 and T2, the serial order would be T1 followed by T2, or T2 followed by T1. Thus, in serial execution, there is no interference between transactions because only one is executing at any given time. However, there is no guarantee that the results of all serial executions of a given set of transactions will be identical. In banking, for example, it matters whether interest is calculated on an account before a large deposit is made or after.

If a set of transactions executes concurrently, we say that the (nonserial) schedule is correct if it produces the same result as some serial execution. Such a schedule is called serializable. To prevent inconsistency arising from transactions interfering with one another, it is essential to guarantee the serializability of concurrent transactions. In serializability, the ordering of read and write operations is important:
• If two transactions only read a data item, they do not conflict and order is not important.
• If two transactions either read or write completely separate data items, they do not conflict and order is not important.
• If one transaction writes a data item and another either reads or writes the same data item, the order of execution is important.

Consider the schedule S1 shown in Figure 22.7(a), containing operations from two concurrently executing transactions T7 and T8. Because the write operation on balx in T8 does not conflict with the subsequent read operation on baly in T7, we can change the order of these operations to produce the equivalent schedule S2 shown in Figure 22.7(b). If we also now change the order of the following non-conflicting operations, we produce the equivalent serial schedule S3 shown in Figure 22.7(c):
• Change the order of the write(balx) of T8 with the write(baly) of T7.
• Change the order of the read(balx) of T8 with the read(baly) of T7.
• Change the order of the read(balx) of T8 with the write(baly) of T7.
Schedule S3 is a serial schedule and, because S1 and S2 are equivalent to S3, S1 and S2 are serializable schedules.

Testing for conflict serializability

Under the constrained write rule (that is, a transaction updates a data item based on its old value, which is first read by the transaction), a precedence (or serialization) graph can be produced to test for conflict serializability. For a schedule S, a precedence graph is a directed graph G = (N, E) that consists of a set of nodes N and a set of directed edges E, which is constructed as follows:
• Create a node for each transaction.
• Create a directed edge Ti → Tj if Tj reads the value of an item written by Ti.
• Create a directed edge Ti → Tj if Tj writes a value into an item after it has been read by Ti.
• Create a directed edge Ti → Tj if Tj writes a value into an item after it has been written by Ti.
If an edge Ti → Tj exists in the precedence graph for S, then in any serial schedule S′ equivalent to S, Ti must appear before Tj.
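As an illustration, the following sketch builds the precedence graph from a schedule using the three edge rules above and then tests it for a cycle, applying the criterion stated next. The representation of a schedule as (transaction, action, item) triples is our own, not the book's.

    # Build a precedence graph and test it for a cycle. A schedule is a
    # list of (transaction, action, item) triples in execution order.
    def precedence_graph(schedule):
        edges = set()
        for i, (ti, act_i, x) in enumerate(schedule):
            for tj, act_j, y in schedule[i + 1:]:
                # Two operations conflict if they touch the same item,
                # belong to different transactions, and at least one writes.
                if x == y and ti != tj and "write" in (act_i, act_j):
                    edges.add((ti, tj))  # Ti must precede Tj in any equivalent serial schedule
        return edges

    def has_cycle(edges):
        # Depth-first search for a cycle in the directed graph.
        graph = {}
        for a, b in edges:
            graph.setdefault(a, set()).add(b)
        visiting, done = set(), set()
        def dfs(n):
            visiting.add(n)
            for m in graph.get(n, ()):
                if m in visiting or (m not in done and dfs(m)):
                    return True
            visiting.discard(n)
            done.add(n)
            return False
        return any(dfs(n) for n in graph if n not in done)

    # A classic non-serializable interleaving: each transaction reads an
    # item the other later writes, giving edges in both directions.
    s = [("T1", "read", "balx"), ("T2", "read", "balx"),
         ("T2", "write", "balx"), ("T1", "write", "balx")]
    print(has_cycle(precedence_graph(s)))   # True -> not conflict serializable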
If the precedence graph contains a cycle, the schedule is not conflict serializable.

View serializability

Several other types of serializability offer less stringent definitions of schedule equivalence than conflict serializability. One less restrictive definition is called view serializability. A schedule is view serializable if it is view equivalent to a serial schedule. Every conflict serializable schedule is view serializable, although the converse is not true.

Testing for view serializability

Testing for view serializability is much more complex than testing for conflict serializability. In fact, it has been shown that testing for view serializability is NP-complete; thus, it is highly improbable that an efficient algorithm can be found.

Recoverability

Serializability identifies schedules that maintain the consistency of the database, assuming that none of the transactions in the schedule fails. An alternative perspective examines the recoverability of transactions within a schedule. If a transaction fails, the atomicity property requires that we undo the effects of the transaction. In addition, the durability property states that once a transaction commits, its changes cannot be undone (without running another, compensating, transaction). A schedule in which a transaction commits after reading data written by a transaction that subsequently aborts is a nonrecoverable schedule, and should not be allowed.

Locking

Locking: A procedure used to control concurrent access to data. When one transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect results.

Locking methods are the most widely used approach to ensure the serializability of concurrent transactions. There are several variations, but all share the same fundamental characteristic: a transaction must claim a shared (read) or exclusive (write) lock on a data item before the corresponding database read or write operation. The lock prevents another transaction from modifying the item, or even reading it in the case of an exclusive lock.

Shared lock: If a transaction has a shared lock on a data item, it can read the item but not update it.

Exclusive lock: If a transaction has an exclusive lock on a data item, it can both read and update the item.

Data items of various sizes, ranging from the entire database down to a field, may be locked. The size of the item determines the fineness, or granularity, of the lock. The actual lock might be implemented by setting a bit in the data item to indicate that that portion of the database is locked, by keeping a list of locked parts of the database, or by other means.

Because read operations cannot conflict, it is permissible for more than one transaction to hold shared locks simultaneously on the same item. On the other hand, an exclusive lock gives a transaction exclusive access to that item. Thus, as long as a transaction holds the exclusive lock on the item, no other transactions can read or update that data item. Locks are used in the following way:
• Any transaction that needs to access a data item must first lock the item, requesting a shared lock for read-only access or an exclusive lock for both read and write access.
• If the item is not already locked by another transaction, the lock will be granted.
• If the item is currently locked, the DBMS determines whether the request is compatible with the existing lock.
If a shared lock is requested on an item that already has a shared lock on it, the request will be granted; otherwise, the transaction must wait until the existing lock is released.
• A transaction continues to hold a lock until it explicitly releases it, either during execution or when it terminates (aborts or commits). It is only when the exclusive lock has been released that the effects of the write operation will be made visible to other transactions.

In addition to these rules, some systems permit a transaction to issue a shared lock on an item and then later upgrade the lock to an exclusive lock. This in effect allows a transaction to examine the data first and then decide whether it wishes to update it. If upgrading is not supported, a transaction must hold exclusive locks on all data items that it may update at some time during its execution, thereby potentially reducing the level of concurrency in the system. For the same reason, some systems also permit a transaction to issue an exclusive lock and then later downgrade the lock to a shared lock.

Two-phase locking (2PL)

A transaction follows the two-phase locking protocol if all locking operations precede the first unlock operation in the transaction.

According to the rules of this protocol, every transaction can be divided into two phases: first a growing phase, in which it acquires all the locks it needs but cannot release any locks, and then a shrinking phase, in which it releases its locks but cannot acquire any new ones. There is no requirement that all locks be obtained simultaneously. Normally, the transaction acquires some locks, does some processing, and goes on to acquire additional locks as needed. However, it never releases any lock until it has reached a stage where no new locks are needed. The rules are:
• A transaction must acquire a lock on an item before operating on the item. The lock may be read or write, depending on the type of access needed.
• Once the transaction releases a lock, it can never acquire any new locks.
If upgrading of locks is allowed, upgrading can take place only during the growing phase and may require that the transaction wait until another transaction releases a shared lock on the item. Downgrading can take place only during the shrinking phase.

A situation in which a single transaction leads to a series of rollbacks is called cascading rollback. Cascading rollbacks are undesirable, because they potentially lead to the undoing of a significant amount of work. Clearly, it would be useful if we could design protocols that prevent cascading rollbacks. One way to achieve this with two-phase locking is to leave the release of all locks until the end of the transaction, as in the previous examples; this variant is known as rigorous 2PL. With rigorous 2PL, transactions can be serialized in the order in which they commit. Another variant of 2PL, called strict 2PL, holds only exclusive locks until the end of the transaction. Most database systems implement one of these two variants of 2PL.

Another problem with two-phase locking, which applies to all locking-based schemes, is that it can cause deadlock, as transactions can wait for locks on data items. If two transactions wait for locks on items held by the other, deadlock will occur, and the deadlock detection and recovery scheme described in Section 22.2.4 will be needed.
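To make the locking rules concrete, here is a minimal sketch of a lock table enforcing shared/exclusive compatibility, as a 2PL-style scheduler might. It is an illustrative sketch under our own simplifications, not how any particular DBMS implements its lock manager: a real scheduler queues and blocks conflicting requests, whereas this one simply reports that the transaction must wait.

    # A minimal sketch of a lock table with shared (S) and exclusive (X)
    # modes. All names and the data layout are illustrative.
    class LockTable:
        def __init__(self):
            self.locks = {}   # item -> (mode, set of holders)

        def request(self, tx, item, mode):
            """mode is 'S' or 'X'. Returns True if the lock is granted."""
            held = self.locks.get(item)
            if held is None:
                self.locks[item] = (mode, {tx})
                return True
            held_mode, holders = held
            if mode == "S" and held_mode == "S":   # shared locks are compatible
                holders.add(tx)
                return True
            if holders == {tx}:                    # upgrade/reuse by the sole holder
                self.locks[item] = ("X" if mode == "X" else held_mode, holders)
                return True
            return False                           # incompatible: tx must wait

        def release_all(self, tx):
            """Under rigorous 2PL, called only when tx commits or aborts."""
            for item in list(self.locks):
                mode, holders = self.locks[item]
                holders.discard(tx)
                if not holders:
                    del self.locks[item]

    lt = LockTable()
    print(lt.request("T1", "balx", "S"))  # True  - shared lock granted
    print(lt.request("T2", "balx", "S"))  # True  - shared locks coexist
    print(lt.request("T2", "balx", "X"))  # False - upgrade blocked while T1 also holds a share
    lt.release_all("T1")
    print(lt.request("T2", "balx", "X"))  # True  - now T2 can upgrade

Note that if two transactions each held an exclusive lock and each then requested the item held by the other, both requests would be refused and, in a blocking implementation, both transactions would wait forever: exactly the deadlock just described.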
It is also possible for transactions to be in livelock, that is, left in a wait state indefinitely, unable to acquire any new locks, even though the DBMS is not in deadlock.

Concurrency control with index structures

Concurrency control for an index structure (see Appendix F) can be managed by treating each page of the index as a data item and applying the two-phase locking protocol described previously. However, because indexes are likely to be frequently accessed, particularly the higher levels of trees (as searching occurs from the root downwards), this simple concurrency control strategy may lead to high lock contention.

Based on these observations, we can derive the following locking strategy:
• For searches, obtain shared locks on nodes starting at the root and proceeding downwards along the required path. Release the lock on a (parent) node once a lock has been obtained on the child node.
• For insertions, a conservative approach would be to obtain exclusive locks on all nodes as we descend the tree to the leaf node to be modified. This ensures that a split in the leaf node can propagate all the way up the tree to the root. However, if a child node is not full, the lock on the parent node can be released. A more optimistic approach would be to obtain shared locks on all nodes as we descend to the leaf node to be modified, where we obtain an exclusive lock on the leaf node itself. If the leaf node has to split, we upgrade the shared lock on the parent node to an exclusive lock. If this node also has to split, we continue to upgrade the locks at the next higher level.
In the majority of cases a split is not required, making the optimistic approach the better one. The technique of locking a child node and releasing the lock on the parent node if possible is known as lock-coupling or crabbing. For further details on the performance of concurrency control algorithms for trees, the interested reader is referred to Srinivasan and Carey (1991).

Latches

DBMSs also support another type of lock called a latch, which is held for a much shorter duration than a normal lock. A latch can be used before a page is read from, or written to, disk to ensure that the operation is atomic. For example, a latch would be obtained to write a page from the database buffers to disk, the page would then be written to disk, and the latch would be immediately unset. As the latch is simply to prevent conflict for this type of access, latches need not conform to the normal concurrency control protocol, such as two-phase locking.

Deadlock

Deadlock: An impasse that may result when two (or more) transactions are each waiting for locks to be released that are held by the other.

Figure 22.19 shows two transactions, T17 and T18, that are deadlocked because each is waiting for the other to release a lock on an item it holds. At time t2, transaction T17 requests and obtains an exclusive lock on item balx, and at time t3 transaction T18 obtains an exclusive lock on item baly. Then at t6, T17 requests an exclusive lock on item baly. Because T18 holds a lock on baly, transaction T17 waits. Meanwhile, at time t7, T18 requests a lock on item balx, which is held by transaction T17.
Neither transaction can continue, because each is waiting for a lock it cannot obtain until the other completes. Once deadlock occurs, the applications involved cannot resolve the problem. Instead, the DBMS has to recognize that deadlock exists and break it in some way. Unfortunately, there is only one way to break deadlock: abort one or more of the transactions. This usually involves undoing all the changes made by the aborted transaction(s).

Timeouts

A simple approach to deadlock prevention is based on lock timeouts. In this approach, a transaction that requests a lock will wait for only a system-defined period of time. If the lock has not been granted within this period, the lock request times out. In this case, the DBMS assumes that the transaction may be deadlocked, even though it may not be, and it aborts and automatically restarts the transaction. This is a very simple and practical solution to deadlock prevention that is used by several commercial DBMSs.

Deadlock prevention

Another possible approach to deadlock prevention is to order transactions using transaction timestamps. Two algorithms have been proposed by Rosenkrantz et al. (1978). One algorithm, Wait-Die, allows only an older transaction to wait for a younger one; otherwise the transaction is aborted (dies) and restarted with the same timestamp, so that eventually it will become the oldest active transaction and will not die. The second algorithm, Wound-Wait, uses a symmetrical approach: only a younger transaction can wait for an older one. If an older transaction requests a lock held by a younger one, the younger one is aborted (wounded). A variant of 2PL, called conservative 2PL, can also be used to prevent deadlock.

Deadlock detection

Deadlock detection is usually handled by the construction of a wait-for graph (WFG) that shows the transaction dependencies; that is, transaction Ti is dependent on Tj if transaction Tj holds the lock on a data item that Ti is waiting for. The WFG is a directed graph G = (N, E) that consists of a set of nodes N and a set of directed edges E, which is constructed as follows:
• Create a node for each transaction.
• Create a directed edge Ti → Tj if transaction Ti is waiting to lock an item that is currently locked by Tj.
Deadlock exists if and only if the WFG contains a cycle.

Frequency of deadlock detection

Because a cycle in the wait-for graph is a necessary and sufficient condition for deadlock to exist, the deadlock detection algorithm generates the WFG at regular intervals and examines it for a cycle. The choice of time interval between executions of the algorithm is important. If the interval chosen is too small, deadlock detection will add considerable overhead; if the interval is too large, deadlock may not be detected for a long period. Alternatively, a dynamic deadlock detection algorithm could start with an initial interval size. Each time no deadlock is detected, the detection interval could be increased, for example to twice the previous interval, and each time deadlock is detected, the interval could be reduced, for example to half the previous interval, subject to some upper and lower limits.

Recovery from deadlock detection

As we mentioned previously, once deadlock has been detected, the DBMS needs to abort one or more of the transactions. There are several issues that need to be considered:
(1) Choice of deadlock victim.
In some circumstances, the choice of transactions to abort may be obvious. However, in other situations, the choice may not be so clear. In such cases, we would want to abort the transactions that incur the minimum costs. This may take into consideration:
(a) how long the transaction has been running (it may be better to abort a transaction that has just started rather than one that has been running for some time);
(b) how many data items have been updated by the transaction (it would be better to abort a transaction that has made little change to the database rather than one that has made significant changes);
(c) how many data items the transaction has still to update (it would be better to abort a transaction that has many changes still to make to the database rather than one that has few changes to make). Unfortunately, this may not be something that the DBMS would necessarily know.

(2) How far to roll a transaction back. Having decided to abort a particular transaction, we have to decide how far to roll the transaction back. Clearly, undoing all the changes made by a transaction is the simplest solution, although not necessarily the most efficient. It may be possible to resolve the deadlock by rolling back only part of the transaction.

(3) Avoiding starvation. Starvation occurs when the same transaction is always chosen as the victim, so that it can never complete. Starvation is very similar to livelock, which occurs when the concurrency control protocol never selects a particular transaction that is waiting for a lock. The DBMS can avoid starvation by storing a count of the number of times a transaction has been selected as the victim and using a different selection criterion once this count reaches some upper limit.

Timestamping Methods

The use of locks, combined with the two-phase locking protocol, guarantees serializability of schedules. The order of transactions in the equivalent serial schedule is based on the order in which the transactions lock the items they require. If a transaction needs an item that is already locked, it may be forced to wait until the item is released. A different approach that also guarantees serializability uses transaction timestamps to order transaction execution for an equivalent serial schedule. Timestamp methods for concurrency control are quite different from locking methods. No locks are involved, and therefore there can be no deadlock. Locking methods generally prevent conflicts by making transactions wait. With timestamp methods, there is no waiting: transactions involved in conflict are simply rolled back and restarted.

Timestamp: A unique identifier created by the DBMS that indicates the relative starting time of a transaction.

Timestamping: A concurrency control protocol that orders transactions in such a way that older transactions, transactions with smaller timestamps, get priority in the event of conflict.

With timestamping, if a transaction attempts to read or write a data item, the read or write is allowed to proceed only if the last update on that data item was carried out by an older transaction. Otherwise, the transaction requesting the read/write is restarted and given a new timestamp. New timestamps must be assigned to restarted transactions to prevent their being continually aborted and restarted. Without new timestamps, a transaction with an old timestamp might not be able to commit owing to younger transactions having already committed.
In addition to timestamps for transactions, there are timestamps for data items. Each data item contains a read_timestamp, giving the timestamp of the last transaction to read the item, and a write_timestamp, giving the timestamp of the last transaction to write (update) the item.

For a transaction T with timestamp ts(T), the timestamp ordering protocol works as follows:

(1) Transaction T issues a read(x).
(a) Transaction T asks to read an item (x) that has already been updated by a younger (later) transaction, that is, ts(T) < write_timestamp(x). This means that an earlier transaction is trying to read a value of an item that has been updated by a later transaction. The earlier transaction is too late to read the previous outdated value, and any other values it has acquired are likely to be inconsistent with the updated value of the data item. In this situation, transaction T must be aborted and restarted with a new (later) timestamp.
(b) Otherwise, ts(T) ≥ write_timestamp(x), and the read operation can proceed. We set read_timestamp(x) = max(ts(T), read_timestamp(x)).

(2) Transaction T issues a write(x).
(a) Transaction T asks to write an item (x) whose value has already been read by a younger transaction, that is, ts(T) < read_timestamp(x). This means that a later transaction is already using the current value of the item and it would be an error to update it now. This occurs when a transaction is late in doing a write and a younger transaction has already read the old value or written a new one. In this case, the only solution is to roll back transaction T and restart it using a later timestamp.
(b) Transaction T asks to write an item (x) whose value has already been written by a younger transaction, that is, ts(T) < write_timestamp(x). This means that transaction T is attempting to write an obsolete value of data item x. Transaction T should be rolled back and restarted using a later timestamp.
(c) Otherwise, the write operation can proceed. We set write_timestamp(x) = ts(T).

This scheme, called basic timestamp ordering, guarantees that transactions are conflict serializable, and the results are equivalent to a serial schedule in which the transactions are executed in chronological order of the timestamps.

Optimistic Techniques

In some environments, conflicts between transactions are rare, and the additional processing required by locking or timestamping protocols is unnecessary for many of the transactions. Optimistic techniques are based on the assumption that conflict is rare and that it is more efficient to allow transactions to proceed without imposing delays to ensure serializability (Kung and Robinson, 1981).

Granularity

All the concurrency control protocols that we have discussed assume that the database consists of a number of "data items," without explicitly defining the term. Typically, a data item is chosen to be one of the following, ranging from coarse to fine, where fine granularity refers to small item sizes and coarse granularity refers to large item sizes:
• the entire database;
• a file;
• a page;
• a record;
• a field value of a record.

Multiple-granularity locking

To reduce the searching involved in locating locks on descendants, the DBMS can use another specialized locking strategy called multiple-granularity locking. This strategy uses a new type of lock called an intention lock (Gray et al., 1975).
When any node is locked, an intention lock is placed on all the ancestors of the node.

Database Recovery

Database recovery: The process of restoring the database to a correct state in the event of a failure.

There are many different types of failure that can affect database processing, each of which has to be dealt with in a different manner. Some failures affect main memory only, while others involve nonvolatile (secondary) storage. Among the causes of failure are:
• system crashes due to hardware or software errors, resulting in loss of main memory;
• media failures, such as head crashes or unreadable media, resulting in the loss of parts of secondary storage;
• application software errors, such as logical errors in the program that is accessing the database, that cause one or more transactions to fail;
• natural physical disasters, such as fires, floods, earthquakes, or power failures;
• carelessness or unintentional destruction of data or facilities by operators or users;
• sabotage, or intentional corruption or destruction of data, hardware, or software facilities.

Buffer management

The management of the database buffers plays an important role in the recovery process, and we briefly discuss their management before proceeding. As we mentioned at the start of this chapter, the buffer manager is responsible for the efficient management of the database buffers that are used to transfer pages to and from secondary storage. This involves reading pages from disk into the buffers until the buffers become full, and then using a replacement strategy to decide which buffer(s) to force-write to disk to make space for new pages that need to be read from disk. Example replacement strategies are first-in-first-out (FIFO) and least recently used (LRU). In addition, the buffer manager should not read a page from disk if it is already in a database buffer.

Backup mechanism

The DBMS should provide a mechanism to allow backup copies of the database and the log file (discussed next) to be made at regular intervals without necessarily having to stop the system first. The backup copy of the database can be used in the event that the database has been damaged or destroyed.

Checkpointing

The information in the log file is used to recover from a database failure. One difficulty with this scheme is that when a failure occurs, we may not know how far back in the log to search, and we may end up redoing transactions that have already been safely written to the database. To limit the amount of searching and subsequent processing that we need to carry out on the log file, we can use a technique called checkpointing.

Recovery techniques using deferred update

Using the deferred update recovery protocol, updates are not written to the database until after a transaction has reached its commit point. If a transaction fails before it reaches this point, it will not have modified the database, and so no undoing of changes will be necessary.

Recovery techniques using immediate update

Using the immediate update recovery protocol, updates are applied to the database as they occur, without waiting to reach the commit point. As well as having to redo the updates of committed transactions following a failure, it may now be necessary to undo the effects of transactions that had not committed at the time of failure.

Shadow paging

An alternative to the log-based recovery schemes described above is shadow paging (Lorie, 1977).
This scheme maintains two page tables during the life of a transaction: a current page table and a shadow page table. When the transaction starts, the two page tables are the same. The shadow page table is never changed thereafter, and is used to restore the database in the event of a system failure.

In a DDBMS, distributed transactions (transactions that access data at more than one site) are divided into a number of subtransactions, one for each site that has to be accessed. In such a system, atomicity has to be maintained for both the subtransactions and the overall (global) transaction.

Nested transaction model

A transaction is viewed as a collection of related subtasks, or subtransactions, each of which may also contain any number of subtransactions. The nested transaction model was introduced by Moss (1981). In this model, the complete transaction forms a tree, or hierarchy, of subtransactions. There is a top-level transaction that can have a number of child transactions; each child transaction can also have nested transactions.

Emulating nested transactions using savepoints

Savepoint: An identifiable point in a flat transaction representing some partially consistent state, which can be used as an internal restart point for the transaction if a subsequent problem is detected.

One of the objectives of the nested transaction model is to provide a unit of recoverability at a finer level of granularity than the transaction; savepoints allow a flat transaction to approximate this behavior.

The nested transaction model presented in Section 22.4.1 requires the commit process to occur in a bottom-up fashion through the top-level transaction. This is called, more precisely, a closed nested transaction, as the semantics of these transactions enforce atomicity at the top level. In contrast, we also have open nested transactions, which relax this condition and allow the partial results of subtransactions to be observed outside the transaction. The saga model discussed in the previous section is an example of an open nested transaction.

Locks

In Oracle, implicit locking occurs for all SQL statements, so that a user never needs to lock any resource explicitly, although Oracle does provide a mechanism to allow the user to acquire locks manually or to alter the default locking behavior. The default locking mechanisms lock data at the lowest level of restrictiveness to guarantee integrity while allowing the highest degree of concurrency. Whereas many DBMSs store information on row locks as a list in memory, Oracle stores row-locking information within the actual data block where the row is stored.
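Returning to the savepoint mechanism described earlier in this section, here is a brief sketch of a savepoint acting as an internal restart point within a flat transaction, using Python's sqlite3 module. The table, the data, and the savepoint name are illustrative assumptions.

    # A sketch of a savepoint as an internal restart point: the failed
    # subtask is undone, but the enclosing transaction carries on.
    import sqlite3

    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")

    conn.execute("BEGIN")
    conn.execute("INSERT INTO accounts VALUES ('x', 100)")
    conn.execute("SAVEPOINT step1")          # partially consistent state to restart from
    try:
        conn.execute("INSERT INTO accounts VALUES ('x', 999)")   # duplicate key: this subtask fails
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK TO SAVEPOINT step1")   # undo only the failed subtask
    conn.execute("INSERT INTO accounts VALUES ('y', 50)")
    conn.execute("COMMIT")

    print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
    # [('x', 100), ('y', 50)] -- work done before the savepoint survived

Unlike a true nested transaction, the savepoint gives no independent commit or parallelism for the subtask; it only marks a point to which the flat transaction can be partially rolled back.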