
Chapter 5 | Process Management

Deadlock

Deadlock is more serious than indefinite postponement or starvation because it affects more than one job. Because resources are being tied up, the entire system (not just a few programs) is affected. The example most often used to illustrate deadlock is a traffic jam.

As shown in Figure 5.1, there's no simple and immediate solution to a deadlock; no one can move forward until someone moves out of the way, but no one can move out of the way until either someone advances or the rear of a line moves back. Obviously, it requires outside intervention to remove one of the four vehicles from an intersection or to make a line move back. Only then can the deadlock be resolved.

(figure 5.1) A classic case of traffic deadlock on four one-way streets. This is "gridlock," where no vehicles can move forward to clear the traffic jam.

Deadlocks became prevalent with the introduction of interactive systems, which generally improve the use of resources through dynamic resource sharing, but this capability also increases the possibility of deadlocks.

In some computer systems, deadlocks are regarded as a mere inconvenience that causes delays. But for real-time systems, deadlocks cause critical situations. For example, a deadlock in a hospital's life support system or in the guidance system aboard an aircraft could endanger lives. Regardless of the environment, the operating system must either prevent deadlocks or resolve them when they happen. In Chapter 12, we'll learn how to calculate system reliability and availability, which can be affected by processor conflicts.

Seven Cases of Deadlock

A deadlock usually occurs when nonsharable, nonpreemptable resources, such as files, printers, or scanners, are allocated to jobs that eventually require other nonsharable, nonpreemptable resources that have been locked by other jobs. However, deadlocks aren't restricted to files, printers, and scanners.
They can also occur on sharable resources that are locked, such as disks and databases.

Directed graphs visually represent the system's resources and processes, and show how they are deadlocked. Using a series of squares (for resources), circles (for processes), and connectors with arrows (for requests), directed graphs can be manipulated to understand how deadlocks occur.

Case 1: Deadlocks on File Requests

If jobs are allowed to request and hold files for the duration of their execution, a deadlock can occur, as the simplified directed graph shown in Figure 5.2 illustrates.

(figure 5.2) Case 1. These two processes, purchasing (P1) and sales (P2), shown as circles, are each waiting for a resource, the inventory file (F1) or the supplier file (F2), shown as rectangles, that has already been allocated to the other process, thus creating a deadlock.

For example, consider the case of a home construction company with two application programs, purchasing (P1) and sales (P2), which are active at the same time. Both need to access two files, inventory (F1) and supplier (F2), to read and write transactions. One day the system deadlocks when the following sequence of events takes place:

1. Purchasing (P1) accesses the supplier file (F2) to place an order for more lumber.
2. Sales (P2) accesses the inventory file (F1) to reserve the parts that will be required to build the home ordered that day.
3. Purchasing (P1) doesn't release the supplier file (F2) but requests the inventory file (F1) to verify the quantity of lumber on hand before placing its order for more, but P1 is blocked because F1 is being held by P2.
4. Meanwhile, sales (P2) doesn't release the inventory file (F1) but requests the supplier file (F2) to check the schedule of a subcontractor. At this point, P2 is also blocked because F2 is being held by P1.

Any other programs that require F1 or F2 will be put on hold as long as this situation continues.
This deadlock will remain until one of the two programs is closed or forcibly removed and its file is released. Only then can the other program continue and the system return to normal.

Case 2: Deadlocks in Databases

A deadlock can also occur if two processes access and lock records in a database. To appreciate the following scenario, remember that database queries and transactions are often relatively brief processes that either search or modify parts of a database. Requests usually arrive at random and may be interleaved arbitrarily.

Locking is a technique used to guarantee the integrity of the data; with it, one user locks out all other users while working with the database. Locking can be done at three different levels: the entire database can be locked for the duration of the request; a subsection of the database can be locked; or only an individual record can be locked until the process is completed. Locking the entire database (the most extreme and most successful solution) prevents a deadlock from occurring, but it restricts access to the database to one user at a time and, in a multiuser environment, response times are significantly slowed; this is normally an unacceptable solution. When the locking is performed on only one part of the database, access time is improved, but the possibility of a deadlock is increased because different processes sometimes need to work with several parts of the database at the same time.

Here's a system that locks each record when it is accessed and keeps it locked until the process is completed. There are two processes (P1 and P2), each of which needs to update two records (R1 and R2), and the following sequence leads to a deadlock:

1. P1 accesses R1 and locks it.
2. P2 accesses R2 and locks it.
3. P1 requests R2, which is locked by P2.
4. P2 requests R1, which is locked by P1.

An alternative, of course, is to avoid the use of locks, but that leads to other difficulties.
If locks are not used to preserve database integrity, the updated records might include only some of the data, and their contents would depend on the order in which each process finishes its execution. This is known as a race between processes and is illustrated in the following example and Figure 5.3.

(figure 5.3) Case 2. P1 finishes first and wins the race, but its version of the record will soon be overwritten by P2. Regardless of which process wins the race, the final version of the data will be incorrect.

✔ A race introduces the element of chance, an element that's totally unacceptable in database management. The integrity of the database must be upheld.

Let's say you are a student at a university that maintains most of its files on a database that can be accessed by several different programs, including one for grades and another listing home addresses. You've just moved, so you send the university a change-of-address form at the end of the fall term, shortly after grades are submitted. And one fateful day, both programs race to access your record in the database:

1. The grades process (P1) is the first to access your record (R1), and it copies the record to its work area.
2. The address process (P2) accesses your record (R1) and copies it to its work area.
3. P1 changes your student record (R1) by entering your grades for the fall term and calculating your new grade average.
4. P2 changes your record (R1) by updating the address field.
5. P1 finishes its work first and rewrites its version of your record back to the database. Your grades have been updated, but your address hasn't.
6. P2 finishes and rewrites its updated record back to the database. Your address has been changed, but your grades haven't. According to the database, you didn't attend school this term.

If we reverse the order and say that P2 won the race, your grades will be updated but not your address.
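The six steps can be replayed deterministically in a few lines of Python. The record's field names and values are invented for illustration; the point is that each process works on its own private copy, so the last write-back wins and the other update is lost.

```python
# Deterministic replay of the six-step race in Case 2 (field names are hypothetical).
record = {"grades": None, "address": "old address"}

p1_copy = dict(record)              # step 1: grades process copies the record
p2_copy = dict(record)              # step 2: address process copies the record

p1_copy["grades"] = "A"             # step 3: P1 enters the fall-term grades
p2_copy["address"] = "new address"  # step 4: P2 updates the address field

record = dict(p1_copy)              # step 5: P1 writes back first (wins the race)
record = dict(p2_copy)              # step 6: P2 overwrites P1's version

print(record)   # {'grades': None, 'address': 'new address'}
```

The final record has the new address but no grades: exactly the "you didn't attend school this term" outcome. Swapping steps 5 and 6 loses the address instead.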
Depending on your success in the classroom, you might prefer one mishap over the other; but from the operating system's point of view, either alternative is unacceptable because incorrect data is allowed to corrupt the database. The system can't allow the integrity of the database to depend on a random sequence of events.

Case 3: Deadlocks in Dedicated Device Allocation

The use of a group of dedicated devices, such as a cluster of DVD read/write drives, can also deadlock the system. Let's say two users from the local board of education are each running a program (P1 and P2), and both programs will eventually need two DVD drives to copy files from one disc to another. The system is small, however, and when the two programs begin, only two DVD-R drives are available, and they're allocated on an "as requested" basis. Soon the following sequence transpires:

1. P1 requests drive 1 and gets it.
2. P2 requests drive 2 and gets it.
3. P1 requests drive 2 but is blocked.
4. P2 requests drive 1 but is blocked.

Neither job can continue because each is waiting for the other to finish and release its drive, an event that will never occur. A similar series of events could deadlock any group of dedicated devices.

Case 4: Deadlocks in Multiple Device Allocation

Deadlocks aren't restricted to processes contending for the same type of device; they can happen when several processes request, and hold on to, several dedicated devices while other processes act in a similar manner, as shown in Figure 5.4.

(figure 5.4) Case 4. Three processes, shown as circles, are each waiting for a device that has already been allocated to another process, thus creating a deadlock.

Consider the case of an engineering design firm with three programs (P1, P2, and P3) and three dedicated devices: scanner, printer, and plotter. The following sequence of events will result in deadlock:

1. P1 requests and gets the scanner.
2. P2 requests and gets the printer.
3. P3 requests and gets the plotter.
4. P1 requests the printer but is blocked.
5. P2 requests the plotter but is blocked.
6. P3 requests the scanner but is blocked.

As in the earlier examples, none of the jobs can continue because each is waiting for a resource being held by another.

Case 5: Deadlocks in Spooling

Although in the previous example the printer was a dedicated device, printers are usually sharable devices, called virtual devices, that use high-speed storage to transfer data between the printer and the CPU. The spooler accepts output from several users and acts as a temporary storage area for all output until the printer is ready to accept it; this process is called spooling. If the printer needs all of a job's output before it will begin printing, but the spooling system fills the available space with only partially completed output, then a deadlock can occur.

It happens like this. Let's say it's one hour before the big project is due for a computer class. Twenty-six frantic programmers key in their final changes and, with only minutes to spare, all issue print commands. The spooler receives the pages one at a time from each of the students, but the pages arrive separately: several page ones, several page twos, and so on. The printer is ready to print the first completed document it gets, but as the spooler canvasses its files, it finds the first page for many programs and the last page for none of them. Alas, the spooler is full of partially completed output, so no other pages can be accepted; but none of the jobs can be printed out (which would release their disk space) because the printer only accepts completed output files. It's an unfortunate state of affairs.

This scenario isn't limited to printers. Any part of the system that relies on spooling, such as one that handles incoming jobs or transfers files over a network, is vulnerable to such a deadlock.
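The spooling deadlock in Case 5 is really just arithmetic: pages arrive round-robin, so the spool area fills up with first pages before any job's last page arrives. The sketch below makes this concrete; the spool capacity and pages-per-job figures are invented for illustration (the text doesn't give numbers).

```python
# Sketch of the Case 5 spooling deadlock. Capacity and job sizes are hypothetical.
SPOOL_CAPACITY = 30     # pages the spool area can hold
JOBS = 26               # the twenty-six frantic programmers
PAGES_PER_JOB = 5       # pages in each job's output

spool = {job: [] for job in range(JOBS)}
pages_stored = 0

# Pages arrive round-robin: every job sends its page 1, then its page 2, ...
for page in range(1, PAGES_PER_JOB + 1):
    for job in range(JOBS):
        if pages_stored == SPOOL_CAPACITY:
            break                      # spool area is full; no more pages accepted
        spool[job].append(page)
        pages_stored += 1
    else:
        continue
    break

complete = [j for j in spool if len(spool[j]) == PAGES_PER_JOB]
started = [j for j in spool if spool[j]]
print(f"spool full: {pages_stored == SPOOL_CAPACITY}")
print(f"jobs with pages spooled: {len(started)}, complete jobs: {len(complete)}")
# spool full: True
# jobs with pages spooled: 26, complete jobs: 0
```

All 26 jobs have pages in the spool, none is complete, and the spool is full, so no job can ever finish: the deadlock the text describes.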
Case 6: Deadlocks in a Network

A network that's congested, or one that has filled a large percentage of its I/O buffer space, can become deadlocked if it doesn't have protocols to control the flow of messages through the network, as shown in Figure 5.5.

(figure 5.5) Case 6, a deadlocked network flow. Each circle represents a node and each line represents a communication path; the arrows indicate the direction of data flow. Notice that only two nodes, C1 and C2, have buffers.

For example, a medium-sized word-processing center has seven computers on a network, each on a different node. C1 receives messages from nodes C2, C6, and C7 and sends messages to only one: C2. C2 receives messages from nodes C1, C3, and C4 and sends messages to only C1 and C3. The direction of the arrows in Figure 5.5 indicates the flow of messages.

Messages received by C1 from C6 and C7, and destined for C2, are buffered in an output queue. Messages received by C2 from C3 and C4, and destined for C1, are buffered in an output queue. As the traffic increases, the length of each output queue increases until all of the available buffer space is filled. At this point, C1 can't accept any more messages (from C2 or any other computer) because there's no more buffer space available to store them. For the same reason, C2 can't accept any messages from C1 or any other computer, not even a request to send. The communication path between C1 and C2 becomes deadlocked; and because C1 can't send messages to any computer except C2 and can only receive messages from C6 and C7, those routes also become deadlocked. C1 can't send word to C2 about the problem, and so the deadlock can't be resolved without outside intervention.

Case 7: Deadlocks in Disk Sharing

Disks are designed to be shared, so it's not uncommon for two processes to access different areas of the same disk.
This ability to share creates an active type of deadlock, known as livelock. Processes use a form of busy-waiting that's different from a natural wait: each is waiting to share a resource but never actually gains control of it. In Figure 5.6, two competing processes send conflicting commands, causing livelock. Notice that neither process is blocked, which would cause a deadlock; instead, each is active but never reaches fulfillment.

(figure 5.6) Case 7. Two processes are each waiting for an I/O request to be filled: one at track 20 and one at track 310. But by the time the read/write arm reaches one track, a competing command for the other track has been issued, so neither command is satisfied and livelock occurs.

For example, at an insurance company the system performs many daily transactions. One day the following series of events ties up the system:

1. Customer Service (P1) wishes to show a payment, so it issues a command to read the balance, which is stored on track 20 of a disk.
2. While the control unit is moving the arm to track 20, P1 is put on hold and the I/O channel is free to process the next I/O request.
3. While the arm is moving into position, Accounts Payable (P2) gains control of the I/O channel and issues a command to write someone else's payment to a record stored on track 310. If the command is not "locked out," P2 will be put on hold while the control unit moves the arm to track 310.
4. Because P2 is "on hold" while the arm is moving, the channel can be captured again by P1, which reconfirms its command to "read from track 20."
5. Because the last command from P2 had forced the arm mechanism to track 310, the disk control unit begins to reposition the arm to track 20 to satisfy P1.
The I/O channel would be released because P1 is once again put on hold, so it could be captured by P2, which issues a WRITE command, only to discover that the arm mechanism needs to be repositioned. As a result, the arm is in a constant state of motion, moving back and forth between tracks 20 and 310 as it responds to the two competing commands, while satisfying neither.

Conditions for Deadlock

In each of these seven cases, the deadlock (or livelock) involved the interaction of several processes and resources, but each deadlock was preceded by the simultaneous occurrence of four conditions that the operating system (or other systems) could have recognized: mutual exclusion, resource holding, no preemption, and circular wait. It's important to remember that each of these four conditions is necessary for the operating system to work smoothly, and none of them can be removed easily without causing the system's overall functioning to suffer. Therefore, the system needs to recognize the combination of conditions before they occur and threaten to cause the system to lock up.

✔ When a deadlock occurs, all four conditions are present, though the opposite is not true: the presence of all four conditions does not always lead to deadlock.

To illustrate these four conditions, let's revisit the staircase example from the beginning of the chapter. When two people meet between landings, they can't pass because the steps can hold only one person at a time. Mutual exclusion, the act of allowing only one person (or process) to have access to a step (a dedicated resource), is the first condition for deadlock. When two people meet on the stairs and each one holds ground and waits for the other to retreat, that is an example of resource holding (as opposed to resource sharing), the second condition for deadlock.
In this example, each step is dedicated to the climber (or descender); it is allocated to the holder for as long as needed. This is called no preemption, the lack of temporary reallocation of resources, and is the third condition for deadlock. These three lead to the fourth condition of circular wait, in which each person (or process) involved in the impasse is waiting for another to voluntarily release the step (or resource) so that at least one will be able to continue and eventually arrive at the destination.

All four conditions are required for a deadlock to occur, and as long as all four conditions are present, the deadlock will continue; but if one condition can be removed, the deadlock will be resolved. In fact, if the four conditions can be prevented from ever occurring at the same time, deadlocks can be prevented. Although this concept is obvious, it isn't easy to implement.

Modeling Deadlocks

Holt showed how the four conditions can be modeled using directed graphs. (We used modified directed graphs in Figure 5.2 and Figure 5.4.) These graphs use two kinds of symbols: processes represented by circles and resources represented by squares. A solid arrow from a resource to a process, shown in Figure 5.7(a), means that the process is holding that resource. A dashed line with an arrow from a process to a resource, shown in Figure 5.7(b), means that the process is waiting for that resource. The direction of the arrow indicates the flow. If there's a cycle in the graph, then there's a deadlock involving the processes and the resources in the cycle.

(figure 5.7) In (a), Resource 1 is being held by Process 1 and Resource 2 is held by Process 2 in a system that is not deadlocked. In (b), Process 1 requests Resource 2 but doesn't release Resource 1, and Process 2 does the same, creating a deadlock. (If one process released its resource, the deadlock would be resolved.)

The following system has three processes (P1, P2, and P3) and three resources (R1, R2, and R3), each of a different type: printer, disk drive, and plotter. Because there is no specified order in which the requests are handled, we'll look at three different possible scenarios, using graphs to help us detect any deadlocks.

Scenario 1

The first scenario's sequence of events is shown in Table 5.1; the corresponding directed graph is shown in Figure 5.8.

(table 5.1) The first scenario's sequence of events.
Event | Action
1 | P1 requests and is allocated the printer (R1).
2 | P1 releases the printer (R1).
3 | P2 requests and is allocated the disk drive (R2).
4 | P2 releases the disk drive (R2).
5 | P3 requests and is allocated the plotter (R3).
6 | P3 releases the plotter (R3).

(figure 5.8) First scenario. The system will stay free of deadlocks if each resource is released before it is requested by the next process.

Notice in the directed graph that there are no cycles. Therefore, we can safely conclude that a deadlock can't occur, even if each process requests every resource, as long as the resources are released before the next process requests them.

Scenario 2

Now consider a second scenario's sequence of events, shown in Table 5.2.

(table 5.2) The second scenario's sequence of events.
Event | Action
1 | P1 requests and is allocated R1.
2 | P2 requests and is allocated R2.
3 | P3 requests and is allocated R3.
4 | P1 requests R2.
5 | P2 requests R3.
6 | P3 requests R1.

The progression of the directed graph is shown in Figure 5.9. A deadlock occurs because every process is waiting for a resource that is being held by another process, but none will be released without intervention.

(figure 5.9) Second scenario. The system becomes deadlocked when P3 requests R1. Notice the circular wait.
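The cycle test that Holt's graphs rely on takes only a few lines of code. The sketch below uses our own encoding, not anything from the text: both kinds of arrow become directed edges, with a resource-to-process edge meaning "held by" and a process-to-resource edge meaning "waits for," and a depth-first search looks for a cycle.

```python
# Minimal cycle check on a resource-allocation graph (the encoding is ours).
# Edge (R, P) means "R is held by P"; edge (P, R) means "P waits for R".
def has_cycle(edges):
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    def visit(node, path):
        # Revisiting a node already on the current path means a cycle exists.
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in graph.get(node, []))

    return any(visit(node, set()) for node in graph)

# Scenario 2 after event 6: each process holds one resource and waits for the next.
scenario2 = [("R1", "P1"), ("R2", "P2"), ("R3", "P3"),   # allocations
             ("P1", "R2"), ("P2", "R3"), ("P3", "R1")]   # requests
print(has_cycle(scenario2))            # True: circular wait, so deadlock

# In Scenario 1, a request edge never coexists with another allocation edge:
print(has_cycle([("R1", "P1")]))       # False: no cycle, no deadlock
```

Running the check on Scenario 2's edges finds the cycle P1 → R2 → P2 → R3 → P3 → R1 → P1, confirming the deadlock the graph shows.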
Scenario 3

The third scenario is shown in Table 5.3. As shown in Figure 5.10, the resources are released before deadlock can occur.

(table 5.3) The third scenario's sequence of events.
Event | Action
1 | P1 requests and is allocated R1.
2 | P1 requests and is allocated R2.
3 | P2 requests R1.
4 | P3 requests and is allocated R3.
5 | P1 releases R1, which is allocated to P2.
6 | P3 requests R2.
7 | P1 releases R2, which is allocated to P3.

(figure 5.10) The third scenario. After event 4, the directed graph looks like (a), and P2 is blocked because P1 is holding on to R1. However, event 5 breaks the deadlock and the graph soon looks like (b). Again there is a blocked process, P3, which must wait for the release of R2 in event 7, when the graph looks like (c).

Another Example

The examples presented so far have examined cases in which one or more resources of different types were allocated to a process. However, the graphs can be expanded to include several resources of the same type, such as tape drives, which can be allocated individually or in groups to the same process. These graphs cluster the devices of the same type into one entity, shown in Figure 5.11 as a rectangle, and the arrows show the links between the single resource and the processes using it.

Figure 5.11 gives an example of a cluster with three resources of the same type, such as three disk drives, each allocated to a different process. Although Figure 5.11(a) seems to be stable (no deadlock can occur), this is not the case: if all three processes request one more resource without releasing the one they are using, then deadlock will occur, as shown in Figure 5.11(b).

(figure 5.11) (a): A fully allocated cluster of resources. There are as many lines coming out of it as there are resource units in it. The state of (a) is uncertain because a request for another unit by all three processes would create a deadlock, as shown in (b).

Strategies for Handling Deadlocks

As these examples show, the requests and releases are received in an unpredictable order, which makes it very difficult to design a foolproof preventive policy. In general, operating systems use one of three strategies to deal with deadlocks:

• Prevent one of the four conditions from occurring (prevention).
• Avoid the deadlock if it becomes probable (avoidance).
• Detect the deadlock when it occurs and recover from it gracefully (detection).

Prevention

To prevent a deadlock, the operating system must eliminate one of the four necessary conditions, a task complicated by the fact that the same condition can't be eliminated from every resource.

Mutual exclusion is necessary in any computer system because some resources, such as memory, the CPU, and dedicated devices, must be exclusively allocated to one user at a time. In the case of I/O devices, such as printers, mutual exclusion may be bypassed by spooling, which allows the output from many jobs to be stored in separate temporary spool files at the same time; each complete output file is then selected for printing when the device is ready. However, we may be trading one type of deadlock (Case 3: Deadlocks in Dedicated Device Allocation) for another (Case 5: Deadlocks in Spooling).

Resource holding, where a job holds on to one resource while waiting for another one that's not yet available, could be sidestepped by forcing each job to request, at creation time, every resource it will need to run to completion.
For example, if every job in a batch system were given as much memory as it needs, then the number of active jobs would be dictated by how many can fit in memory, a policy that would significantly decrease the degree of multiprogramming. In addition, peripheral devices would sit idle because they would be allocated to a job even though they wouldn't be used all the time. As we've said before, this approach was used successfully in batch environments, although it reduced the effective use of resources and restricted the amount of multiprogramming. It doesn't work as well in interactive systems.

No preemption could be bypassed by allowing the operating system to deallocate resources from jobs. This can be done if the state of the job can be easily saved and restored, as when a job is preempted in a round robin environment or a page is swapped to secondary storage in a virtual memory system. On the other hand, preemption of a dedicated I/O device (printer, plotter, tape drive, and so on), or of files during the modification process, can require some extremely unpleasant recovery tasks.

Circular wait can be bypassed if the operating system prevents the formation of a circle. One such solution was proposed by Havender and is based on a numbering system for the resources, such as: printer = 1, disk = 2, tape = 3, plotter = 4, and so on. The system forces each job to request its resources in ascending order: any "number one" devices required by the job would be requested first; any "number two" devices would be requested next; and so on. So if a job needed a printer and then a plotter, it would request them in this order: the printer (#1) first and then the plotter (#4). If the job required the plotter first and then the printer, it would still request the printer first (which is a #1), even though it wouldn't be used right away.
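Havender's ascending-order rule is easy to check mechanically. The sketch below uses the resource numbers from the text; the function itself is our illustration, not part of any real operating system.

```python
# Sketch of Havender's hierarchical ordering (resource numbers from the text).
RANK = {"printer": 1, "disk": 2, "tape": 3, "plotter": 4}

def in_ascending_order(requests):
    """A job's request sequence is legal if the resource numbers never decrease."""
    ranks = [RANK[r] for r in requests]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

print(in_ascending_order(["printer", "plotter"]))       # True: #1 before #4
print(in_ascending_order(["plotter", "printer"]))       # False: the printer must come first
print(in_ascending_order(["printer", "disk", "tape"]))  # True
```

A job that plans to use the plotter before the printer must still issue its requests in numeric order, holding the printer unused for a while; that is exactly the cost of the scheme that the surrounding text discusses.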
A job could request a printer (#1) and then a disk (#2) and then a tape (#3); but if it needed another printer (#1) late in its processing, it would still have to anticipate that need when it requested the first one, and before it requested the disk. This scheme of "hierarchical ordering" removes the possibility of a circular wait and therefore guarantees the removal of deadlocks. It doesn't require that jobs state their maximum needs in advance, but it does require that they anticipate the order in which they will request resources. From the perspective of a system designer, one of the difficulties of this scheme is discovering the best order for the resources so that the needs of the majority of the users are satisfied. Another difficulty is assigning a ranking to nonphysical resources, such as files or locked database records, where there is no basis for assigning a higher number to one over another.

Avoidance

Even if the operating system can't remove one of the conditions for deadlock, it can avoid a deadlock if the system knows ahead of time the sequence of requests associated with each of the active processes. As was illustrated in the graphs presented in Figure 5.7 through Figure 5.11, there exists at least one resource-allocation sequence that will allow jobs to continue without becoming deadlocked.

One such algorithm was proposed by Dijkstra in 1965 to regulate resource allocation to avoid deadlocks. The Banker's Algorithm is based on a bank with a fixed amount of capital that operates on the following principles:

• No customer will be granted a loan exceeding the bank's total capital.
• All customers will be given a maximum credit limit when opening an account.
• No customer will be allowed to borrow over the limit.
• The sum of all loans won't exceed the bank's total capital.

✔ To remain in a safe state, the bank has to have sufficient funds to satisfy the needs of at least one customer.
Under these conditions, the bank isn't required to have on hand the total of all maximum lending quotas before it can open for business (we'll assume the bank will always have the same fixed total, and we'll disregard interest charged on loans). For our example, the bank has a total capital fund of $10,000 and three customers, C1, C2, and C3, who have maximum credit limits of $4,000, $5,000, and $8,000, respectively. Table 5.4 illustrates the state of affairs of the bank after some loans have been granted to C2 and C3. This is called a safe state because the bank still has enough money left to satisfy the maximum requests of C1, C2, or C3.

(table 5.4) The bank started with $10,000 and has remaining capital of $4,000 after these loans. Therefore, it's in a "safe state."
Customer | Loan Amount | Maximum Credit | Remaining Credit
C1 | 0 | 4,000 | 4,000
C2 | 2,000 | 5,000 | 3,000
C3 | 4,000 | 8,000 | 4,000
Total loaned: $6,000. Total capital fund: $10,000.

A few weeks later, after more loans have been made and some have been repaid, the bank is in the unsafe state represented in Table 5.5.

(table 5.5) The bank only has remaining capital of $1,000 after these loans and therefore is in an "unsafe state."
Customer | Loan Amount | Maximum Credit | Remaining Credit
C1 | 2,000 | 4,000 | 2,000
C2 | 3,000 | 5,000 | 2,000
C3 | 4,000 | 8,000 | 4,000
Total loaned: $9,000. Total capital fund: $10,000.

This is an unsafe state because, with only $1,000 left, the bank can't satisfy anyone's maximum request; and if the bank lent the $1,000 to anyone, it would be deadlocked (it couldn't make another loan). An unsafe state doesn't necessarily lead to deadlock, but it does indicate that the system is an excellent candidate for one.
After all, none of the customers is required to request the maximum, but the bank doesn't know the exact amount that will eventually be requested; and as long as the bank's capital is less than the maximum amount available for individual loans, it can't guarantee that it will be able to fill every loan request.

If we substitute jobs for customers and dedicated devices for dollars, we can apply the same banking principles to an operating system. In the following example, the system has 10 devices. Table 5.6 shows our system in a safe state, and Table 5.7 depicts the same system in an unsafe state. As before, a safe state is one in which at least one job can finish because there are enough available resources to satisfy its maximum needs. Then, using the resources released by the finished job, the maximum needs of another job can be filled, and that job can be finished, and so on, until all jobs are done.

(table 5.6) Resource assignments after initial allocations. A safe state: six devices are allocated and four units are still available.
Job No. | Devices Allocated | Maximum Required | Remaining Needs
1 | 0 | 4 | 4
2 | 2 | 5 | 3
3 | 4 | 8 | 4
Total number of devices allocated: 6. Total number of devices in system: 10.

(table 5.7) Resource assignments after later allocations. An unsafe state: only one unit is available, but every job requires at least two to complete its execution.
Job No. | Devices Allocated | Maximum Required | Remaining Needs
1 | 2 | 4 | 2
2 | 3 | 5 | 2
3 | 4 | 8 | 4
Total number of devices allocated: 9. Total number of devices in system: 10.

The operating system must be sure never to satisfy a request that moves it from a safe state to an unsafe one. Therefore, as users' requests are satisfied, the operating system must identify the job with the smallest number of remaining resources and make sure that the number of available resources is always equal to, or greater than, the number needed for this job to run to completion.
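The safe-state test can be sketched in code. This is a simplified, single-resource-class version in the spirit of Dijkstra's Banker's Algorithm (the function shape is ours, not the textbook's), fed with the device counts from Tables 5.6 and 5.7: repeatedly "finish" any job whose remaining need fits in the available pool, reclaiming its devices, and declare the state safe only if every job can finish this way.

```python
# Simplified safe-state check for one resource class (Banker's Algorithm sketch).
def is_safe(allocated, maximum, total):
    available = total - sum(allocated)
    remaining = [m - a for a, m in zip(allocated, maximum)]
    finished = [False] * len(allocated)

    progress = True
    while progress:
        progress = False
        for job, done in enumerate(finished):
            # A job whose remaining need fits in the available pool can finish
            # and return everything it holds.
            if not done and remaining[job] <= available:
                available += allocated[job]
                finished[job] = True
                progress = True
    return all(finished)

print(is_safe(allocated=[0, 2, 4], maximum=[4, 5, 8], total=10))  # True  (Table 5.6)
print(is_safe(allocated=[2, 3, 4], maximum=[4, 5, 8], total=10))  # False (Table 5.7)
```

For Table 5.6, job 1's remaining need of 4 fits in the 4 available units, so it can finish, freeing enough devices for job 2 and then job 3. For Table 5.7, only 1 unit is free and every job needs at least 2, so no job can be guaranteed to finish.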
Requests that would place the safe state in jeopardy must be blocked by the operating system until they can be safely accommodated.

✔ If the system is always kept in a safe state, all requests will eventually be satisfied and a deadlock will be avoided.

If this elegant solution is expanded to work with several classes of resources, the system sets up a "resource assignment table" for each type of resource and tracks each table to keep the system in a safe state.

Although the Banker's Algorithm has been used to avoid deadlocks in systems with a few resources, it isn't always practical for most systems, for several reasons:

• As they enter the system, jobs must predict the maximum number of resources needed. As we've said before, this isn't practical in interactive systems.
• The number of total resources for each class must remain constant. If a device breaks and becomes suddenly unavailable, the algorithm won't work (the system may already be in an unsafe state).
• The number of jobs must remain fixed, something that isn't possible in interactive systems, where the number of active jobs is constantly changing.
• The overhead cost incurred by running the avoidance algorithm can be quite high when there are many active jobs and many devices, because it has to be invoked for every request.
• Resources aren't well utilized because the algorithm assumes the worst case and, as a result, keeps vital resources unavailable to guard against unsafe states.
• Scheduling suffers as a result of the poor utilization, and jobs are kept waiting for resource allocation. A steady stream of jobs asking for a few resources can cause the indefinite postponement of a more complex job requiring many resources.
Detection

The directed graphs presented earlier in this chapter showed how the existence of a circular wait indicated a deadlock, so it's reasonable to conclude that deadlocks can be detected by building directed resource graphs and looking for cycles. Unlike the avoidance algorithm, which must be performed every time there is a request, the algorithm used to detect circularity can be executed whenever it is appropriate: every hour, once a day, only when the operator notices that throughput has deteriorated, or when an angry user complains.

The detection algorithm can be explained by using directed resource graphs and "reducing" them. Begin with a system that is in use, as shown in Figure 5.12(a). The steps to reduce a graph are these:

1. Find a process that is currently using a resource and not waiting for one. This process can be removed from the graph (by disconnecting the link tying the resource to the process, such as P3 in Figure 5.12(b)), and the resource can be returned to the "available list." This is possible because the process would eventually finish and return the resource.
2. Find a process that's waiting only for resource classes that aren't fully allocated (such as P2 in Figure 5.12(c)). This process isn't contributing to deadlock, since it would eventually get the resource it's waiting for, finish its work, and return the resource to the "available list," as shown in Figure 5.12(c).
3. Go back to step 1 and continue with steps 1 and 2 until all lines connecting resources to processes have been removed, eventually reaching the stage shown in Figure 5.12(d). If there are any lines left, this indicates that the request of the process in question can't be satisfied and that a deadlock exists.

Figure 5.12 illustrates a system in which three processes (P1, P2, and P3) and three resources (R1, R2, and R3) aren't deadlocked.
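The reduction steps above can be sketched as code. This is an illustrative model whose data layout is our own, not from the text: each resource class has a number of units, `held` maps each process to the resource classes it holds, and `waiting` maps each process to the classes it has requested but not yet received.

```python
def is_deadlocked(units, held, waiting):
    """Reduce a resource graph; return True if some processes remain stuck.

    units[r]   -- total units of resource class r
    held[p]    -- resource classes process p currently holds
    waiting[p] -- resource classes p has requested but not received
    """
    held = {p: list(rs) for p, rs in held.items()}
    waiting = {p: list(rs) for p, rs in waiting.items()}
    allocated = {r: 0 for r in units}
    for rs in held.values():
        for r in rs:
            allocated[r] += 1

    changed = True
    while changed:
        changed = False
        for p in list(held):
            # Steps 1 and 2: remove any process that waits for nothing, or
            # only for classes with at least one free unit; it would finish
            # and release everything it holds back to the available pool.
            if all(allocated[r] < units[r] for r in waiting[p]):
                for r in held[p]:
                    allocated[r] -= 1
                del held[p], waiting[p]
                changed = True
    return bool(held)  # leftover links mean a deadlock

system = {'R1': 1, 'R2': 1, 'R3': 1}

# Figure 5.12: P1 holds R1 and waits for R2; P2 holds R2 and waits for R3;
# P3 holds R3 and waits for nothing.  The graph reduces completely.
fig_12 = is_deadlocked(system,
                       {'P1': ['R1'], 'P2': ['R2'], 'P3': ['R3']},
                       {'P1': ['R2'], 'P2': ['R3'], 'P3': []})

# Figure 5.13: identical, except P2 also waits for R1 -- a circular wait.
fig_13 = is_deadlocked(system,
                       {'P1': ['R1'], 'P2': ['R2'], 'P3': ['R3']},
                       {'P1': ['R2'], 'P2': ['R3', 'R1'], 'P3': []})

print(fig_12, fig_13)   # False True
```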
(figure 5.12) This system is deadlock-free because the graph can be completely reduced, as shown in (d). [Figure: four stages, (a) through (d), of a graph connecting resources R1, R2, and R3 to processes P1, P2, and P3.]

Figure 5.12 shows the stages of a graph reduction from (a), the original state. In (b), the link between P3 and R3 can be removed because P3 isn't waiting for any other resources to finish, so R3 is released and allocated to P2 (step 1). In (c), the links between P2 and R3 and between P2 and R2 can be removed because P2 has all of its requested resources and can run to completion, and then R2 can be allocated to P1. Finally, in (d), the links between P1 and R2 and between P1 and R1 can be removed because P1 has all of its requested resources and can finish successfully. Therefore, the graph is completely reduced.

However, Figure 5.13 shows a very similar situation that is deadlocked because of a key difference: P2 is linked to R1.

(figure 5.13) Even after this graph (a) is reduced as much as possible (by removing the request from P3), it is still deadlocked (b).

The deadlocked system in Figure 5.13 can't be reduced. In (a), the link between P3 and R3 can be removed because P3 isn't waiting for any other resource, so R3 is released and allocated to P2. But in (b), P2 has only two of the three resources it needs to finish, and it is waiting for R1. R1 can't be released by P1 because P1 is waiting for R2, which is held by P2; P1 can't finish because it is waiting for P2 to finish (and release R2), and P2 can't finish because it's waiting for R1. This is a circular wait.

Recovery

Once a deadlock has been detected, it must be untangled and the system returned to normal as quickly as possible. There are several recovery algorithms, but they all have one feature in common: They all require at least one victim, an expendable job, which, when removed from the deadlock, will free the system.
Unfortunately for the victim, removal generally requires that the job be restarted from the beginning or from a convenient midpoint.

The first and simplest recovery method, and the most drastic, is to terminate every job that's active in the system and restart them all from the beginning.

The second method is to terminate only the jobs involved in the deadlock and ask their users to resubmit them.

The third method is to identify which jobs are involved in the deadlock and terminate them one at a time, checking to see if the deadlock is eliminated after each removal, until the deadlock has been resolved. Once the system is freed, the remaining jobs are allowed to complete their processing, and later the halted jobs are started again from the beginning.

The fourth method can be put into effect only if the job keeps a record, a snapshot, of its progress so it can be interrupted and then continued without starting again from the beginning of its execution. The snapshot is like the landing in our staircase example: Instead of forcing the deadlocked stair climbers to return to the bottom of the stairs, they need to retreat only to the nearest landing and wait until the others have passed. Then the climb can be resumed. In general, this method is favored for long-running jobs to help them make a speedy recovery.

Until now we've offered solutions involving the jobs caught in the deadlock. The next two methods concentrate on the nondeadlocked jobs and the resources they hold. One of them, the fifth method in our list, selects a nondeadlocked job, preempts the resources it's holding, and allocates them to a deadlocked process so it can resume execution, thus breaking the deadlock. The sixth method stops new jobs from entering the system, which allows the nondeadlocked jobs to run to completion so they'll release their resources.
Eventually, with fewer jobs in the system, competition for resources is curtailed so the deadlocked processes get the resources they need to run to completion. This method is the only one listed here that doesn't rely on a victim, and it's not guaranteed to work unless the number of available resources surpasses that needed by at least one of the deadlocked jobs to run (this is possible with multiple resources).

Several factors must be considered to select the victim that will have the least-negative effect on the system. The most common are:

• The priority of the job under consideration; high-priority jobs are usually untouched
• CPU time used by the job; jobs close to completion are usually left alone
• The number of other jobs that would be affected if this job were selected as the victim

In addition, programs working with databases also deserve special treatment, because a database that is only partially updated is only partially correct. Therefore, jobs that are modifying data shouldn't be selected for termination, because the consistency and validity of the database would be jeopardized. Fortunately, designers of many database systems have included sophisticated recovery mechanisms, so damage to the database is minimized if a transaction is interrupted or terminated before completion.
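The selection criteria above can be combined into a simple scoring sketch. The weights and field names here are hypothetical, purely to illustrate how an operating system might rank candidate victims; a real system would tune them to its own workload.

```python
def pick_victim(candidates):
    """Choose the deadlocked job whose termination costs the least.

    Each candidate is a dict with hypothetical fields: 'priority'
    (higher = more important), 'cpu_time_used' (work that would be lost),
    'dependents' (other jobs affected), and 'updating_database'.
    """
    def cost(job):
        if job['updating_database']:
            return float('inf')     # never kill a job mid-update
        # Illustrative weights: priority matters most, then lost CPU
        # time, then the number of other jobs that would be affected.
        return (job['priority'] * 100
                + job['cpu_time_used']
                + job['dependents'] * 50)
    return min(candidates, key=cost)

jobs = [
    {'name': 'payroll', 'priority': 9, 'cpu_time_used': 400,
     'dependents': 3, 'updating_database': True},
    {'name': 'report',  'priority': 2, 'cpu_time_used': 30,
     'dependents': 0, 'updating_database': False},
]
print(pick_victim(jobs)['name'])   # report
```

The infinite cost for database writers mirrors the rule above: a job modifying data is never a candidate, no matter how cheap it looks by the other measures.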
