Deadlock PDF
Summary
This document explores the concept of deadlock in operating systems. It details the conditions under which deadlocks can occur and introduces the main strategies for dealing with them: deadlock prevention, deadlock avoidance, and deadlock detection and recovery. The document also includes resource allocation graph diagrams and worked examples to support the discussion.
Full Transcript
Chapter 7 Deadlock

We've seen the value of organizing a computation as a set of cooperating processes. However, there is one especially difficult problem that can sometimes arise: the problem of deadlock.

What is deadlock? A deadlock occurs when there is a set of processes that are permanently blocked. The unblocking of each one relies on the progress of another. But since none can make progress, none will be unblocked, and the deadlock condition will remain.

7.1 Resource Allocation Graphs

Consider the resource allocation graph in Figure 7.1. We have two processes, A and B, indicated by circles, and two resources, X and Y, indicated by squares. If a process is holding a resource, meaning that the resource was allocated to that process, we draw an arrow from the resource to the process. If a process is waiting for a resource, meaning that the process has requested the resource but the request cannot be granted at the moment because the resource is busy (in use by another process), we draw an arrow from the process to the resource.

Figure 7.1: A simple resource allocation graph, also known as a "wait-for" graph.

In the graph in Figure 7.1, we see that process A is holding resource X and waiting for resource Y. Process B is holding resource Y and waiting for resource X. Since each process is waiting for the other to give up its held resource, but this will only happen when each process is able to get the other resource, they will wait forever. This is an example of a deadlock.

Deadlocks happen in real life. Figure 7.2 shows an intersection with a set of cars, each of which has progressed to the middle of the intersection. It is important to be able to identify what the processes are and what the resources are.

Figure 7.2: Cars deadlocked in an intersection.

The processes are the cars. The resources are the parts of the intersection, specifically the four squares that comprise the intersection. In order for a process to cross the intersection, it needs the two consecutive squares in the direction in which it is going. In the example, car A has square Y but cannot proceed until it gets square Z. Car B has square Z but cannot proceed until it gets square X. Car C has square X but cannot proceed until it gets square W. And car D has square W but cannot proceed until it gets square Y.

We can convert this situation into a resource allocation graph, which is shown in Figure 7.3. We can now see a pattern. A deadlock exists when we can find a cycle in the resource allocation graph. We see a cycle in Figure 7.3, and in Figure 7.1.

Figure 7.3: The resource allocation graph for the cars in the intersection.
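To make the cycle test concrete, here is a minimal sketch (not from the original text) that encodes the wait-for graph of Figure 7.1 as a Python dictionary and searches it for a cycle with a depth-first search. The node names follow Figure 7.1; the `find_cycle` helper and its output format are illustrative choices.

```python
# Resource allocation ("wait-for") graph from Figure 7.1.
# An edge R -> P means resource R is allocated to process P;
# an edge P -> R means process P is waiting for resource R.
graph = {
    "X": ["A"],   # resource X is held by process A
    "A": ["Y"],   # process A waits for resource Y
    "Y": ["B"],   # resource Y is held by process B
    "B": ["X"],   # process B waits for resource X
}

def find_cycle(graph):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on the current DFS path / finished
    color = {node: WHITE for node in graph}
    parent = {}

    def dfs(node):
        color[node] = GREY
        for succ in graph.get(node, []):
            if color.get(succ, WHITE) == GREY:        # back edge: a cycle exists
                cycle, cur = [succ], node
                while cur != succ:                    # walk back up the DFS path
                    cycle.append(cur)
                    cur = parent[cur]
                cycle.append(succ)
                return list(reversed(cycle))
            if color.get(succ, WHITE) == WHITE:
                parent[succ] = node
                found = dfs(succ)
                if found:
                    return found
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None

print(find_cycle(graph))   # e.g. ['X', 'A', 'Y', 'B', 'X'] -> deadlock
```

On this graph the search reports the cycle X → A → Y → B → X, matching the deadlock described above; an acyclic graph would yield None.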
Figure 7.4 shows an example of memory, where resources are given incrementally, and how allocations can lead to deadlock. Consider a memory of 200 MB, all of which is initially available. Say process P1 requests 80 MB. Since 80 MB is available, P1 is allocated that memory. Say process P2 then requests 70 MB. Again, since 70 MB is available, P2 is allocated that memory. Next P1 requests 60 MB. However, of the 200 MB, only 50 MB are free. Consequently, P1 must wait. P2 is still able to run and requests 80 MB. Again, since only 50 MB are free, P2 must wait. The only way that either process will be unblocked is for memory to become available; but the processes themselves are holding that memory, and since they are blocked, neither will give its memory up. We have a deadlock.

Figure 7.4: A memory that is allocated incrementally.

7.2 The Four Conditions for Deadlock

There is a well-developed theory regarding deadlock. We know that for deadlock to be possible, four conditions must be present. They are shown in Figure 7.5.

1. Mutual exclusion: only one process may use a resource at a time.
2. Hold-and-wait: a process may hold a resource while waiting for another.
3. No preemption: once a process has a resource, it cannot be forcibly taken away.
4. Circular wait: the waiting processes form a cycle, each process waiting for one of the others.

Figure 7.5: The four conditions for deadlock.

These conditions are all necessary. Remove any one of them, and deadlock will not be possible. This observation leads to a strategy: to prevent deadlock, just make sure one of the conditions is removed.

There are three broad strategies for attacking the deadlock problem. The first is called deadlock prevention, whereby deadlock is made impossible by simply removing one of the necessary conditions. The second is deadlock avoidance, where we try to avoid getting into situations that lead to deadlock. And the third is deadlock detection, where, rather than trying to prevent or avoid deadlocks, we allow them to happen, with the idea that we can then detect them and try to somehow resolve the situation at that point. In this chapter, we will go through each of these strategies.

7.3 Deadlock Prevention

In deadlock prevention, the idea is quite simple. Since all four conditions for deadlock are necessary, just remove any one of them, and a deadlock simply cannot happen, i.e., deadlock is prevented. Let's go through each of the conditions and see how viable it is to not allow that condition to be present.

The first condition is mutual exclusion. We can remove this condition by allowing resources to be shared, rather than forcing them to be used one at a time. However, for some resources, it only makes sense to use them one at a time. For example, memory that is private to a process cannot be shared with other processes. Or, a critical section of code simply cannot be shared; it must be used by one process at a time. A printer cannot be shared: if one process is printing a document, another process cannot use the printer to print a different document at the same time. Consequently, we may not have the ability to simply remove the mutual exclusion condition.

How about the second condition, hold-and-wait? This condition says that a process can be holding a resource while it is waiting for another. We can remove this condition by forcing processes to request all the resources they may need simultaneously, and to wait until all of them are free. In this way, a process is never holding some resources while waiting for others. This may sound reasonable, but in general, processes do not know precisely which resources they will need; a process generally learns this as time goes along. Secondly, a process will not be using all of the resources it may need at all points in time, so this can be a wasteful approach. However, if a process did know all of the resources it needed, and we are not concerned that they may be used inefficiently, this approach would certainly work. In fact, this approach is used in "mission critical" situations, such as in controlling a nuclear power plant, where resources are pre-allocated so that when they are needed, they are ready to be used. It may be expensive, but it works in that it is guaranteed that there will be no deadlocks.
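As a minimal sketch of removing hold-and-wait (not from the original text), a process can try to acquire every resource it needs in one step and back out completely if any of them is busy. The printer and scanner locks below are hypothetical stand-ins for real resources:

```python
import threading

def acquire_all_or_none(locks):
    """Try to acquire every lock; if any is busy, release what was taken and fail.
    A process therefore never holds some resources while blocked on others."""
    acquired = []
    for lock in locks:
        if lock.acquire(blocking=False):     # non-blocking attempt
            acquired.append(lock)
        else:
            for held in reversed(acquired):  # back out: give everything up
                held.release()
            return False
    return True

printer = threading.Lock()   # hypothetical resources, for illustration only
scanner = threading.Lock()

# The process declares everything it will need up front and retries until
# the whole set is free (a real system would block or back off, not spin).
while not acquire_all_or_none([printer, scanner]):
    pass
# ... use both resources ...
scanner.release()
printer.release()
```

Note that the process may repeatedly fail and retry, and the resources sit idle from the moment they are acquired until they are actually used, which is exactly the wastefulness discussed above.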
The third condition is no preemption. We can remove this condition by allowing preemption, i.e., simply allowing resources that have been allocated to be taken away. However, this is often inconvenient and undesirable. Imagine a printer being used to print a long document, and the printer is then taken away to start printing another document before it finishes printing the first. The user that submitted the first print request will not be very happy with this situation!

Finally, we come to the fourth condition: circular wait. The way to get rid of this condition is to prevent cycles. One clever way of doing this is to order all of the resources, which can be done by assigning each of them a unique integer ID. Then, have processes follow a rule such that a new resource can only be acquired if its ID is greater than the IDs of all the resources the process is currently holding. If not, the process has to give up some or all of the resources it is holding so that the rule can be followed. In other words, have all processes order their acquisitions of resources according to increasing ID number.

This will actually work, but there are issues. For example, it is often not possible to assign every resource an ID, at least at the beginning of time, because some resources only come into existence on demand, e.g., a block of memory created by a memory allocator. Even if this is not an issue, acquiring resources in a particular order, especially when the process does not know beforehand what all the resources will be, can be inconvenient and inefficient. But, if these are not issues for a particular system (as may be the case for mission critical systems), this approach can certainly be used, and deadlocks will be prevented.
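Here is a minimal sketch of the resource-ordering rule (not from the original text). The `OrderedLock` wrapper and the specific IDs are illustrative; the point is that every process acquires locks in increasing ID order, so no cycle of waiting can form:

```python
import threading

class OrderedLock:
    """A lock tagged with a unique integer ID that fixes a global acquisition order."""
    def __init__(self, lock_id):
        self.lock_id = lock_id
        self._lock = threading.Lock()
    def acquire(self):
        self._lock.acquire()
    def release(self):
        self._lock.release()

def acquire_in_order(locks):
    """Acquire the given locks in increasing ID order, ruling out circular wait."""
    for lock in sorted(locks, key=lambda l: l.lock_id):
        lock.acquire()

X = OrderedLock(1)
Y = OrderedLock(2)

# In Figure 7.1 the deadlock arose because A took X then waited for Y, while
# B took Y then waited for X. If both follow the ordering rule, each takes
# X first and then Y, so one of them simply waits for the other to finish.
acquire_in_order([Y, X])   # regardless of how they are listed, X is taken first
Y.release()
X.release()
```

The ordering only has to be consistent across processes; which resource gets which ID does not matter.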
Figure 7.6: Cars in an intersection governed by a traffic light.

Consider our cars-in-an-intersection problem, as shown in Figure 7.6. If our theory of deadlock is truly general, then it should apply to this problem. If we remove one of the conditions for deadlock, we should be able to prevent a traffic jam. A common way of solving the traffic problem in real life is to add a traffic light. In doing so, the traffic jam in Figure 7.2 cannot happen. Adding a traffic light must then correspond to removing one of the conditions for deadlock. Which one?

It's not mutual exclusion, because we can't change the laws of physics and allow two cars to share the same physical space. How about hold-and-wait? Indeed, this is the condition that is removed: a traffic light allows a car to acquire both of the squares it needs to cross the intersection, rather than getting stuck in the middle of the intersection by holding one square and waiting for the other. For completeness, let's look at the other two conditions. We did not get rid of no preemption, as once a car enters the critical section and occupies one square, we cannot take the square back (unless we forced the car to back up). Finally, the circular-wait condition is irrelevant, because we've removed hold-and-wait.

This last point brings up a curious relationship between the two conditions, hold-and-wait and circular wait. Circular wait depends on hold-and-wait, but not vice versa. If we remove hold-and-wait, we've removed circular wait. But if we remove circular wait, we can still have hold-and-wait. Consequently, it is good to consider these two conditions separately, as there may be situations where it is convenient to remove circular wait, but difficult or impossible to remove hold-and-wait. And since all we need is to remove one of them, we are guaranteed that deadlocks will be prevented if we remove just circular wait.

7.4 Deadlock Avoidance

In deadlock avoidance, the idea is to avoid situations that may lead to deadlock. It is a "softer" approach compared to deadlock prevention. The idea is to apply selective prevention, by removing a condition only when deadlock is possible. This approach works especially well with incremental resource requests. By this, we mean that resources are asked for in increments, and we do not satisfy a request (at that moment in time) if there is the possibility it will lead to a deadlock. This approach requires the constraint that the maximum resource requirements of each process be known; they would have to be declared by the process, or we might assume some global maximum for all processes. Note that this does not mean the process will ask for all of those resources, but only that the process will never request more than the given maximum.

The Banker's algorithm, by Dijkstra, follows this approach. It assumes there are a fixed number of processes and a fixed number of resources, where each process has zero or more of the resources allocated to it. A specific allocation of resources to processes determines the system state, of which there are two possibilities: safe or unsafe.

A safe state is one where deadlock is absolutely avoidable. Specifically, there exists a certain order of execution of the processes such that deadlock can be avoided. The goal then is to find this order of execution. An unsafe state is one where deadlock is possible, i.e., it may not be possible to avoid deadlock. Note that this does not mean that a deadlock will definitely happen, but just that there is no guarantee that it will not happen.

Figure 7.7: An illustration of safe, unsafe, and deadlock areas.

Figure 7.7 shows a diagram that illustrates the meaning of safe, unsafe, and deadlock states. If the system begins in the area labeled "safe," which corresponds to the system being in the safe state, it is guaranteed to remain safe as long as we do not venture outside that area. Consequently, at each step, we ask whether the state will change from safe to unsafe. If so, we delay that move. The move would correspond to allocating a resource to satisfy a process's request. Delaying that move corresponds to making a process wait until another process gives up some resources.

Notice that if we venture outside the safe area, we are in the unsafe area, i.e., the system state is unsafe. This means that a deadlock is possible, but not necessarily guaranteed. But once in the unsafe area, it is possible to end up in the deadlock area. The key observation is that, if we never venture into the unsafe area, we can never end up in the deadlock area. Consequently, the idea is to remain in the safe area.
This is not unlike telling a child that they can play on your side of the street, and not go across the street. If they do, it does not mean that something bad will definitely happen, but only that it may happen. In other words, crossing the street means entering an unsafe area. As long as the child stays in the safe area, nothing bad will happen. (If only life were that simple!)

Let's look at the Banker's algorithm in detail. There are three data structures that are maintained:

1. a process/resource claim matrix (a 2D array)
2. a process/resource allocation matrix (a 2D array)
3. a resource availability vector (a 1D array)

For a given assignment of values to the elements of these three data structures, the system can be characterized as either safe or unsafe. The characterization is determined as follows. By default, we assume that the system begins in a safe state (if not, we must obtain more resources before we can proceed). We then ask the question: is there an ordering of the processes, i.e., a schedule in which they will run, such that the first process is able to run to completion and return its resources, which can then be used by a second process, allowing it in turn to run to completion and return its resources, and so on for all processes? If an ordering can be found such that all the processes are able to complete, we call that a safe state; otherwise it is an unsafe state.

Figure 7.8: A process/resource claim matrix, a process/resource allocation matrix, and a resource availability vector, with values that determine this to be a safe state.

Let's look at a few examples. Figure 7.8 shows a claim matrix, an allocation matrix, and an availability vector, populated with values. The particular assignment of values determines whether the state is safe or unsafe. But, to make this determination, we must ask whether there is an ordering or sequence of processes such that, if they were made to run in that order, each would be able to complete without the possibility of having to block because a potential resource request cannot be satisfied. In making this determination, we may have to consider all possible orderings of processes until we find one.

Let's begin by choosing process P1 to execute first. P1 has declared that it may claim up to 3 units of resource R1, 2 units of resource R2, and 2 units of resource R3 (notice that resources come in units, or increments). P1 is currently allocated 1 unit of R1, and 0 units of R2 and R3. Consequently, during its execution, it may ask for up to 2 more units of R1. This is not to say it will definitely do so (it might ask for only one unit, or even none), but only that it may. And if it does, we see that it will block, because the availability vector tells us that there are no available units of R1. Consequently, choosing an ordering where P1 runs first does not guarantee completion, and so we discard all such orderings.

How about choosing P2 to run first? P2 has declared that it may claim up to 6 units of resource R1, 1 unit of resource R2, and 3 units of resource R3. It is currently allocated 6 units of R1, 1 unit of R2, and 2 units of R3. Consequently, during its execution it may ask for at most 1 more unit of R3, and nothing else. Since 1 unit of R3 is available, P2 can run to completion. Consequently, an ordering that begins with P2 looks promising. But we won't know until we see that all the other processes are also able to complete.
Once P2 completes, it will return all of its resources: 6 units of R1, 1 unit of R2, and 2 units of R3, so the availability vector will then have the values 6, 2, and 3 for R1, R2, and R3, respectively. Given the resources just made available by P2, we could select P1 to run next, as it would be able to complete its execution even if it requested the maximum of what it claims it may need. Consequently, we have an ordering with the first two processes determined: P2 and P1. After P1 completes, it will return all of its resources: 1 unit of R1, and 0 of R2 and R3 (if it had requested additional units during its execution, those would be returned too), and so the availability vector will have the values 7, 2, and 3 for R1, R2, and R3, respectively. It is not difficult to see that, if we select P3 to run next, it will be able to complete, and then, selecting P4 to run last, all processes will be able to complete without the possibility of blocking.

This entire analysis, finding at least one ordering that allows all processes to complete without the potential for blocking, is what is needed to characterize the particular configuration of values in Figure 7.8 as a safe state. If we were not able to find any such ordering, then we would say that that specific configuration of values corresponds to an unsafe state.

Figure 7.9: A process/resource claim matrix, a process/resource allocation matrix, and a resource availability vector, with values that determine this to be an unsafe state.

Consider Figure 7.9. This is an example of an unsafe state. The reason is that no process can be guaranteed to run to completion. For example, if we choose P1 to run first, it may ask for 1 more unit of R1; the same is true if we were to choose P2, P3, or P4. But there are no units of R1 available, so whichever process we choose first may block. Note that we don't know whether any of these processes will actually exercise their right to make these requests; in fact, if they don't, they may all be able to run to completion. But we don't know that! Consequently, since deadlock is possible, even though not certain, we deem this an unsafe state.

We can use the Banker's algorithm to avoid unsafe states. Given that we assume the system begins in a safe state, we are guaranteed that there is some ordering that would allow all processes to complete. (If this is not the case, we obviously can't make this assumption, and we would have to obtain more resources so that the assumption holds. Regardless, we must be able to assume the system starts in a safe state.) Consequently, we allow the first process in that ordering to run; we know that whichever resources it requests, we will be able to satisfy all those requests. During the execution of that first process, we can allow a context switch to occur. In selecting the next process to run, we ask: might allowing this process to run, at this time, cause the system to go to an unsafe state? If so, we do not allow that process to run next, and we consider another process, asking the same question. At worst, we know that we can always allow the first process to continue running, as allowing that process to run to completion will not result in an unsafe state.
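The safety test just described can be written down compactly. In the sketch below (not from the original text), the rows for P1 and P2 and the resulting availability values follow the walkthrough above, while the rows for P3 and P4 and the initial availability vector are assumed for illustration, since Figure 7.8 itself is not reproduced here:

```python
# A minimal sketch of the Banker's algorithm safety test.
claim = {                  # maximum units each process may ever hold (R1, R2, R3)
    "P1": (3, 2, 2),
    "P2": (6, 1, 3),
    "P3": (3, 1, 4),       # assumed for illustration
    "P4": (4, 2, 2),       # assumed for illustration
}
allocation = {             # units currently allocated (R1, R2, R3)
    "P1": (1, 0, 0),
    "P2": (6, 1, 2),
    "P3": (2, 1, 1),       # assumed for illustration
    "P4": (0, 0, 2),       # assumed for illustration
}
available = (0, 1, 1)      # free units of R1, R2, R3 (assumed, consistent with the walkthrough)

def is_safe(claim, allocation, available):
    """Return a safe completion order of the processes, or None if the state is unsafe."""
    need = {p: tuple(c - a for c, a in zip(claim[p], allocation[p])) for p in claim}
    work = list(available)
    unfinished = set(claim)
    order = []
    while unfinished:
        # Find a process whose remaining maximum need fits in what is free right now.
        runnable = next((p for p in unfinished
                         if all(n <= w for n, w in zip(need[p], work))), None)
        if runnable is None:
            return None                          # no process is guaranteed to finish
        # Let it run to completion; it returns everything it holds.
        work = [w + a for w, a in zip(work, allocation[runnable])]
        unfinished.remove(runnable)
        order.append(runnable)
    return order

print(is_safe(claim, allocation, available))     # e.g. ['P2', 'P1', 'P3', 'P4'] -> safe
```

With values like those described for Figure 7.9, where every process may still need a unit of R1 and none is available, the same function would return None on the first pass.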
A common misconception with this approach is that the system changes states over time, alternating between safe and unsafe. It does not; in fact, the whole idea is that it must not! The system begins in the safe state, and for each scheduling decision, the scheduler determines whether the particular choice under consideration would lead to an unsafe state. If so, that choice is discarded. Consequently, the system remains in a safe state throughout. This is guaranteed because, by definition, a safe state is one where all the processes are able to complete under some particular ordering. We can simply follow that ordering, or consider different orderings as time goes on, as long as those other orderings do not lead to unsafe states.

7.5 Deadlock Detection and Recovery

This approach to dealing with deadlocks is to do nothing special to prevent or avoid them. Rather, we accept that they may happen, and if they do, we will then deal with the situation. Periodically, the system will try to detect whether a deadlock has occurred. If so, it will then try to do something about it. Or, it may even do nothing!

The reasoning behind this approach is that deadlocks rarely happen. Consequently, equipping a system with so much machinery, which adds to its size and complexity yet is used so infrequently, is not worth the trouble. If a deadlock happens, then we'll just deal with it in a special and perhaps ad hoc way.

To detect a deadlock, the system constructs a resource allocation ("wait-for") graph, as was discussed at the beginning of this chapter. We saw examples of resource allocation graphs in Figure 7.1 and Figure 7.3. To determine whether a deadlock has occurred, the system checks for cycles in the graph. If a cycle is found, a deadlock exists, and it involves those processes and resources that comprise the nodes in the cycle. This requires identifying all resources, tracking their use, and periodically running the detection algorithm.

Once a deadlock is detected, the system can attempt to recover from it. One approach to recovery is to terminate all the deadlocked processes. This will certainly remove the deadlock, but it is drastic and can be costly. A less severe way is to terminate deadlocked processes one at a time, checking after each termination whether the deadlock remains. The order in which processes are terminated may be made dependent on how costly it is to terminate each particular process; the less important processes would be terminated first.

We must also consider the resources that are part of the deadlock. If processes are terminated, the resources they were using might be left in an inconsistent state. For example, a process may have partially written a file (the resource in this case), leaving it in a state that is unusable by other processes. Another example would be an interrupted message or file transfer. Somehow, there would have to be a way of restarting the transfer, or doing away with it completely (and ignoring or undoing what was already transferred). Suffice it to say that this is a difficult problem, and resolving each situation in an ad hoc way adds complexity to the system that may outweigh what was gained by not doing deadlock prevention or avoidance.
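The detect-then-terminate policy can be sketched with a toy example (not from the original text). Here processes wait directly on one another, a process that is not waiting on anything is assumed to eventually finish and unblock its waiters, and the per-process termination costs are invented for illustration:

```python
# Toy detection and recovery: find the processes that can never be unblocked,
# then terminate the cheapest one at a time until no deadlock remains.
waits_for = {"A": {"B"}, "B": {"A"}, "C": set()}   # A and B wait on each other; C is fine
cost = {"A": 10, "B": 3, "C": 7}                   # assumed cost of terminating each process

def deadlocked(waits_for):
    """Return the set of processes that can never be unblocked: repeatedly remove
    any process that is not waiting on anything (it can run to completion and
    unblock its waiters); whatever remains is deadlocked."""
    g = {p: set(w) for p, w in waits_for.items()}
    progress = True
    while progress:
        progress = False
        for p in [p for p, w in g.items() if not w]:   # p is not blocked
            del g[p]
            for w in g.values():
                w.discard(p)                           # p "finishes", unblocking its waiters
            progress = True
    return set(g)

while deadlocked(waits_for):
    victim = min(deadlocked(waits_for), key=cost.get)  # cheapest deadlocked process
    print("terminating", victim)
    del waits_for[victim]                              # the victim's waits disappear...
    for w in waits_for.values():
        w.discard(victim)                              # ...and nobody waits on it any more
```

In this toy run, only B is terminated (it is the cheaper of the two deadlocked processes), after which the remaining processes can proceed.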
The most extreme solution is to do absolutely nothing. In this case, the system ultimately relies on the user to determine that something is wrong and to act accordingly. In fact, in this "approach," the user is given both the responsibility for detecting deadlocks and for recovering from them. For example, observing that a process (or a group of processes) is "frozen" (e.g., a mouse click in a window has no effect), the user would surmise that a deadlock has occurred. This is the "detection" part of the approach. At that point, the user would take action. The most common would be to manually kill the process or, in the worst case, reboot the entire system. This is the "recovery" part of the approach! As extreme as this may sound, it is the approach you can most likely expect from the operating system running on your laptop!

7.6 Summary

A deadlock occurs when a set of processes are mutually waiting for each other; the progress of any one of them depends on that of another, but since all are blocked, no progress can or will ever be made. For a deadlock to be possible, four conditions must be present: mutual exclusion, hold-and-wait, no preemption, and circular wait. Each is necessary: remove any one condition, and deadlock becomes impossible.

There are three broad strategies for dealing with deadlocks: deadlock prevention, deadlock avoidance, and deadlock detection and recovery. In deadlock prevention, one of the necessary conditions is explicitly removed. In deadlock avoidance, a condition is removed, but only at certain times and temporarily. In deadlock detection and recovery, deadlocks are simply allowed to occur, with the expectation that they will then be detected and the system will attempt to recover. In its extreme form, the responsibility for detection and recovery is placed on the user.

7.7 Exercises

1. What is meant by "deadlock"?
2. When discussing deadlocks, what is meant by a "resource"?*
3. In Figure 7.2, what are the processes and what are the resources?*
4. In Figure 7.4, what are the processes and what are the resources?*
5. What are the four conditions of deadlock?
6. What is the "mutual exclusion" condition?
7. What is the "hold-and-wait" condition?
8. What is the "no preemption" condition?
9. What is the "circular wait" condition?
10. Are the conditions independent from each other; why or why not?**
11. What is meant by deadlock prevention?
12. What is meant by deadlock avoidance?
13. What is meant by deadlock detection and recovery?
14. How is deadlock prevention different from deadlock avoidance, and can you give an example?**
15. Which is the easiest to implement; which is the hardest?**
16. In deadlock prevention, for each of the four conditions for deadlock, how can they be prevented, and can you give an example of each in a real situation?**
17. In Figure 7.6, which condition is being prevented, and why?*
18. What kind of operating system would use this technique?***
19. In deadlock avoidance, why is it considered "selective prevention"?*
20. What is meant by incremental resource requests?
21. What is meant by the "maximum resource requirements" needing to be known; can you give an example?*
22. What kind of operating system would use this technique?***
23. In the Banker's algorithm, what is meant by a "safe state"?* What about an "unsafe state"?*
24. In Figure 7.7, what is being conveyed by the illustration?*
25. What is a process/resource claim matrix?*
26. What is a process/resource allocation matrix?*
27. What is a resource availability vector?*
28. What is the process ordering test used by the Banker's algorithm?*
29. In Figure 7.8, why is the state safe?*
30. In Figure 7.9, why is the state unsafe?*

Figure 7.10: An intersection governed by the rule: at most 3 cars allowed.
31. In Figure 7.10, the intersection is governed by the rule: at most 3 cars allowed. Which condition is being avoided?**
32. What is the key justification for allowing deadlocks to happen in deadlock detection and recovery?
33. Why do most general-purpose operating systems use this technique?**
34. What would be an example of an operating system that cannot use this technique?***
35. For deadlock detection, what is the purpose of the wait-for graph?*
36. How can an operating system recover from deadlock?*
37. Why is terminating all deadlocked processes costly?**
38. What is the reasoning behind terminating processes one at a time, and what are some of the complications?**
39. In your solution to the Dining Philosophers Problem in Exercise 55 of Chapter 5, modify it so that deadlocks are avoided.***
40. In your solution to the Readers/Writers Problem in Exercise 62 of Chapter 5, modify it so that deadlocks are avoided.***