Operating System Concepts 8th Edition PDF
Document Details
Uploaded by IllustriousSet
Kristu Jayanti College
Silberschatz, Galvin, Gagne
Tags
Related
- Operating Systems: Internals and Design Principles Seventh Edition PDF
- Modern Operating Systems (5th Edition) PDF
- Computer Science: Chapter 3 - Operating Systems (PDF)
- Computer Science: Operating Systems PDF
- System Fundamentals and Advanced Systems (C310181) Past Paper PDF
- Computer Science An Overview PDF
Summary
This textbook provides a comprehensive introduction to operating system concepts. It covers fundamental concepts, algorithms, and examples within various operating systems, including Solaris, Linux, Windows XP, and Mac OS X. The book is suitable for undergraduate and graduate courses, offering detailed descriptions of core concepts.
Full Transcript
To my children, Lemar, Sivan, and Aaron and my Nicolette Avi Silberschatz To my wife, Carla, and my children, Gwen, Owen, and Maddie Peter Baer Galvin To my wife, Pat, and our sons, Tom and Jay Greg Gagne Abraham Silberschatz is the Sidney J. Weinberg P...
To my children, Lemar, Sivan, and Aaron and my Nicolette Avi Silberschatz To my wife, Carla, and my children, Gwen, Owen, and Maddie Peter Baer Galvin To my wife, Pat, and our sons, Tom and Jay Greg Gagne Abraham Silberschatz is the Sidney J. Weinberg Professor & Chair of Com- puter Science at Yale University. Prior to joining Yale, he was the Vice President of the Information Sciences Research Center at Bell Laboratories. Prior to that, he held a chaired professorship in the Department of Computer Sciences at the University of Texas at Austin. Professor Silberschatz is an ACM Fellow and an IEEE Fellow. He received the 2002 IEEE Taylor L. Booth Education Award, the 1998 ACM Karl V. Karl- strom Outstanding Educator Award, and the 1997 ACM SIGMOD Contribution Award. In recognition of his outstanding level of innovation and technical excellence, he was awarded the Bell Laboratories President's Award for three different projects-the QTM Project (1998), the DataBlitz Project (1999), and the Netlnventory Project (2004). Professor Silberschatz' writings have appeared in numerous ACM and IEEE publications and other professional conferences and journals. He is a coauthor of the textbook Database System Concepts. He has also written Op-Ed articles for the New York Times, the Boston Globe, and the Hartford Courant, among others. Peter Baer Galvin is the chief technologist for Corporate Technologies (www.cptech.com), a computer facility reseller and integrator. Before that, Mr. Galvin was the systems manager for Brown University's Computer Science Department. He is also Sun columnist for ;login: magazine. Mr. Galvin has written articles for Byte and other magazines, and has written columns for Sun World and SysAdmin magazines. As a consultant and trainer, he has given talks and taught tutorials on security and system administration worldwide. Greg Gagne is chair of the Computer Science department at Westminster College in Salt Lake City where he has been teaching since 1990. In addition to teaching operating systems, he also teaches computer networks, distributed systems, and software engineering. He also provides workshops to computer science educators and industry professionals. Operating systems are an essential part of any computer system. Similarly, a course on operating systems is an essential part of any computer-science education. This field is undergoing rapid change, as computers are now prevalent in virtually every application, from games for children through the most sophisticated planning tools for governments and multinational firms. Yet the fundamental concepts remain fairly clear, and it is on these that we base this book. We wrote this book as a text for an introductory course in operating systems at the junior or senior undergraduate level or at the first-year graduate level. We hope that practitioners will also find it useful. It provides a clear description of the concepts that underlie operating systems. As prerequisites, we assume that the reader is familiar with basic data struchues, computer organization, and a high-level language, such as C or Java. The hardware topics required for an understanding of operating systems are included in Chapter 1. For code examples, we use predominantly C, with some Java, but the reader can still understand the algorithms without a thorough knowledge of these languages. Concepts are presented using intuitive descriptions. Important theoretical results are covered, but formal proofs are omitted. The bibliographical notes at the end of each chapter contain pointers to research papers in which results were first presented and proved, as well as references to material for further reading. In place of proofs, figures and examples are used to suggest why we should expect the result in question to be true. The fundamental concepts and algorithms covered in the book are often based on those used in existing conunercial operating systems. Our aim is to present these concepts and algorithms in a general setting that is not tied to one particular operating system. We present a large number of examples that pertain to the most popular and the most im1.ovative operating systems, including Sun Microsystems' Solaris; Linux; Microsoft Windows Vista, Windows 2000, and Windows XP; and Apple Mac OS X. When we refer to Windows XP as an example operating system, we are implying Windows Vista, Windows XP, and Windows 2000. If a feature exists in a specific release, we state this explicitly. vii viii The organization of this text reflects our many years of teaching courses on operating systems. Consideration was also given to the feedback provided by the reviewers of the text, as well as comments submitted by readers of earlier editions. In addition, the content of the text corresponds to the suggestions from Computing Curricula 2005 for teaching operating systems, published by the Joint Task Force of the IEEE Computing Society and the Association for Computing Machinery (ACM). On the supporting Web site for this text, we provide several sample syllabi that suggest various approaches for using the text in both introductory and advanced courses. As a general rule, we encourage readers to progress sequentially through the chapters, as this strategy provides the most thorough study of operating systems. However, by using the sample syllabi, a reader can select a different ordering of chapters (or subsections of chapters). On-line support for the text is provided by WileyPLUS. On this site, students can find sample exercises and programming problems, and instructors can assign and grade problems. In addition, in WileyPLUS, students can access new operating-system simulators, which are used to work through exercises and hands-on lab activities. References to the simulators and associated activities appear at the ends of several chapters in the text. The text is organized in nine major parts: Overview. Chapters 1 and 2 explain what operating systems are, what they do, and how they are designed and constructed. These chapters discuss what the common features of an operating system are, what an operating system does for the user, and what it does for the computer-system operator. The presentation is motivational and explanatory in nature. We have avoided a discussion of how things are done internally in these chapters. Therefore, they are suitable for individual readers or for students in lower-level classes who want to learn what an operating system is without getting into the details of the internal algorithms. Process management and Process coordination. Chapters 3 through 7 describe the process concept and concurrency as the heart of modern operating systems. A process is the unit of work in a system.. Such a system consists of a collection of concurrently executing processes, some of which are operating-system processes (those that execute system code) and the rest of which are user processes (those that execute user code). These chapters cover n1.ethods for process scheduling, interprocess communication, process synchronization, and deadlock handling. Also included is a discussion of threads, as well as an examination of issues related to multicore systems. Memory management. Chapters 8 and 9 deal with the management of main memory during the execution of a process. To improve both the utilization of the CPU and the speed of its response to its users, the computer must keep several processes in memory. There are many different ix management, and the effectiveness of a particular algorithm depends on the situation. Storage management. Chapters 10 through 13 describe how the file system, mass storage, and I/0 are handled in a modern computer system. The file system provides the mechanism for on-line storage of and access to both data and programs. We describe the classic internal algorithms and structures of storage management and provide a firm practical understanding of the algorithms used -their properties, advantages, and disadvantages. Our discussion of storage also includes matters related to secondary and tertiary storage. Since the I/0 devices that attach to a computer vary widely, the operating system needs to provide a wide range of functionality to applications to allow them to control all aspects of these devices. We discuss system I/O in depth, including I/O system design, interfaces, and internal system structures and functions. In many ways, I/O devices are the slowest major components of the computer. Because they represent a performance bottleneck, we also examine performance issues associated with I/0 devices. Protection and security. Chapters 14 and 15 discuss the mechanisms necessary for the protection and security of computer systems. The processes in an operating system must be protected from one another's activities, and to provide such protection, we must ensure that only processes that have gained proper authorization from the operating system can operate on the files, memory, CPU, and other resources of the system. Protection is a mechanism for controlling the access of programs, processes, or users to the resources defined by a computer system. This mechanism must provide a means of specifying the controls to be imposed, as well as a means of enforcement. Security protects the integrity of the information stored in the system (both data and code), as well as the physical resources of the system, from 1.mauthorized access, malicious destruction or alteration, and accidental introduction of inconsistency. Distributed systems. Chapters 16 through 18 deal with a collection of processors that do not share memory or a clock-a distributed system. By providing the user with access to the various resources that it maintains, a distributed system can improve computation speed and data availability and reliability. Such a system also provides the user with a distributed file system, which is a file-service system whose users, servers, and storage devices are dispersed among the sites of a distributed system. A distributed system must provide various mechanisms for process synchronization and communication, as well as for dealing with deadlock problems and a variety of failures that are not encountered in a centralized system. Special-purpose systems. Chapters 19 and 20 deal with systems used for specific purposes, including real-time systems and multimedia systems. These systems have specific requirements that differ from those of the general-purpose systems that are the focus of the remainder of the text. Real-time systems may require not only that computed results be "correct" but also that the results be produced within a specified deadline period. Multimedia systems require quality-of-service guarantees ensuring that the multimedia data are delivered to clients within a specific time frame. X Case studies. Chapters 21 through 23 in the book, and Appendices A through C (which are available on www.wiley.comJ go I global/ silberschatz and in WileyPLUS), integrate the concepts described in the earlier chapters by describing real operating systems. These systems include Linux, Windows XP, FreeBSD, Mach, and Windows 2000. We chose Linux and FreeBSD because UNIX-at one time-was almost small enough to understand yet was not a "toy" operating system. Most of its internal algorithms were selected for simplicity, rather than for speed or sophistication. Both Linux and FreeBSD are readily available to computer-science departments, so many students have access to these systems. We chose Windows XP and Windows 2000 because they provide an opporhmity for us to study a modern operating system with a design and implementation drastically different from those of UNIX. Chapter 23 briefly describes a few other influential operating systems. This book uses examples of many real-world operating systems to illustrate fundamental operating-system concepts. However, particular attention is paid to the Microsoft family of operating systems (including Windows Vista, Windows 2000, and Windows XP) and various versions of UNIX (including Solaris, BSD, and Mac OS X). We also provide a significant amount of coverage of the Linux operating system reflecting the most recent version of the kernel -Version 2.6-at the time this book was written. The text also provides several example programs written in C and Java. These programs are intended to run in. the following programming environments: Windows systems. The primary programming environment for Windows systems is the Win32 API (application programming interface), which pro- vides a comprehensive set of functions for managing processes, threads, memory, and peripheral devices. We provide several C programs illustrat- ing the use of the Win32 API. Example programs were tested on systems rum1.ing Windows Vista, Windows 2000, and Windows XP. POSIX. POSIX (which stands for Portable Operating System Inte1jace) repre- sents a set of standards implemented primarily for UNIX-based operating systems. Although Windows Vista, Windows XP, and Windows 2000 sys- tems can also run certain POSIX programs, our coverage of POSIX focuses primarily on UNIX and Linux systems. POSIX-compliant systems must implement the POSIX core standard (POSIX.1): Linux, Solaris, and Mac OS X are examples of POSIX-compliant systems. POSIX also defines several extensions to the standards, including real-time extensions (POSIXl.b) and an extension for a threads library (POSIX1.c, better known as Pthreads). We provide several programn1.ing examples written inC illustrating the POSIX base API, as well as Pthreads and the extensions for real-time programming. These example programs were tested on Debian Linux 2.4 and 2.6 systems, Mac OS X 10.5, and Solaris 10 using the gee 3.3 and 4.0 compilers. Java. Java is a widely used programming language with a rich API and built-in language support for thread creation and management. Java xi programs run on any operating system supporting a Java virtual machine (or JVM). We illustrate various operating system and networking concepts with several Java programs tested using the Java 1.5 JVM. We have chosen these three programming environments because it is our opinion that they best represent the two most popular models of operating systems: Windows and UNIX/Linux, along with the widely used Java environ- ment. Most programming examples are written in C, and we expect readers to be comfortable with this language; readers familiar with both the C and Java languages should easily understand most programs provided in this text. In some instances-such as thread creation-we illustrate a specific concept using all three programming environments, allowing the reader to contrast the three different libraries as they address the same task. In other situations, we may use just one of the APis to demonstrate a concept. For example, we illustrate shared memory using just the POSIX API; socket programming in TCP /IP is highlighted using the Java API. As we wrote the Eighth Edition of Operating System Concepts, we were guided by the many comments and suggestions we received from readers of our previous editions, as well as by our own observations about the rapidly changing fields of operating systems and networking. We have rewritten material in most of the chapters by bringing older material up to date and removing material that was no longer of interest or relevance. We have made substantive revisions and organizational changes in many of the chapters. Most importantly, we have added coverage of open-source operating systems in Chapter 1. We have also added more practice exercises for students and included solutions in WileyPLUS, which also includes new simulators to provide demonstrations of operating-system operation. Below, we provide a brief outline of the major changes to the various chapters: Chapter 1, Introduction, has been expanded to include multicore CPUs, clustered computers, and open-source operating systems. Chapter 2, System Structures, provides significantly updated coverage of virtual machines, as well as multicore CPUs, the GRUB boot loader, and operating-system debugging. Chapter 3, Process Concept, provides new coverage of pipes as a form of interprocess communication. Chapter 4, Multithreaded Programming, adds new coverage of program- ming for multicore systems. Chapter 5, Process Scheduling, adds coverage of virtual machine schedul- ing and multithreaded, multicore architectures. Chapter 6, Synchronization, adds a discussion of mutual exclusion locks, priority inversion, and transactional memory. Chapter 8, Memory-Management Strategies, includes discussion of NUMA. xii Chapter 9, Virtual-Memory Management, updates the Solaris example to include Solaris 10 memory managernent. Chapter 10, File System, is updated with current technologies and capacities. Chapter 11, Implementing File Systems, includes a full description of Sun's ZFS file system and expands the coverage of volumes and directories. Chapter 12, Secondary-Storage Structure, adds coverage of iSCSI, vol- umes, and ZFS pools. Chapter 13, I/0 Systems, adds coverage of PCIX PCI Express, and Hyper- Transport. Chapter 16, Distributed Operating Systems, adds coverage of 802.11 wireless networks. Chapter 21, The LimiX System, has been updated to cover the latest version of the LimiX kernel. Chapter 23, Influential Operating Systems, increases coverage of very early computers as well as TOPS-20, CP/M, MS-DOS, Windows, and the original Mac OS. To emphasize the concepts presented in the text, we have added several programming problems and projects that use the POSIX and Win32 APis, as well as Java. We have added more than 15 new programming problems, which emphasize processes, threads, shared memory, process synchronization, and networking. In addition, we have added or modified several programming projects that are more involved than standard programming exercises. These projects include adding a system call to the Linux kernel, using pipes on both UNIX and Windows systems, using UNIX message queues, creating multithreaded applications, and solving the producer-consumer problem using shared memory. The Eighth Edition also incorporates a set of operating-system simulators designed by Steven Robbins of the University of Texas at San Antonio. The simulators are intended to model the behavior of an operating system as it performs various tasks, such as CPU and disk-head schedulil1.g, process creation and interprocess communication, starvation, and address translation. These simulators are written in Java and will run on any computer systern with Java 1.4. Students can download the simulators from WileyPLUS and observe the behavior of several operating system concepts in various scenarios. In addition, each simulator includes several exercises that ask students to set certain parameters of the simulator, observe how the system behaves, and then explain this behavior. These exercises can be assigned through WileyPLUS. The WileyPLUS course also includes algorithmic problems and tutorials developed by Scott M. Pike of Texas A&M University. xiii The following teaching supplencents are available in WileyPLUS and on www.wiley.coml go I global/ silberschatz: a set of slides to accompany the book, model course syllabi, all C and Java source code, up-to-date errata, three case study appendices and the Distributed Communication appendix. The WileyPLUS course also contains the simulators and associated exercises, additional practice exercises (with solutions) not found in the text, and a testbank of additional problems. Students are encouraged to solve the practice exercises on their own and then use the provided solutions to check their own answers. To obtain restricted supplements, such as the solution guide to the exercises in the text, contact your local Jorne Wiley & Sons sales representative. Note that these supplements are available only to faculty who use this text. We use the mailman system for communication among the users of Operating System Concepts. If you wish to use this facility, please visit the following URL and follow the instructions there to subscribe: http: I I mailman.cs.yale.edul mailmanllistinfo I os-book The mailman mailing-list system provides many benefits, such as an archive of postings, as well as several subscription options, including digest and Web only. To send messages to the list, send e-mail to: [email protected] Depending on the message, we will either reply to you personally or forward the message to everyone on the mailing list. The list is moderated, so you will receive no inappropriate mail. Students who are using this book as a text for class should not use the list to ask for answers to the exercises. They will not be provided. We have attempted to clean up every error in this new edition, but-as happens with operating systems-a few obscure bugs may remain. We would appreciate hearing from you about any textual errors or omissions that you identify. If you would like to suggest improvements or to contribute exercises, we would also be glad to hear from you. Please send correspondence to [email protected]. This book is derived from the previous editions, the first three of which were coauthored by James Peterson. Others who helped us with previous editions include Hamid Arabnia, Rida Bazzi, Randy Bentson, David Black, xiv Joseph Boykin, Jeff Brumfield, Gael Buckley, Roy Campbell, P. C. Capon, John Carpenter, Gil Carrick, Thomas Casavant, Bart Childs, Ajoy Kum.ar Datta, Joe Deck, Sudarshan K. Dhall, Thomas Doeppner, Caleb Drake, M. Racsit Eskicioglu, Hans Flack, Robert Fowler, G. Scott Graham, Richard Guy, Max Hailperin, Rebecca I-Iartncan, Wayne Hathaway, Christopher Haynes, Don Heller, Bruce Hillyer, Mark Holliday, Dean Hougen, Michael Huangs, Ahmed Kamet Marty Kewstet Richard Kieburtz, Carol Kroll, Marty Kwestet Thomas LeBlanc, John Leggett, Jerrold Leichter, Ted Leung, Gary Lippman, Carolyn Miller, Michael Molloy, Euripides Montagne, Yoichi Muraoka, Jim M. Ng, Banu Ozden, Ed Posnak, Boris Putanec, Charles Qualline, John Quarterman, Mike Reiter, Gustavo Rodriguez-Rivera, Carolyn J. C. Schauble, Thomas P. Skimcer, Yannis Smaragdakis, Jesse St. Laurent, John Stankovic, Adam Stauffer, Steven Stepanek, John Sterling, Hal Stern, Louis Stevens, Pete Thomas, David Umbaugh, Steve Vinoski, Tommy Wagner, Larry L. Wear, Jolm Werth, James M. Westall, J. S. Weston, and Yang Xiang Parts of Chapter 12 were derived from a paper by Hillyer and Silberschatz. Parts of Chapter 17 were derived from a paper by Levy and Silberschatz. Chapter 21 was derived from an unpublished manuscript by Stephen Tweedie. Chapter 22 was derived from an unpublished manuscript by Dave Probert, Cliff Martin, and Avi Silberschatz. Appendix C was derived from an unpublished manuscript by Cliff Martin. Cliff Martin also helped with updating the UNIX appendix to cover FreeBSD. Some of the exercises and accompanying solutions were supplied by Arvind Krishnamurthy. Mike Shapiro, Bryan Cantrill, and Jim Mauro answered several Solaris- related questions. Bryan Cantrill from Sun Microsystems helped with the ZFS coverage. Steve Robbins of the University of Texas at San Antonio designed the set of simulators that we incorporate in WileyPLUS. Reece Newman of Westminster College initially explored this set of simulators and their appropriateness for this text. Josh Dees and Rob Reynolds contributed coverage of Microsoft's.NET. The project for POSIX message queues was contributed by John Trona of Saint Michael's College in Colchester, Vermont. Marilyn Turnamian helped generate figures and presentation slides. Mark Wogahn has made sure that the software to produce the book (e.g., Latex macros, fonts) works properly. Our Associate Publisher, Dan Sayre, provided expert guidance as we prepared this edition. He was assisted by Carolyn Weisman, who managed many details of this project smoothly. The Senior Production Editor Ken Santor, was instrumental in handling all the production details. Lauren Sapira and Cindy Jolmson have been very helpful with getting material ready and available for WileyPlus. Beverly Peavler copy-edited the manuscript. The freelance proofreader was Katrina Avery; the freelance indexer was Word Co, Inc. Abraham Silberschatz, New Haven, CT, 2008 Peter Baer Galvin, Burlington, MA 2008 Greg Gagne, Salt Lake City, UT, 2008 PART ONE OVERVIEW Chapter 1 Introduction 1.1 What Operating Systems Do 3 1.9 Protection and Security 29 1.2 Computer-System Organization 6 1.10 Distributed Systems 30 1.3 Computer-System Architecture 12 1.11 Special-Purpose Systems 32 1.4 Operating-System Sh·ucture 18 1.12 Computing Environments 34 1.5 Operating-System Operations 20 1.13 Open-Source Operating Systems 37 1.6 Process Management 23 1.14 Summary 40 1.7 Memory Management 24 Exercises 42 1.8 Storage Management 25 Bibliographical Notes 46 Chapter 2 System Structures 2.1 Operating-System Services 49 2.8 Virtual Machines 76 2.2 User Operating-System Interface 52 2.9 Operating-System Debugging 84 2.3 System Calls 55 2.10 Operating-System Generation 88 2.4 Types of System Calls 58 2.11 System Boot 89 2.5 System Programs 66 2.12 Summary 90 2.6 Operating-System Design and Exercises 91 Implementation 68 Bibliographical Notes 97 2.7 Operating-System Structure 70 PART TWO PROCESS MANAGEMENT Chapter 3 Process Concept 3.1 Process Concept 101 3.6 Communication in Client- 3.2 Process Scheduling 105 Server Systems 128 3.3 Operations on Processes 110 3.7 Summary 140 3.4 Interprocess Communication 116 Exercises 141 3.5 Examples of IPC Systems 123 Bibliographical Notes 152 XV xvi Chapter 4 Multithreaded Programming 4.1 Overview 153 4.5 Operating-System Examples 171 4.2 Multithreading Models 157 4.6 Summary 174 4.3 Thread Libraries 159 Exercises 174 4.4 Threading Issues 165 Bibliographical Notes 181 Chapter 5 Process Scheduling 5.1 Basic Concepts 183 5.6 Operating System Examples 206 5.2 Scheduling Criteria 187 5.7 Algorithm Evaluation 213 5.3 Scheduling Algorithms 188 5.8 Summary 217 5.4 Thread Scheduling 199 Exercises 218 5.5 Multiple-Processor Scheduling 200 Bibliographical Notes 222 PART THREE PROCESS COORDINATION Chapter 6 Synchronization 6.1 Backgrmmd 225 6.7 Monitors 244 6.2 The Critical-Section Problem 227 6.8 Synchronization Examples 252 6.3 Peterson's Solution 229 6.9 Atomic Transactions 257 6.4 Synchronization Hardware 231 6.10 Summary 267 6.5 Semaphores 234 Exercises 267 6.6 Classic Problems of Bibliographical Notes 280 Synchronization 239 Chapter 7 Deadlocks 7.1 System Model 283 7.6 Deadlock Detection 301 7.2 Deadlock Characterization 285 7.7 Recovery from Deadlock 304 7.3 Methods for Handling Deadlocks 290 7.8 Summary 306 7.4 Deadlock Prevention 291 Exercises 307 7.5 Deadlock Avoidance 294 Bibliographical Notes 310 PART FOUR MEMORY MANAGEMENT Chapter 8 Memory-Management Strategies 8.1 Background 315 8.6 Segmentation 342 8.2 Swapping 322 8.7 Example: The Intel Pentium 345 8.3 Contiguous Memory Allocation 324 8.8 Summary 349 8.4 Paging 328 Exercises 350 8.5 Structure of the Page Table 337 Bibliographical Notes 354 xvii Chapter 9 Virtual-Memory Management 9.1 Background 357 9.8 Allocating Kernel Memory 396 9.2 Demand Paging 361 9.9 Other Considerations 399 9.3 Copy-on-Write 367 9.10 Operating-System Examples 405 9.4 Page Replacement 369 9.11 Summary 407 9.5 Allocation of Frames 382 Exercises 409 9.6 Thrashing 386 Bibliographical Notes 416 9.7 Memory-Mapped Files 390 PART FIVE STORAGE MANAGEMENT Chapter 10 File System 10.1 File Concept 421 10.6 Protection 451 10.2 Access Methods 430 10.7 Summary 456 10.3 Directory and Disk Structure 433 Exercises 457 10.4 File-System Mounting 444 Bibliographical Notes 458 10.5 File Sharing 446 Chapter 11 Implementing File Systems 11.1 File-System Structure 461 11.7 Recovery 486 11.2 File-System Implementation 464 11.8 NFS 490 11.3 Directory Implementation 470 11.9 Example: The WAFL File System 496 11.4 Allocation Methods 471 11.10 Summary 498 11.5 Free-Space Management 479 Exercises 499 11.6 Efficiency and Performance 482 Bibliographical Notes 502 Chapter 12 Secondary-Storage Structure 12.1 Overview of Mass-Storage 12.7 RAID Structure 522 Structure 505 12.8 Stable-Storage Implementation 533 12.2 Disk Structure 508 12.9 Tertiary-Storage Struchue 534 12.3 Disk Attachment 509 12.10 Summary 543 12.4 Disk Scheduling 510 Exercises 545 12.5 Disk Man.agement 516 Bibliographical Notes 552 12.6 Swap-Space Management 520 Chapter 13 I/0 Systems 13.1 Overview 555 13.6 STREAMS 580 13.2 I/0 Hardware 556 13.7 Performance 582 13.3 Application I/0 Interface 565 13.8 Summary 585 13.4 Kernel I/0 Subsystem 571 Exercises 586 13.5 Transforming I/0 Requests to Bibliographical Notes 588 Hardware Operations 578 xviii PART SIX PROTECTION AND SECURITY Chapter 14 System Protection 14.1 Goals of Protection 591 14.7 Revocation of Access Rights 606 14.2 Principles of Protection 592 14.8 Capability-Based Systems 607 14.3 Domain of Protection 593 14.9 Language-Based Protection 610 14.4 Access Matrix 598 14.10 Surnmary 615 14.5 Implementation of Access Matrix 602 Exercises 616 14.6 Access Control 605 Bibliographical Notes 618 Chapter 15 System Security 15.1 The Security Problem 621 15.8 Computer-Security 15.2 Program Threats 625 Classifications 662 15.3 System and Network Threats 633 15.9 An Example: Windows XP 664 15.4 Cryptography as a Security Tool 638 15.10 Summary 665 15.5 User Authentication 649 Exercises 666 15.6 Implementing Security Defenses 654 Bibliographical Notes 667 15.7 Firewalling to Protect Systems and Networks 661 PART SEVEN DISTRIBUTED SYSTEMS Chapter 16 Distributed Operating Systems 16.1 Motivation 673 16.7 Robustness 694 16.2 Types of Network- 16.8 Design Issues 697 based Operating Systems 675 16.9 An Example: Networking 699 16.3 Network Structure 679 16.10 Summary 701 16.4 Network Topology 683 Exercises 701 16.5 Communication Structure 684 Bibliographical Notes 703 16.6 Communication Protocols 690 Chapter 17 Distributed File Systems 17.1 Background 705 17.6 An Example: AFS 718 17.2 Naming and Transparency 707 17.7 Summary 723 17.3 Remote File Access 710 Exercises 724 17.4 Stateful versus Stateless Service 715 Bibliographical Notes 725 17.5 File Replication 716 Chapter 18 Distributed Synchronization 18.1 Event Ordering 727 18.6 Election Algorithms 747 18.2 Mutual Exclusion 730 18.7 Reaching Agreement 750 18.3 Atomicity 733 18.8 Summary 752 18.4 Concurrency Control 736 Exercises 753 18.5 Deadlock Handling 740 Bibliographical Notes 754 xix PART EIGHT SPECIAL PURPOSE SYSTEMS Chapter 19 Real-Time Systems 19.1 Overview 759 19.5 Real-Time CPU Scheduling 768 19.2 System Characteristics 760 19.6 An Example: VxWorks 5.x 774 19.3 Features of Real-Time Kernels 762 19.7 Summary 776 19.4 Implementing Real-Time Operating Exercises 777 Systems 764 Bibliographical Notes 777 Chapter 20 Multimedia Systems 20.1 What Is Multimedia? 779 20.6 Network Management 789 20.2 Compression 782 20.7 An Example: CineBlitz 792 20.3 Requirements of Multimedia 20.8 Summary 795 Kernels 784 Exercises 795 20.4 CPU Scheduling 786 Bibliographical Notes 797 20.5 Disk Scheduling 787 PART NINE CASE STUDIES Chapter 21 The Linux System 21.1 Linux History 801 21.8 Input and Output 834 21.2 Design Principles 806 21.9 Interprocess Communication 837 21.3 Kernel Modules 809 21.10 Network Structure 838 21.4 Process Management 812 21.11 Security 840 21.5 Scheduling 815 21.12 Summary 843 21.6 Memory Management 820 Exercises 844 21.7 File Systems 828 Bibliographical Notes 845 Chapter 22 Windows XP 22.1 History 847 22.6 Networking 886 22.2 Design Principles 849 22.7 Programmer Interface 892 22.3 System Components 851 22.8 Sum.mary 900 22.4 Environmental Subsystems 874 Exercises 900 22.5 File System 878 Bibliographical Notes 901 Chapter 23 Influential Operating Systems 23.1 Feature Migration 903 23.9 IBM OS/360 915 23.2 Early Systems 904 23.10 TOPS-20 917 23.3 Atlas 911 23.11 CP/M and MS/DOS 917 23.4 XDS-940 912 23.12 Macintosh Operating System and 23.5 THE 913 Windows 918 23.6 RC 4000 913 23.13 Mach 919 23.7 CTSS 914 23.14 Other Systems 920 23.8 MULTICS 915 Exercises 921 XX Chapter A BSD UNIX A1 UNIX History 1 A7 File System 25 A2 Design Principles 6 AS I/0 System 32 A3 Programmer Interface 8 A9 Interprocess Communication 35 A.4 User Interface 15 AlO Summary 40 AS Process Management 18 Exercises 41 A6 Memory Management 22 Bibliographical Notes 42 Appendix B The Mach System B.l History of the Mach System 1 B.7 Programmer Interface 23 B.2 Design Principles 3 B.S Summary 24 B.3 System Components 4 Exercises 25 B.4 Process Management 7 Bibliographical Notes 26 B.S Interprocess Conununication 13 Credits 27 B.6 Memory Management 18 Appendix C Windows 2000 C.1 History 1 C.6 Networking 28 C.2 Design Principles 2 C.7 Programmer Interface 33 C.3 System Components 3 C.S Summary 40 C.4 Enviromnental Subsystems 19 Exercises 40 C.S File System 22 Bibliographical Notes 41 Bibliography 923 Credits 941 Index 943 Part One An operating system acts as an intermediary between the user of a computer and the computer hardware. The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient and efficient manner. An operating system is software that manages the computer hard- ware. The hardware must provide appropriate mechanisms to ensure the correct operation of the computer system and to prevent user programs from interfering with the proper operation of the system. Internally, operating systems vary greatly in their makeup, since they are organized along many different lines. The design of a new operating system is a major task. It is impmtant that the goals of the system be well defined before the design begins. These goals form the basis for choices among various algorithms and strategies. Because an operating system is large and complex, it must be created piece by piece. Each of these pieces should be a well delineated portion of the system, with carefully defined inputs, outputs, and functions. CH ER An is a program that manages the computer hardware. It also provides a basis for application programs and acts as an intermediary between the computer user and the computer hardware. An amazing aspect of operating systems is how varied they are in accomplishing these tasks. Mainframe operating systems are designed primarily to optimize utilization of hardware. Personal computer (PC) operating systems support complex games, business applications, and everything in between. Operating systems for handheld computers are designed to provide an environment in which a user can easily interface with the computer to execute programs. Thus, some operating systems are designed to be convenient, others to be efficient, and others some combination of the two. Before we can explore the details of computer system operation, we need to know something about system structure. We begin by discussing the basic functions of system startup, I/0, and storage. We also describe the basic computer architecture that makes it possible to write a functional operating system. Because an operating system is large and complex, it must be created piece by piece. Each of these pieces should be a well-delineated portion of the system, with carefully defined inputs, outputs, and functions. In this chapter, we provide a general overview of the major components of an operating system. To provide a grand tour of the major components of operating systems. To describe the basic organization of computer systems. 1.1 We begin our discussion by looking at the operating system's role in the overall computer system. A computer system can be divided roughly into 3 4 Chapter 1 compiler assembler text editor database system operating system Figure 1.1 Abstract view of the components of a computer system. four components: the hardware/ the operating system, the application programs/ and the users (Figure 1.1). The hardwa.te-the the and the 0) { I* parent process *I wait(NULL); printf("PARENT: value= %d",value); I* LINE A *I return 0; } } Figure 3.30 What output will be at Line A? process. Because the parent and child processes have their own copies of the dataf it will be necessary for the child to output the sequence. Have the parent invoke the wait () call to wait for the child process to complete before exiting the program. Perform necessary error checking to ensure that a non-negative number is passed on the command line. 3.14 Repeat the preceding exercisef this time using the CreateProcess () function in the Win32 API. In this instancef you will need to specify a separate program to be invoked from CreateProcess (). It is this separate program that will run as a child process outputting the Fibonacci sequence. Perform necessary error checking to ensure that a non-negative number is passed on the command line. 3.15 Modify the date server shown in Figure 3.19 so that it delivers random jokes rather than the current date. Allow the jokes to contain multiple lines. The date client shown in Figure 3.20 can be used to read the multi-line jokes returned by the joke server. 3.16 An echo server echoes back whatever it receives from a client. For examplef if a client sends the server the string Hello there! the server will respond with the exact data it received from the client-that isf Hello there! 145 Write an echo server using the Java networking API described in Section 3.6.1. This server will wait for a client connection using the accept () method. When a client connection is received, the server will loop, perfonning the following steps: Read data from the socket into a buffer. Write the contents of the buffer back to the client. The server will break out of the loop only when it has determined that the client has closed the connection. The server shown in Figure 3.19 uses the java. io. BufferedReader class. BufferedReader extends the java. io. Reader class, which is used for reading character streams. However, the echo server cannot guarantee that it will read characters from clients; it may receive binary data as well. The class java. io. Input Stream deals with data at the byte level rather than the character level. Thus, this echo server must use an object that extends java. io. InputStrearn. The read() method in the java. io. InputStrearn class returns -1 when the client has closed its end of the socket connection. 3.17 In Exercise 3.13, the child process must output the Fibonacci sequence, since the parent and child have their own copies of the data. Another approach to designing this program is to establish a shared-memory segment between the parent and child processes. This technique allows the child to write the contents of the Fibonacci sequence to the shared- memory segment and has the parent output the sequence when the child completes. Because the memory is shared, any changes the child makes will be reflected in the parent process as well. This program will be structured using POSIX shared memory as described in Section 3.5.1. The program first requires creating the data structure for the shared-memory segment. This is most easily accomplished using a struct. This data structure will contain two items: (1) a fixed-sized array of size MALSEQUENCE that will hold the Fibonacci values and (2) the size of the sequence the child process is to generate- sequence_size, where sequence_size :::: MALSEQUENCE. These items can be represented in a struct as follows: #define MAX_SEQUENCE 10 typedef struct { long fib_sequence[MAX_SEQUENCE]; int sequence_size; } shared_data; The parent process will progress thmugh the following steps: a. Accept the parameter passed on the command line and perform error checking to ensure that the parameter is :::: MAX_SEQUENCE. b. Create a shared-memory segment of size shared_data. c. Attach the shared-memory segment to its address space. 146 Chapter 3 d. Set the value of sequence_size to the parameter on the command line. e. Fork the child process and invoke the wait() systen1. call to wait for the child to finish. f. Output the value of the Fibonacci sequence in the shared-memory segment. g. Detach and remove the shared-memory segment. Because the child process is a copy of the parent, the shared-memory region will be attached to the child's address space as well as the parent's. The child process will then write the Fibonacci sequence to shared memory and finally will detach the segment. One issue of concern with cooperating processes involves synchro- nization issues. In this exercise, the parent and child processes must be synchronized so that the parent does not output the Fibonacci sequence until the child finishes generating the sequence. These two processes will be synchronized using the wait () system call; the parent process will invoke wait (), which will cause it to be suspended until the child process exits. 3.18 Design a program using ordinary pipes in which one process sends a string message to a second process, and the second process reverses the case of each character in the message and sends it back to the first process. For example, if the first process sends the message Hi There, the second process will return hi tHERE. This will require using two pipes, one for sending the original message from the first to the second process, and the other for sending the modified message from the second back to the first process. You may write this program using either UNIX or Windows pipes. 3.19 Design a file-copying program named FileCopy using ordinary pipes. This program will be passed two parameters: the first is the name of the file to be copied, and the second is the name of the copied file. The program will then create an ordinary pipe and write the contents of the file to be copied to the pipe. The child process will read this file from the pipe and write it to the destination file. For example, if we invoke the program as follows: FileCopy input.txt copy.txt the file input. txt will be written to the pipe. The child process will read the contents of this file and write it to the destination file copy. txt. You may write this program using either UNIX or Windows pipes. 3.20 Most UNIX and Linux systems provide the ipcs command. This com- mand lists the status of various POSIX interprocess communication mechanisms, including shared-memory segments. Much of the informa- tion for the command comes from the data structure struct shmid_ds, 147 which is available in the /usr/include/sys/shm.h file. Some of the fields in this structure include: int shm_segsz-size of the shared-memory segment short shm__nattch-number of attaches to the shared-memory segment struct ipc_perm shm_perm-permission structure of the shared- memory segment The struct ipc_perm data structure (which is available in the file /usr/include/sys/ipc.h) contains the fields: unsigned short uid -identifier of the user of the shared-memory segment unsigned short mode-permission modes key_t key (on Linux systems, __key)-user-specified key identifier The permission modes are set according to how the shared-memory segment is established with the shmget () system call. Permissions are identified according to the following: Write permission of owner. 0040 Read permission of group. 0020 Write permission of group. 0004 Read permission of. world. 0002 Write permissionof world. Permissions can be accessed by using the bitwise AND operator &. For example, if the statement mode & 0400 evaluates to "true," the permission mode gives read permission to the owner of the shared- memory segment. A shared-memory segment can be identified according to a user- specified key or according to the integer value returned from the shmget () system call, which represents the integer identifier of the shared-memory segment created. The shm_ds structure for a given integer segment identifier can be obtained with the following shmctl () system call: I* identifier of the shared memory segment*/ int segment_id; shm_ds shmbuffer; shmctl(segment_id, IPC_STAT, &shmbuffer); 148 Chapter 3 If successful, shmctl () returns 0; otherwise, it returns -1 indicating an error condition (the global variable errno can be accessed to determine the error condition). Write a C program that is passed an identifier for a shared-memory segment. This program will invoke the shmctl () function to obtain its shm_ds structure. It will then output the following values of the shared-memory segment: SegmentiD Key Mode Owner DID Size Number of attaches 3.21 POSIX Message Passing. This project consists of using POSIX message queues for communicating temperatures between each of four external processes and a central process. The project can be completed on systems that support POSIX message passing, such as UNIX, Linux, and Mac OS X. Part 1: Overview Four external processes will communicate temperatures to a central process, which in turn will reply with its own temperature and will indicate whether the entire system has stabilized. Each process will receive its initial temperature upon creation and will recalculate a new temperature according to two formulas: new external temp = (myTemp * 3 + 2 * centralTemp) I 5; new central temp = (2 * centralTemp +four temps received from external processes) I 6; Initially, each external process will send its temperature to the mailbox for the central process. If all four temperatures are exactly the same as those sent by the four processes during the last iteration, the system has stabilized. In this case, the central process will notify each external process that it is now finished (along with the central process itself), and each process will output the final stabilized temperature. If the system has not yet become stable, the central process will send its new temperature to the mailbox for each of the outer processes and await their replies. The processes will continue to run until the temperature has stabilized. 149 Part 2: The Message Passing System Processes can exchange messages by using four system calls: msgget (), msgsnd (), msgrcv (), and msgctl (). The msgget () function converts a mailbox name to a message queue id, msqid. (A mailbox name is an externally known message queue name that is shared among the cooperating processes.) msqid, the internal identifier returned by msgget (), must be passed to all subsequent system calls using this message queue to facilitate interprocess communication. A typical invocation of msgget ()is seen below: msqid = msgget(1234, 0600 I IPC_CREAT); The first parameter is the name of the mailbox, and the second parameter instructs the operating system to create the message queue if it does not already exist, with read and write privileges only for processes with the same user id as this process. If a message queue already exists for this mailbox name, msgget () returns the msqid of the existing mailbox. To avoid attaching to an existing message queue, a process can first attempt to attach to the mailbox by omitting IPC_CREAT and then checking the return value from msgget (). If msq id is negative, an error has occurred during the system calt and the globally accessible variable errno can be consulted to determine whether the error occurred because the message queue already exists or for some other reason. If the process determines that the mailbox does not currently exist it can then create it by including IPC_CREAT. (For the current project, this strategy should not be necessary if students are using standalone PCs or are assigned unique ranges of mailbox names by the instructor.) Once a valid msqid has been established, a process can begin to use msgsnd () to send messages and msgrcv () to receive messages. The messages being sent and received are similar in format to those described in Section 3.5.2, as they include a fixed-length portion at the beginning followed by a variable-length portion. Obviously, the senders and receivers must agree on the format of the messages being exchanged. Since the operating system specifies one field in the fixed-length portion of every message format and at least one piece of information will be sent to the receiving process, it is logical to create a data aggregate for each type of message using a struct. The first field of any such struct must be a long, and it will contain the priority of the message. (This project does not use this functionality; we recommend that you simply make the first field in every message equal to the same integral value, such as 2.) Other fields in the messages contain the information to be shared between the communicating processes. Three additional fields are recommended: (1) the temperature being sent, (2) the process number of the external process sending the message (0 for the central process), and (3) a flag that is set to 0 but that the central process will set to 1 when it notices stability. A recommended struct appears as follows: 150 Chapter 3 struct { long priority; int temp; int pid; int stable; } msgp; Assuming the msqid has been established, examples of msgsnd() and msgrcv () appear as such: int stat, msqid; stat = msgsnd(msqid, &msgp, sizeof(msgp)-sizeof(long), 0); stat msgrcv(msqid, &msgp, sizeof(msgp)-sizeof(long), 2, 0); The first parameter in both system calls must be a valid msq id; otherwise a negative value is returned. (Both functions return the number of bytes sent or received upon successful completion of the operation.) The second parameter is the address of where to find or store the message to be sent or received, followed by the number of information bytes to be sent or received. The final parameter of 0 indicates that the operations will be synchronous and that the sender will block if the message queue is full. (IPC_NOWAIT would be used if asynchronous, or nonblocking, operations were desired. Each individual message queue can hold a maximum number of messages-or bytes-so it is possible for the queue to become filled, which is one reason a sender may block when attempting to transmit a message.) The 2 that appears before this final parameter in msgrcv () indicates the minimum priority level of the messages the process wishes to receive; the receiver will wait until a message of that priority (or higher) is sent to the msqid if this is a synchronous operation. Once a process is finished using a message queue, it must be removed so that the mailbox can be reused by other processes. Unless it is removed, the message queue-and any messages that have not yet been received-will remain in the storage space that has been provided for this mailbox by the kernel. To remove the message queue, and delete any unread messages therein, it is necessary to invoke msgctl (), as follows: struct msgid_ds dummyParam; status= msgctl(msqid, IPC_RMID, &dummyParam); The third parameter is necessary because this function requires it but it is used only if it the programmer wishes to collect some statistics about the usage of the message queue. This is accomplished by substituting IPC_STAT as the second parameter. All programs should include the following three header files, which are found in /usr/include/sys: ipc.h, types.h, and msg.h. One possibly confusing artifact of the message queue implementation bears 151 mentioning at this point. After a mailbox is removed via msgctl (),any subsequent attempts to create another mailbox with that same name using msgget () will typically generate a different msqid. Part 3: Creating the Processes Each external process, as well as the central server, will create its own mailbox with the name X+ i, where i is a numeric identifier of the external process 1..4 or zero for the central process. Thus, if X were 70, then the central process would receive messages in the mailbox named 70, and it would send its replies to mailboxes 71-74. Outer process 2 would receive in mailbox 72 and would send to mailbox 70, and so forth. Thus, each external process will attach to two mailboxes, and the central process will attach to five. If each process specifies IPC_CREAT when invoking msgget (), the first process that invokes msgget () actually creates the mailbox; subsequent calls to msgget () attach to the existing mailbox. The protocol for removal should be that the mailbox/message queue that each process is listening to should be the only one it removes -via msgctl ().) Each external process will be uniquely identified by a command-line parameter. The first parameter to each external process will be its initial temperature, and the second parameter will be its unique number: 1, 2, 3, or 4. The central server will be passed one parameter-its initial temperature. Assuming the executable name of the external process is external and the central server is central, invoking all five processes will be done as follows:./external 100 1 &./external 22 2 &./external 50 3 &./external 40 4 &./central 60 & Part 4: Implementation Hints It might be best to start by sending one message successfully from the central process to a single outer process, and vice versa, before trying to write all the code to solve this problem. It is also wise to check all the return values from the four message queue system calls for possible failed requests and to output a message to the screen after each one successfully completes. The message should indicate what was accomplished and by whom -for instance, "mailbox 71 has been created by outer process 1/' "message received by central process from external process 2/' and so forth. These messages can be removed or commented out after the problem is solved. Processes should also verify that they were passed the correct number of command-line parameters (via the argc parameter in main()). Finally, extraneous messages residing in a queue can cause a collection of cooperating processes that function correctly to appear erroneous. For that reason, it is wise to remove all mailboxes relevant to this project to ensure that mailboxes are empty before the processes begin. The easiest way to do this is to use the 152 Chapter 3 ipcs command to list all message queues and the ipcrm command to remove existing message queues. The ipcs command lists the msqid of all message queues on the system. Use ipcrm to remove message queues according to their msqid. For example, if msqid 163845 appears with the output of ipcs, it can be deleted with the following command: ipcrm -q 163845 Interprocess communication in the RC 4000 system is discussed by Brinch- Hansen. Schlichting and Schneider discuss asynchronous message-passing prirnitives. The IPC facility implemented at the user level is described by Bershad et al.. Details of interprocess communication in UNIX systems are presented by Gray. Barrera and Vahalia describe interprocess commu- nication in the Mach system. Russinovich and Solomon , Solomon and Russinovich , and Stevens outline interprocess communication in Windows 2003, Windows 2000 and UNIX respectively. Hart covers Windows systems programming in detail. The implementation of RPCs is discussed by Birrell and Nelson. Shrivastava and Panzieri describes the design of a reliable RPC mecha- nism, and Tay and Ananda presents a survey of RPCs. Stankovic and Stmmstrup discuss procedure calls versus message-passing com- munication. Harold provides coverage of socket programming in Java. Hart and Robbins and Robbins cover pipes in Windows and UNIX systems, respectively. CHAPTER The process model introduced in Chapter 3 assumed that a process was an executing program with a single thread of control. Most modern operating systems now provide features enabling a process to contain multiple threads of control. This chapter introduces many concepts associated with multithreaded computer systems, including a discussion of the APis for the Pthreads, Win32, and Java thread libraries. We look at many issues related to multithreaded programming and its effect on the design of operating systems. Finally, we explore how the Windows XP and Linux operating systems support threads at the kernel level. To introduce the notion of a thread- a fundamental unit of CPU utilization that forms the basis of multithreaded computer systems. To discuss the APis for the Pthreads, Win32, and Java thread libraries. To examine issues related to multithreaded programming. 4.1 A thread is a basic unit of CPU utilization; it comprises a thread ID, a program counter, a register set, and a stack. It shares with other threads belonging to the same process its code section, data section, and other operating-system resources, such as open files and signals. A traditional (or heavrvveighl:) process has a single thread of control. If a process has multiple threads of control, it can perform more than one task at a time. Figure 4.1 illustrates the difference between a traditional process and a process. 4.1.1 Motivation Many software packages that run on modern desktop PCs are multithreaded. An application typically is implemented as a separate process with several threads of control. A Web browser might have one thread display images or 153 154 Chapter 4 thread--+ single-threaded process multithreaded process Figure 4.1 Single-threaded and multithreaded processes. text while another thread retrieves data from the network, for example. A word processor may have a thread for displaying graphics, another thread for responding to keystrokes from the user, and a third thread for performing spelling and grammar checking in the background. In certain situations, a single application may be required to perform several similar tasks. For example, a Web server accepts client requests for Web pages, images, sound, and so forth. A busy Web server may have several (perhaps thousands of) clients concurrently accessing it. If the Web server ran as a traditional single-tlu·eaded process, it would be able to service only one client at a time, artd a client might have to wait a very long time for its request to be serviced. One solution is to have the server run as a single process that accepts requests. When the server receives a request, it creates a separate process to service that request. In fact, this process-creation method was in common use before threads became popular. Process creation is time consuming and resource intensive, however. If the new process will perform the same tasks as the existing process, why incur all that overhead? It is generally more efficient to use one process that contains multiple threads. If the Web-server process is multithreaded, the server will create a separate thread that listens for client requests. When a request is made, rather than creating another process, the server will create a new thread to service the request and resume listening for additional requests. This is illustrated in Figure 4.2. Threads also play a vital role in remote procedure call (RPC) systems. Recall from Chapter 3 that RPCs allow interprocess communication by providing a communication mechanism similar to ordinary function or procedure calls. Typically, RPC servers are multithreaded. When a server receives a message, it services the message using a separate thread. This allows the server to service several concurrent requests. Finally, most operating system kernels are now multithreaded; several threads operate in the kernel, and each thread performs a specific task, such 4.1 155 (2) create new (1) request thread to service the request client server 1------- 1 thread '------.--r---10 (3) resume listening for additional client requests Figure 4.2 Multithreaded server architecture. as managing devices or interrupt handling. For examplef Solaris creates a set of threads in the kernel specifically for interrupt handling; Linux uses a kernel thread for managing the amount of free memory in the system. 4.1.2 Benefits The benefits of multithreaded programming can be broken down into four major categories: Responsiveness. Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user. For instancef a multithreaded Web browser could allow user interaction in one thread while an image was being loaded in another thread. Resource sharing. Processes may only share resources through tech- niques such as shared memory or message passing. Such techniques must be explicitly arranged by the programmer. However, threads share the memory and the resources of the process to which they belong by default. The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space. 3. Economy. Allocating memory and resources for process creation is costly. Because threads share the resources of the process to which they belong, it is more economical to create and context-switch threads. Empirically gauging the difference in overhead can be difficult, but in general it is much more time consuming to create and manage processes than threads. In Solarisf for example, creating a process is about thirty times slower than is creating a thread, and context switching is about five times slower. Scalability. The benefits of multithreading can be greatly increased in a multiprocessor architecture, where threads may be running in parallel on different processors. A single-threaded process can only run on one processor, regardless how many are available. Multithreading on a multi- CPU machine increases parallelism. We explore this issue further in the following section. 156 Chapter 4 time Figure 4.3 Concurrent execution on a single-core system. 4.1.3 Multicore Programming A recent trend in system design has been to place multiple computing cores on a single chip, where each core appears as a separate processor to the operating system (Section 1.3.2). Multithreaded programming provides a mechanism for more efficient use of multiple cores and improved concurrency. Consider an application with four threads. On a system with a single computing core, concurrency merely means that the execution of the threads will be interleaved over time (Figure 4.3), as the processing core is capable of executing only one thread at a time. On a system with multiple cores, however, concurrency means that the threads can run in parallel, as the system can assign a separate thread to each core (Figure 4.4). The trend towards multicore systems has placed pressure on system designers as well as application programmers to make better use of the multiple computing cores. Designers of operating systems must write scheduling algorithms that use multiple processing cores to allow the parallel execution shown in Figure 4.4. For application programmers, the challenge is to modify existing programs as well as design new programs that are multithreaded to take advantage of multicore systems. In general, five areas present challenges in programming for multicore systems: Dividing activities. This involves examining applications to find areas that can be divided into separate, concurrent tasks and thus can run in parallel on individual cores. Balance. While identifying tasks that can run in parallel, programmers must also ensure that the tasks perform equal work of equal value. In some instances, a certain task may not contribute as much value to the overall process as other tasks; using a separate execution core to run that task may not be worth the cost. Data splitting. Just as applications are divided into separate tasks, the data accessed and manipulated by the tasks must be divided to run on separate cores. core 1 l 9.5 C1l 9.0 2 3 4 5 6 7 time quantum Figure 5.5 How turnaround time varies with the time quantum. Turnaround time also depends on the size of the time quantum. As we can see from Figure 5.5, the average turnaround time of a set of processes does not necessarily improve as the time-quantum size increases. In general, the average turnaround time can be improved if most processes finish their next CPU burst in a single time quantum. For example, given three processes of 10 time units each and a quantum of 1 time unit, the average turnaround time is 29. If the time quantum is 10, however, the average turnaround time drops to 20. If context-switch time is added in, the average turnaround time increases even more for a smaller time quantum, since more context switches are required. Although the time quantum should be large compared with the context- switch time, it should not be too large. If the time quantum is too large, RR scheduling degenerates to an FCFS policy. A rule of thumb is that 80 percent of the CPU bursts should be shorter than the time quantum. 5.3.5 Multilevel Queue Scheduling Another class of scheduling algorithms has been created for situations in which processes are easily classified into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. In addition, foreground processes may have priority (externally defined) over background processes. A multilevel queue scheduling algorithm partitions the ready queue into several separate queues (Figure 5.6). The processes are permanently assigned to one queue, generally based on some property of the process, such as memory 5.3 197 highest priority ====~'-------'i-'-n_te_r~ac_t_iv_e_e...:.d_it~in_g'-'-p~r-.o'-c_·e'---ss~e-s"--------"--'-'---l====i> ======~'---------'b_a_tc_h_p_r_o_ce_s_s_e_s_______J======~> ======·~'-------s_tu_d_e_n_t_p_ro_c_e_s_s_es_ _ _ _ ___jl======i> lowest priority Figure 5.6 Multilevel queue scheduling. size, process priority, or process type. Each queue has its own scheduling algorithm. For example, separate queues might be used for foreground and background processes. The foreground queue might be scheduled by an RR algorithm, while the background queue is scheduled by an FCFS algorithm. In addition, there must be scheduling among the queues, which is com- monly implemented as fixed-priority preemptive scheduling. For example, the foreground queue may have absolute priority over the background queue. Let's look at an example of a multilevel queue scheduling algorithm with five queues, listed below in order of priority: System processes Interactive processes Interactive editing processes Batch processes Student processes Each queue has absolute priority over lower-priority queues. No process in the batch queue, for example, could run unless the queues for system processes, interactive processes, and interactive editing processes were all empty. If an interactive editing process entered the ready queue while a batch process was running, the batch process would be preempted. Another possibility is to time-slice among the queues. Here, each queue gets a certain portion of the CPU time, which it can then schedule among its various processes. For instance, in the foreground-background queue example, the foreground queue can be given 80 percent of the CPU time for RR scheduling among its processes, whereas the background queue receives 20 percent of the CPU to give to its processes on an FCFS basis. 198 Chapter 5 5.3.6 Multilevel Feedback Queue Scheduling Normally, when the multilevel queue scheduling algorithm is used, processes are permanently assigned to a queue when they enter the system. If there are separate queues for foreground and background processes, for example, processes do not move from one queue to the other, since processes do not change their foreground or background nature. This setup has the advantage of low scheduling overhead, but it is inflexible. The multilevel feedback queue scheduling algorithm, in contrast, allows a process to move between queues. The idea is to separate processes according to the characteristics of their CPU bursts. If a process uses too much CPU time, it will be moved to a lower-priority queue. This scheme leaves I/O-bound and interactive processes in the higher-priority queues. In addition, a process that waits too long in a lower-priority queue may be moved to a higher-priority queue. This form of aging prevents starvation. For example, consider a multilevel feedback queue scheduler with three queues, numbered from 0 to 2 (Figure 5.7). The scheduler first executes all processes in queue 0. Only when queue 0 is empty will it execute processes in queue 1. Similarly, processes in queue 2 will only be executed if queues 0 and 1 are empty. A process that arrives for queue 1 will preempt a process in queue 2. A process in queue 1 will in turn be preempted by a process arriving for queue 0. A process entering the ready queue is put in queue 0. A process in queue 0 is given a time quantum of 8 milliseconds. If it does not filcish within this time, it is moved to the tail of queue 1. If queue 0 is empty, the process at the head of queue 1 is given a quantum of 16 milliseconds. If it does not complete, it is preempted and is put into queue 2. Processes in queue 2 are run on an FCFS basis but are run only when queues 0 and 1 are empty. This scheduling algorithm gives highest priority to any process with a CPU burst of 8 milliseconds or less. Such a process will quickly get the CPU, finish its CPU burst, and go off to its next I/0 burst. Processes that need more than 8 but less than 24 milliseconds are also served quickly, although with lower priority than shorter processes. Long processes automatically sink to queue 2 and are served in FCFS order with any CPU cycles left over from queues 0 and 1. Figure 5.7 Multilevel feedback queues. 5.4 199 In general, a multilevel feedback queue scheduler is defined by the following parameters: The number of queues The scheduling algorithm for each queue The method used to determine when to upgrade a process to a higher- priority queue The method used to determine when to demote a process to a lower- priority queue The method used to determine which queue a process will enter when that process needs service The definition of a multilevel feedback queue scheduler makes it the most general CPU-scheduling algorithm. It can be configured to match a specific system under design. Unfortunately, it is also the most complex algorithm, since defining the best scheduler requires some means by which to select values for all the parameters. 5.4 In Chapter 4, we introduced threads to the process model, distinguishing between user-level and kernel-level threads. On operating systems that support them, it is kernel-level threads-not processes-that are being scheduled by the operating system. User-level threads are managed by a thread library, and the kernel is unaware of them. To run on a CPU, user-level threads must ultimately be mapped to an associated kernel-level thread, although this mapping may be indirect and may use a lightweight process (LWP). In this section, we explore scheduling issues involving user-level and kernel-level threads and offer specific examples of scheduling for Pthreads. 5.4.1 Contention Scope One distinction between user-level and kernel-level threads lies in how they are scheduled. On systems implementing the many-to-one (Section 4.2.1) and many-to-many (Section 4.2.3) models, the thread library schedules user-level threads to run on an available LWP, a scheme known as process-contention scope (PCS), since competition for the CPU takes place among threads belonging to the same process. When we say the thread library schedules user threads onto available LWPs, we do not mean that the thread is actually running on a CPU; this would require the operating system to schedule the kernel thread onto a physical CPU. To decide which kernel thread to schedule onto a CPU, the kernel uses system-contention scope (SCS). Competition for the CPU with SCS scheduling takes place among all threads in the system. Systems usilcg the one-to-one model (Section 4.2.2), such as Windows XP, Solaris, and Linux, schedule threads using only SCS. Typically, PCS is done according to priority-the scheduler selects the runnable thread with the highest priority to run. User-level thread priorities 200 Chapter 5 are set by the programmer and are not adjusted by the thread library, although some thread libraries may allow the programmer to change the priority of a thread. It is important to note that PCS will typically preempt the thread currently running in favor of a higher-priority thread; however, there is no guarantee of time slicing (Section 5.3.4) among threads of equal priority. 5.4.2 Pthread Scheduling We provided a sample POSTX Pthread program in Section 4.3.1, along with an introduction to thread creation with Pthreads. Now, we highlight the POSIX Pthread API that allows specifying either PCS or SCS during thread creation. Pthreads identifies the following contention scope values: PTHREAD_SCOPE_PROCESS schedules threads using PCS scheduling. PTHREAD_SCOPE_SYSTEM schedules threads using SCS scheduling. On systems implementing the many-to-many model, the PTHREAD_SCOPE_PROCESS policy schedules user-level threads onto available LWPs. The number of LWPs is maintained by the thread library, perhaps using scheduler activations (Section 4.4.6). The PTHREAD_SCOPE_SYSTEM scheduling policy will create and bind an LWP for each user-level thread on many-to-many systems, effectively mapping threads using the one-to-one policy. The Pthread IPC provides two functions for getting-and setting-the contention scope policy: pthread_attr_setscope(pthread_attr_t *attr, int scope) pthread_attr_getscope(pthread_attr_t *attr, int *scope) The first parameter for both functions contains a pointer to the attribute set for the thread. The second parameter for the pthread_attr_setscope () function is passed either the PTHREAD_SCOPE_SYSTEM or the PTHREAD_SCOPE_PROCESS value, indicating how the contention scope is to be set. In the case of pthread_attr_getscope (), this second parameter contaiilS a pointer to an int value that is set to the current value of the contention scope. If an error occurs, each of these functions returns a non-zero value. In Figure 5.8, we illustrate a Pthread scheduling API. The pro- gram first determines the existing contention scope and sets it to PTHREAD_SCOPLPROCESS. It then creates five separate threads that will run using the SCS scheduling policy. Note that on some systems, only certain contention scope values are allowed. For example, Linux and Mac OS X systems allow only PTHREAD_SCOPE_SYSTEM. 5.5 Our discussion thus far has focused on the problems of scheduling the CPU in a system with a single processor. If multiple CPUs are available, load sharing becomes possible; however, the scheduling problem becomes correspondingly 505 201 #include #include #define NUM_THREADS 5 int main(int argc, char *argv[]) { int i, scope; pthread_t tid[NUM_THREADS]; pthread_attr_t attr; I* get the default attributes *I pthread_attr_init(&attr); I* first inquire on the current scope *I if (pthread_attr_getscope(&attr, &scope) != 0) fprintf(stderr, "Unable to get scheduling scope\n"); else { if (scope == PTHREAD_SCOPE_PROCESS) printf("PTHREAD_SCOPLPROCESS"); else if (scope == PTHREAD_SCOPE_SYSTEM) printf("PTHREAD_SCOPE_SYSTEM"); else fprintf(stderr, "Illegal scope valueo\n"); } I* set the scheduling algorithm to PCS or SCS *I pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM); I* create the threads *I for (i = 0; i < NUM_THREADS; i++) pthread_create(&tid[i] ,&attr,runner,NULL); I* now join on each thread *I for (i = 0; i < NUM_THREADS; i++) pthread_join(tid[i], NULL); } I* Each thread will begin control in this function *I void *runner(void *param) { I* do some work 0 0 0 *I pthread_exi t ( 0) ; } Figure 508 Pthread scheduling API. more complex. Many possibilities have been tried; and as we saw with single- processor CPU scheduling, there is no one best solution. Here, we discuss several concerns in multiprocessor scheduling. We concentrate on systems 202 Chapter 5 in which the processors are identical-homogeneous-in terms of their functionality; we can then use any available processor to run any process in the queue. (Note, however, that even with homogeneous multiprocessors, there are sometimes limitations on scheduling. Consider a system with an l/0 device attached to a private bus of one processor. Processes that wish to use that device must be scheduled to run on that processor.) 5.5.1 Approaches to Multiple-Processor Scheduling One approach to CPU scheduling in a n1.ultiprocessor system has all scheduling decisions, I/O processing, and other system activities handled by a single processor-the master server. The other processors execute only user code. This asymmetric multiprocessing is simple because only one processor accesses the system data structures, reducing the need for data sharing. A second approach uses symmetric multiprocessing (SMP), where each processor is self-scheduling. All processes may be in a common ready queue, or each processor may have its own private queue of ready processes. Regardless, scheduling proceeds by having the scheduler for each processor examine the ready queue and select a process to execute. As we shall see in Chapter 6 1 if we have multiple processors trying to access and update a common data structure, the scheduler must be programmed carefully. We must ensure that two processors do not choose the same process and that processes are not lost from the queue. Virtually all modern operating systems support SMP, including Windows XP, Windows 2000, Solaris, Linux, and Mac OS X. In the remainder of this section, we discuss issues concerning SMP systems. 5.5.2 Processor Affinity Consider what happens to cache memory when a process has been running on a specific processor. The data most recently accessed by the process populate the cache for the processor; and as a result, successive memory accesses by the process are often satisfied in cache memory. Now consider what happens if the process migrates to another processor. The contents of cache memory must be invalidated for the first processor, and the cache for the second processor must be repopulated. Because of the high cost of invalidating and repopulating caches, most SMP systems try to avoid migration of processes from one processor to another and instead attempt to keep a process rumung on the same processor. This is known as processor affinity-that is, a process has an affinity for the processor on which it is currently rumting. Processor affinity takes several forms. When an operating system has a policy of attempting to keep a process running on the same processor-but not guaranteeing that it will do so-we have a situation known as soft affinity. Here, it is possible for a process to migrate between processors. Some systems -such as Lim.IX -also provide system calls that support hard affinity, thereby allowing a process to specify that it is not to migrate to other processors. Solaris allows processes to be assigned to limiting which processes can run on which CPUs. It also implements soft affinity. The main-memory architecture of a system can affect processor affinity issues. Figure 5.9 illustrates an architecture featuring non-uniform memory access (NUMA), in which a CPU has faster access to some parts of main memory than to other parts. Typically, this occurs in systems containing combined CPU 5.5 203 computer Figure 5.9 NUMA and CPU scheduling. and memory boards. The CPUs on a board can access the memory on that board with less delay than they can access memory on other boards in the system. If the operating system's CPU scheduler and memory-placement algorithms work together, then a process that is assigned affinity to a particular CPU can be allocated memory on the board where that CPU resides. This example also shows that operating systems are frequently not as cleanly defined and implemented as described in operating-system textbooks. Rather, the "solid lines" between sections of an operating system are frequently only "dotted lines," with algorithms creating connections in ways aimed at optimizing performance and reliability. 5.5.3 Load Balancing On SMP systems, it is important to keep the workload balanced among all processors to fully utilize the benefits of having more than one processor. Otherwise, one or more processors may sit idle while other processors have high workloads, along with lists of processes awaiting the CPU. Load balancing attempts to keep the workload evenly distributed across all processors in an SMP system. It is important to note that load balancing is typically only necessary on systems where each processor has its own private queue of eligible processes to execute. On systems with a common run queue, load balancing is often unnecessary, because once a processor becomes idle, it immediately extracts a rmmable process from the common run queue. It is also important to note, howeve1~ that in most contemporary operating systems supporting SMP, each processor does have a private queue of eligible processes. There are two general approaches to load balancing: push migration and pull migration. With push migration, a specific task periodically checks the load on each processor and -if it finds an imbalance-evenly distributes the load by moving (or pushing) processes from overloaded to idle or less-busy processors. Pull migration occurs when an idle processor pulls a waiting task from a busy processor. Push and pull migration need not be mutually exclusive and are in fact often implemented in parallel on load-balancing systems. For example, the Linux scheduler (described in Section 5.6.3) and the ULE scheduler 204 Chapter 5 available for FreeBSD systems implement both techniqL1es. Linux runs its load- balancing algorithm every 200 milliseconds (push migration) or whenever the run queue for a processor is empty (pull migration). Interestingly, load balancing often counteracts the benefits of processor affinity, discussed in Section 5.5.2. That is, the benefit of keeping a process running on the same processor is that the process can take advantage of its data being in that processor's cache memory. Either pulling or pushing a process from one processor to another invalidates this benefit. As is often the case in systems engineering, there is no absolute rule concerning what policy is best. Thus, in some systems, an idle processor always pulls a process from a non-idle processor; and in other systems, processes are moved only if the imbalance exceeds a certain threshold. 5.5.4 Multicore Processors Traditionally, SMP systems have allowed several threads to run concurrently by providing multiple physical processors. However, a recent trend in computer hardware has been to place multiple processor cores on the same physical chip, resulting in a. Each core has a register set to maintain its architectural state and appears to the operating system to be a separate physical processor. SMP systems that use multicore processors are faster and consume less power than systems in which each processor has its own physical chip. Multicore processors may complicate scheduling issues. Let's consider how this can happen. Researchers have discovered that when a processor accesses memory, it spends a significant amount of time waiting for the data to become available. This situation, known as a may occur for various reasons, such as a cache miss (accessing data that is not in cache memory). Figure 5.10 illustrates a memory stall. In this scenario, the processor can spend up to 50 percent of its time waiting for data to become available from memory. To remedy this situation, many recent hardware designs have implemented multithreaded processor cores in which two (or more) hardware threads are assigned to each core. That way, if one thread stalls while waiting for memory, the core can switch to another thread. Figure 5.11 illustrates a dual-threaded processor core on which the execution of thread 0 and the execution of thread 1 are interleaved. From an operating-system perspective, each hardware thread appears as a logical processor that is available to run a software thread. Thus, on a dual-threaded, dual-core system, four logical processors are presented to the operating system. The UltraSPARC Tl CPU has eight cores per chip and four 0 compute cycle ~memory stall cycle thread c M c M c M c M time Figure 5.10 Memory stall. 5.5 205 thread 1 c M c M c M c thread 0 c M c M c M c time Figure 5.11 Multithreaded multicore system. hardware threads per core; from the perspective of the operating system, there appear to be 32 logical processors. In general, there are two ways to multithread a processor: ~__u.,u."·c-).;u: cHccu multithreading. With coarse-grained multithreading, a thread executes on a processor until a long-latency event such as a memory stall occurs. Because of the delay caused by the long-latency event, the processor must switch to another thread to begin execution. However, the cost of switching between threads is high, as the instruction pipeline must be flushed before the other thread can begin execution on