CMSC 125 LE1 ACSS Module V1.1
B CMSC 125 PL -U Operating Systems SS AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Disclaimer This module is made solely for the use of Alliance of Computer Science Students – UPLB developers and should not be shared to those outside the organization. Acknowledgement B This module has been created by the following developers. If you have any comments, concerns, or suggestions regarding the module, please PL direct them to the developers listed below. Alcantara, Henry -U Arellano, Vincent Marin, Sean Joseph SS Proofread by: Tegrado, Kenneth Renz AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Table of Contents Introduction to Operating Systems 4 Abstraction: The Process 9 Interlude: Process API 13 Mechanism: Limited Direct Execution 16 Scheduling: Introduction 21 Scheduling: The Multi-Level Feedback Queue 27 Scheduling: Proportional Share 32 Multiprocessor Scheduling 36 B Abstraction: Address Spaces 39 Interlude: Memory API 41 PL Mechanism: Address Translation 44 Segmentation 50 Free-Space Management 55 Paging: Introduction 58 Paging: Faster Translations (TLBs) 63 -U Paging: Smaller Tables (Advanced Page Tables) 70 Beyond Physical Memory: Mechanisms 73 Beyond Physical Memory: Policies 77 SS AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 01 Introduction to Operating Systems Introduction An Operating System (OS) is a piece of software that acts like a bridge between software applications and the hardware of a computer. ○ It acts as a mediator that links programs with the actual physical components of the computer. ○ Essentially, the OS simplifies the complex hardware, making it possible B to run software smoothly on a computer. Von Neumann Model of Computing: Running a program involves fetching, decoding, and executing instructions from memory PL The OS is responsible for ○ making it easy to run programs, ○ allowing programs to share memory, and ○ enabling programs to interact with devices. The primary goal of the operating system (OS) is to make sure that the system operates correctly, efficiently, and in an easy-to-use manner. -U This is achieved through: Virtualization OS Virtualizes physical resources like the processor, memory, and disk into a more accessible form to achieve its goals. SS Hence, OS is also referred to as a virtual machine. Exporting System OS Exports system calls for actions like running Calls programs, memory allocation, and file access. AC Hence, OS becomes a standard library. OS as a Resource Manages resources by overseeing CPU, memory, and Manager disk allocation. The role of OS includes managing these resources efficiently and fairly. Three General Themes in Building an OS: Virtualization (CPU and Memory), Concurrency, Persistence Virtualizing the CPU An Illusion: Allowing many programs to seemingly run at once. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester A system can have a single processor (CPU) but can have a very large number of virtual CPUs ○ Turning a single CPU into a seemingly infinite number of CPUs As a resource manager, an OS needs to implement mechanisms and enforce policies ○ Mechanism - operations or methods for implementation (e.g. timer) ○ Policy - determines how competing programs are prioritized (e.g. round-robin, shortest job first) Virtualizing Memory B The physical memory is represented as a simple array of bytes. Each location in the array (slot) has an address. 
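As a small illustration in C (this example is an addition to the module, not from it), a program touches these addressed slots of memory through pointers, and each pointer access becomes one of the load or store instructions described next. The variable names are arbitrary.

```c
#include <stdio.h>

int main(void) {
    int x = 10;      /* x occupies some addressed slot in memory */
    int *p = &x;     /* p holds that address */

    printf("address of x: %p\n", (void *)p);

    int v = *p;      /* read memory: a load from the address in p */
    *p = v + 3;      /* write memory: a store to the address in p */

    printf("x is now %d\n", x);   /* prints 13 */
    return 0;
}
```

On a modern OS, the address printed here is a virtual address, a point developed in the later sections on memory virtualization.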
A program keeps all of its data structures in memory which is accessed through PL various instructions ○ Read memory (load instruction) - specify an address to be able to access the data ○ Write memory (store instruction) - specify the data to be written to the given address -U ○ Note: Instruction/code that manipulate data structures are also in memory Each process accesses its own private virtual address space. ○ The OS maps virtual address space onto the physical memory. ○ A memory reference within one running program does not affect the SS address space of other processes. ○ They assume exclusive access to physical memory. ○ Physical memory is a shared resource, managed by the OS AC Concurrency Concurrency refers to dealing with multiple tasks simultaneously within a program. ○ A challenge that arises in both operating systems and multi-threaded programs, that may lead to data inconsistencies and incorrect operations. Example: Suppose that a program uses two threads to increment a shared counter. ○ When running with low input loops like 1000, the final counter value is as expected, 2000. ○ However, with higher loop values like 100,000, unexpected and varying results occur due to non-atomic execution of instructions. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Non-atomic execution - refers to the manner in which instructions do not execute all at once. Persistence Persistence in operating systems refers to the ability to store data reliably even when the power goes out or the system crashes. Problem: Data stored in system memory (e.g., DRAM) is volatile and can be easily lost during power loss or system crashes. Solution: Hence, hardware and software are needed to store data persistently. ○ Hardware: includes input/output (I/O) devices, with hard drives and B solid-state drives (SSDs) commonly used to store long-lived data. ○ Software: The file system is the OS software responsible for managing data storage on disks efficiently and reliably. PL Unlike CPU and memory abstractions, file systems are designed for sharing data across processes, allowing multiple applications to access and modify files. ○ Ex: Your C code is used by both text editor and compiler Writing data to disk involves complex operations, including determining where data is stored on disk, maintaining data structures, and handling I/O requests. -U File systems implement intricate write protocols (e.g., journaling, copy-on-write) to ensure recovery to a reasonable state after failures. File systems use various data structures and access methods (e.g., lists, b-trees) for efficient data management. SS Design Goals Abstraction To make the operating system convenient and easy to use, allowing for the division of complex tasks into understandable components. AC High Performance By minimizing the overheads of the OS, such as extra instructions and memory or disk space. Protection To prevent malicious or accidental harm to processes or the OS itself. Isolation is a key principle, where bad behavior of one does not harm other processes and the OS itself. Reliability To ensure continuous operation to prevent the failure of all applications on the system. Energy-efficiency To align with environmental concerns, optimizing resource usage to reduce power consumption. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Security To protect against malicious applications and data breaches. 
Mobility Adapt to the increasing importance of mobility, as OSes are used on a variety of devices, by considering different design goals based on device usage. History of Operating Systems Early Operating Initially, operating systems served as libraries providing Systems: Just commonly-used functions to simplify development. Batch Libraries processing was common, where one program ran at a time, B controlled by a human operator. Beyond Libraries: Operating systems evolved to provide protection between PL Protection applications and treated OS code as special, introducing system calls to formalize interactions with hardware and raise privilege levels when necessary. Kernel Mode - privileged level for execution User Mode - restricted / limited execution -U Multiprogramming Minicomputers made computing more affordable, enabling Era multiprogramming to improve CPU utilization. Multiprogramming - loading a number of jobs into memory and rapidly switching between them SS Memory protection and concurrency management became essential due to rapid job switching. Modern Era The personal computer (PC) era began with affordable machines like the IBM PC and Apple II. However, early PC operating systems, like Disk Operating Systems (DOS), lacked AC essential features such as memory protection. During the late 90s and early 200s - has forgotten the lessons during minicomputer era Has improved recently with introduction of Linux and improvements in Windows and Mac OS Summary Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Operating systems serve as a bridge between software applications and computer hardware, with the primary goal of making sure that the system operates correctly, efficiently, and in an easy-to-use manner. Operating systems virtualize CPUs to enable multiple programs to seemingly run simultaneously, implementing mechanisms and policies for resource management. Memory is virtually allocated to processes, with the OS mapping virtual to physical memory, ensuring data isolation and managing shared physical memory. Dealing with multiple tasks simultaneously, concurrency presents challenges like data inconsistencies and incorrect operations, often caused by non-atomic instruction execution. B Persistence involves reliably storing data even in power loss or system crashes, utilizing hardware (I/O devices) and software (file systems) to manage data PL efficiently. OS design goals include providing abstractions, high performance, protection, reliability, energy efficiency, security, and mobility to different devices. Early operating systems were libraries for simplifying development, evolving to -U provide protection between applications. The multiprogramming era improved CPU utilization and introduced memory protection. The modern era brought affordable PCs but initially lacked essential features; improvements came with Linux and advancements in Windows and macOS. SS AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 01 Abstraction: The Process Introduction Problem: There are many programs running on your computer, but your computer only has a limited number of CPUs to do the work. Solution: The OS creates an illusion by virtualizing the CPU. Achieved through time sharing. o Time sharing - By running one process, then stopping it and running another, and so forth, the OS can promote the illusion that many virtual CPUs exist when in fact there is only few CPUs. 
B Advantage Disadvantage PL allows users to run as many the potential cost is performance, concurrent processes as they as each will run more slowly if the would like CPU(s) must be shared -U To implement this virtualization, the OS needs mechanisms and policies. Mechanism Policies low-level methods or protocols that algorithms for making some kind of SS implement a functionality decision within the OS Answers: How? Answers: Which? Ex: Context-Switch – mechanism that Ex: Scheduling policy - policy that AC helps switch between programs decides which program to run next. What is a Process? Process – a running program, a major OS abstraction o Characterized by looking at different data structures and system resources it uses or accesses Machine state of a process – involves the critical aspects of the machine required for the program's execution. Parts of a Machine State Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Memory Instructions lie in memory; the data being read and written sits in memory as well Address space – memory that the process can address Registers Many instructions read and update registers. Program Counter (or Instruction Pointer) – tells which instruction of program to be executed next Stack Pointer and Frame Pointer – used to manage stack for function parameters, local variables, and return addresses B I/O Include a list of files that the process currently has open information PL Process API The interface for managing processes, available on any modern OS. -U Consist of calls that a program can make related to processes Create To start new processes when you run fork(), exec(), clone() programs. SS Destroy To forcefully stop processes that aren't kill (pid, SIGTERM) exiting on their own. Wait To wait for a process to finish running. wait(pid) Miscellaneous For other actions like pausing and kill(pid, SIGSTOP), AC Control resuming processes. kill(pid, SIGCONT) Status To get information about a process, like cat /proc/pid/status how long it has been running or its current state. Process Creation Step Action Description 1 Load a program code Programs initially reside on disk in executable (and static data) into format. (ELF, PE) Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester memory, into the OS perform the loading process lazily. - Loading address space of the pieces of code or data only as they are needed process. during program execution. 2 The program’s run-time Use the stack for local variables, function stack is allocated. parameters, and return address. Initialize the stack with arguments → argc and the argv array of main() function 3 The program’s heap is Used for explicitly requested dynamically created (usually done allocated data. lazily). Program request such space by calling malloc() B and free it by calling free(). PL 4 The OS does some other input/output (I/O) setup initialization tasks. Each process by default has three open file descriptors. Standard input (STDIN=0), standard output(STDOUT=1), and standard error (STDERR=2) -U 5 Start the program The OS transfers control of the CPU to the running at the entry newly-created process point, namely main(). SS Process States and Process State Transitions Apps can be in three states: o "Running" (active), AC o "Ready" (waiting for its turn), or o "Blocked" (waiting for something, like reading from a disk). o Additional States, depending on OS: “Initial” and “Final / Terminated” Apps can switch between these states. 
For example, when a program asks the computer to do something like read a file from a disk, it switches from “Running” to "Blocked" while waiting for the reading to finish. Once the input is read, it transitions to a “Ready” state, and transitions to “Running” once it is scheduled. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Data Structures B The OS keeps track of all the processes using data structures. Process or Tasks Lists Tracks ready, blocked, and currently running processes PL Register Context Stores register values when a process is stopped or paused, and restores them during a context-switch, when it is resumed Process Control Block C-structure that contains information about each (PCB) process -U Summary SS A process represents a running program, described at any point in time by its machine state: address space, contents of CPU registers, and I/O information. The Process API lets programs create, destroy, and manage processes. AC Once a process is created, it: (1) loads program code and data into memory, (2) allocates a runtime stack, (3) creates a heap, (4) performs OS initialization, and (5) starts the program at its entry point. Processes have states like running, ready, and blocked, changing based on events like scheduling or I/O. A process list contains information about all processes and their states. A PCB contains information about a specific process. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 01 Interlude: Process API The fork() System Call Creates a new process (child), that is a copy of the calling process (parent). The child has a private copy of its own address space, registers, and PC; which is different from its parent. Parent Child B Starts as intended, in its main() Starts execution after fork() call fork() returns the child’s PID (process fork returns zero (0) PL identifier) The order of execution of parent and child processes is non-deterministic, dependent on the scheduler -U The wait() System Call The wait() system call allows the parent process to wait for the child process to complete before proceeding. SS It ensures a deterministic order of execution, as it won't return until the child has finished, even if the scheduler initially selects the parent process after a fork(). AC The exec() System Call Allows a process (usually, a child process) to replace its code and data with a different program and run that program Used to change the behavior of a process dynamically Example: Suppose that a child process is created from a fork(), it will have different code, static data, address space, registers, PC ○ exec() loads the code and static data, given the name of the executable into the child process’ address space which originally contains the code and static data of the parent process copied during the call to fork() A successful call to exec() never returns – since the process is already overwritten. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Why separate fork() and exec()? Essential for building a shell – an “interface” program that accepts commands from users to access the services of the OS ○ Ex. BASH and ZSH in Linux, Command Prompt and Powershell in Windows With this separation, the shell can run some code after the call to fork() but before the call to exec() ○ It allows the shell to alter the environment of the about to be run program by exec(), enabling more features (ex. 
Pipes and I/O redirection) to be implemented B Process Control and Users PL There are other process-related system calls kill() – used to send signals to a process ○ SIGINT (interrupt / terminate) – pressing CTRL+c ○ SIGSTOP (stop / pause) – pressing CTRL+z ○ SIGCONT (continue) – fg command -U Signals subsystem enable the delivery of events to processes ○ Processes and process groups can have signal handlers (via signal() system call) to perform a specific action whenever a specific signal is received Processes are associated to users in order to provide ownership and control, as SS well as limited security and protection ○ getuid() – system call to return the user id of the user of the process calling it ○ Superuser (aka root) - can control all processes AC Useful Programs and Commands: man - for documentation purposes top/htp - displays processes of the system ps/pstree - which processes are currently running kill/killall - send arbitrary signals to processes Summary Each process is identified by a process identifier (PID). Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester The fork() system call creates a new process, with the parent and child resembling each other. The wait() system call lets a parent process wait for its child to finish execution. The exec() family of system calls allows a (child) process to execute a new program, breaking its similarity to the parent. Shells use fork(), wait(), and exec() to run user commands, enabling features like I/O redirection and pipes. Process control is managed through signals, which can pause, resume, or terminate jobs. B Users can only manage their own processes. PL A superuser has control over all processes but should be used cautiously for security reasons. -U SS AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 02 Mechanism: Limited Direct Execution Overview There are two challenges in building virtualization in a machine and achieving time sharing/multiprogramming: (1) performance and (2) control. The process should be able to run as fast as it could without consuming too many unnecessary resources while having the OS retain control over the CPU. B Basic Technique: Direct Execution The concept is to run a program directly on the CPU to achieve great PL performance while having restricted access to resources. Running a Program (Without Limits) The following steps show how the OS creates a process and runs it directly on the -U CPU: 1. OS creates a process entry in a process list. 2. Allocate memory for it. 3. Load program code (from disk) into memory. SS 4. Locate entry point (e.g. main()) 5. Jump to entry point; and 6. Run user’s code. AC However, although simple, it introduces two problems: 1. How can the process have access to restricted operations? 2. How can the OS switch between processes? Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Problem #1: Restricted Operations To allow a process perform restricted operations, a protected control transfer is introduced through two modes of operations: 1. User Mode - applications/user processes 2. Kernel Mode - allows OS to have full access to machine resources Instead of the process performing the operations, the kernel will do it for them through system calls. Steps Notes First, the program counter, flags B registers, and other important registers are saved by the hardware to return to the process correctly when returning from a trap. 
PL The kernel performs the operation via System calls allow the kernel to expose system call and executes the trap certain key pieces of functionality to a instruction. process. Inside the system call is a special instruction called a trap instruction. -U The trap instruction jumps to the kernel, To know what operation needs to be raisies privilege level to kernel mode, and done, the trap number leads to a specific lets the system perform the operation. address through the trap table, a mapping data structure to store which SS instruction is to be done by a certain trap number. When finished, the OS calls a special return-from-trap instruction that reduces privilege level to user mode and AC returns the control to the user program previously running. Interrupt Mechanism in Hardware During important events that need the attention of the CPU (e.g. zero division, I/O requests, system call is invoked), polling or interrupts are raised, which are identified by interrupt numbers. Classification of Interrupts Interrupts Hardware events such as I/O completion or when an INT instruction is called. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Exceptions Interrupts when error conditions happen. Exceptions have three types: 1. Traps - a process continues the execution to the next instruction. 2. Faults - process continues execution to the fault handler/function/method (e.g. zero division). 3. Abort - severe errors, hardware failure (e.g. BSOD) Trap Tables Set up by kernel at boot time, trap tables contain the memory address of B functions (trap handler) that are identified by interrupt numbers. These handlers execute when interrupt- and exception-related events occur. To prevent direct memory access, service numbers are used instead of the trap handler address itself. PL Problem #2: Switching Between Processes Since processes are actually not running simultaneously, the OS stops running when a process starts. So how does the OS gain back its control and switch between -U processes? There are two approaches. Approach What It Does Cooperative Approach: Wait for The OS trusts the process to periodically System Calls give up the CPU through system calls SS (e.g. open a file and read it, create a new process, etc.). Even when the process does something illegal (e.g. zero division or accessing AC restricted memory), it transfers control to the OS through trap instructions. Non-Cooperative Approach: The OS When a process does not make system Takes Control calls, the OS uses a mechanism called a timer interrupt: a device that raises an interrupt every n milliseconds and runs an interrupt handler in the OS, forcefully halting the process. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Limited Direct Execution with a Timer Interrupt (Context Switching) Steps During boot up… The OS initializes the trap table. The hardware stores the addresses of the system call handler and the timer handler. OS starts interrupt timer. Hardware starts the timer and interrupts CPU every X ms. B During runtime… The program/process A runs. PL The hardware: 1. Interrupts Process A. 2. Saves the registers of Process A to the kernel stack of Process A. 3. Moves to kernel mode and jumps to trap handler. The OS: -U 1. Handles/performs the trap instruction. 2. Calls the switch() method which: a. Saves the registers of Process A to A’s PCB. b. Restores the registers of Process B from B’s PCB. c. Switch to the kernel stack of Process B. 3. 
Performs the return-from-trap instruction into Process B. SS The hardware: 1. Restores the registers of Process B from B’s kernel stack. 2. Moves to user mode. 3. Jump to B’s program counter. AC Once the program counter and registers are set, Process B runs, completing the switch. Concurrency Problems What if an interrupt is raised when an interrupt is already being processed? The OS disables interrupts during these times, preventing an interrupt to be delivered to the CPU. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Summary Limited direct execution runs the program on the CPU directly to run the process efficiently while, through the hardware, limiting what it can do. The OS baby proofs the CPU by setting up trap handlers, starting an interrupt timer, and running the processes only in restricted mode. There are two modes of execution to allow for protected control transfer: user mode for restricted usage and kernel mode for a privileged access to resources. User applications use system calls to trap and get the kernel to perform the operations they want to do (if allowed). B Trap instructions saves the state of registers, increases privilege to kernel mode and jumps to the trap table inside the OS and calls the trap handler/function that PL performs the operation. After the system call, the OS returns to the user program by doing a return-from-trap instruction that reduces privilege to user mode and returns control to the process. -U Trap tables are set up by the OS at boot time. Timer interrupts ensure that the process does not run indefinitely and prevent the OS from having control. This is a non-cooperative approach to CPU scheduling. Context switching is a technique that the OS does to switch from one process to SS another during a timer interrupt or a system call. AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 02 Scheduling: Introduction Overview This section explains the different scheduling policies that answer the question What/Which process to run? Workload Assumptions Before introducing the different scheduling policies, we must assume the following assumptions (which are unrealistic) about the jobs or processes that use B the CPU: 1. All jobs only use one CPU and perform no I/O 2. The run-time of each job is known PL Scheduling Metrics There are three metrics to compare the scheduling policies: Turnaround Time 𝑇𝑡𝑢𝑟𝑛𝑎𝑟𝑜𝑢𝑛𝑑 = 𝑇𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑖𝑜𝑛 − 𝑇𝑎𝑟𝑟𝑖𝑣𝑎𝑙 -U The turnaround time of a process is calculated by subtracting the time of arrival from the time of completion. To get the average turnaround time (ATT), divide the summation of the jobs’ turnaround time by the number of jobs. Response Time 𝑇𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒 = 𝑇𝑓𝑖𝑟𝑠𝑡𝑅𝑢𝑛 − 𝑇𝑎𝑟𝑟𝑖𝑣𝑎𝑙 SS Subtract the time of arrival from the time of its first run to get the response time of a job. Consequently, the Average Response Time would be the total response time of all jobs divided by the number of jobs. AC Fairness Often has a tradeoff with performance since a scheduler may optimize performance at the cost of preventing a few jobs from running, decreasing fairness. Scheduling Policies Policy 1: First In, First Out (FIFO) Alliance of Computer Science Students – UPLB A.Y. 
2023-2024, 1st Semester The most basic algorithm Also known as First Come, First Served (FCFS) B Often leads to the convoy effect ○ A number of relatively short jobs get queued after a long job PL -U Policy 2: Shortest-Job-First Optimal for turnaround time if all jobs arrive at the same time A non-preemptive scheduling approach SS ○ It runs a process to completion without interruption AC What if A arrived at t = 0 and B and C arrived at t = 10? Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Since it is non-preemptive, shorter jobs that arrive later could still get stuck after a long job. B Policy 3: Shortest Time-to-Completion First (STCF) Adds preemption to SJF, where processes don’t need to run to completion When a new process arrives to the system, it determines the remaining PL times of the existing process and the new process and schedules the -U process that has the least time left. SS Response Time 𝑇𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒 = 𝑇𝑓𝑖𝑟𝑠𝑡𝑅𝑢𝑛 − 𝑇𝑎𝑟𝑟𝑖𝑣𝑎𝑙 the time when a process arrives to the first time it runs AC Policy 4: Round Robin uses a new scheduling metric called Response Time also called the Time-Slicing Scheduling runs a process for a time slice or quantum (q) and then switch to the next process in the run queue ○ time slice should be a multiple of the timer-interrupt period (e.g. 10ms, 20ms quantum if timer interrupts every 10ms) fair but performs poorly on turnaround time Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester B PL length of time is critical ○ Shorter Time Slice better response time time cost of context switching will become more expensive than -U ○ Longer Time Slice worse response time less context switching cost SS Incorporating I/O when process requests for I/O, process state is set to blocked/waiting the scheduler should schedule another process or the CPU will be idle when the I/O completes: 1. an interrupt is generated and control is transferred to kernel AC 2. kernel changes state of process back to ready state 3. If circumstances allow, the scheduler could schedule this process in the CPU Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester B PL the first illustration at the top runs Process B after Process A finishes, using resources poorly the second illustration inserts Process B in between I/O requests of Process A -U to maximize resource utilization Summary SS The three scheduling metrics are: turnaround time, response time, and fairness. The turnaround time of a process is calculated by subtracting the time of arrival from the time of completion. To 𝑇𝑡𝑢𝑟𝑛𝑎𝑟𝑜𝑢𝑛𝑑 = 𝑇𝑐𝑜𝑚𝑝𝑙𝑒𝑡𝑖𝑜𝑛 − 𝑇𝑎𝑟𝑟𝑖𝑣𝑎𝑙 get the average turnaround time (ATT), AC divide the summation of the jobs’ turnaround time by the number of jobs. Subtract the time of arrival from the time of its first run to get the response time of a job. Consequently, the Average 𝑇𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒 = 𝑇𝑓𝑖𝑟𝑠𝑡𝑅𝑢𝑛 − 𝑇𝑎𝑟𝑟𝑖𝑣𝑎𝑙 Response Time (ART) would be the total response time of all jobs divided by the number of jobs. Fairness often has a tradeoff with performance since a scheduler may optimize performance at the cost of preventing a few jobs from running, decreasing fairness. There are two approaches to scheduling policies: non-preemptive–runs a process to Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester completion before switching to another–and preemptive–does not require a process to run to completion before switching to another process. First In, First Out (FIFO) runs a process by arrival. 
Shortest Job First (SJF) is a non-preemptive approach that runs the job with the shortest total run time first. Shortest Time-to-Completion First (STCF) is a preemptive approach that schedules the process that has the least remaining time out of all the processes. Round Robin (RR) or Time-Slicing alternates through the processes every n milliseconds, as defined by the time quantum (q).

CMSC 125 Week 02 Scheduling: The Multi-Level Feedback Queue

Overview
The Multi-Level Feedback Queue (MLFQ) addresses the problem of the scheduler not knowing how long a job will run (which is crucial in the previous scheduling policies) by observing the execution of the jobs. It uses multiple queues, each assigned a priority level, stores the jobs in their corresponding queues, and runs them according to the priority of their queue.

Introduction
Addresses two fundamental scheduling problems: (1) optimizing turnaround time and (2) minimizing response time.
Learns from the past to predict the future.

MLFQ: Basic Rules
Rule 1: If Priority(A) > Priority(B), A runs and B doesn't.
Rule 2: If Priority(A) == Priority(B), A & B run in Round Robin.
Rule 3: When a process enters the system, it is placed at the highest priority queue.
Rule 4a: If a process uses up the entire time slice while running, its priority is reduced (it moves down a queue).
Rule 4b: If a process gives up the CPU before the time slice is up, it stays at the same priority level.

MLFQ has a number of distinct queues, each assigned a different priority level, and a process that is ready to run is assigned to a single queue.
1. The scheduler chooses a process on a higher priority queue to run on the CPU.
2. Round Robin is used among processes on the same queue.
The rules above are modified along the way as solutions to different issues are found.

Examples
Example 1: A Single Process - a three-queue scheduler with a 10ms time slice.
Example 2: Along Came a Short Process - Process A is a long-running CPU-intensive process that has been running for a while; Process B is a short-running interactive process that arrives at T = 100.
Rule 3: A process is placed at the highest priority queue as it arrives.
Rule 4a: A process moves down a queue after every time slice it uses up.
Example 3: I/O - Process A is a long-running CPU-intensive process; Process B is an interactive process that needs the CPU for 1ms before performing an I/O.
Rule 4b: Keep the process at the same queue if it relinquishes control before the time slice ends.

Problems with Basic MLFQ
Starvation - If there are too many interactive processes in the system, long-running processes will never receive CPU time.
A process may change its behavior over time - A CPU-bound process may eventually become an I/O-bound (interactive) process while still sitting at the bottom queue.
Gaming the scheduler - After running 99% of a time slice, a process could issue an I/O and avoid being moved down to a lower-priority queue, gaining a higher percentage of CPU time.

Solution #1: Priority Boost
Rule 5: After some time period S, move all processes in the system to the topmost queue (to solve starvation and address CPU-bound processes becoming interactive).
SS solves starvation and CPU-bound process becoming interactive adds a new rule to MLFQ AC Solution #2: Better accounting of CPU time consumed by processes in each queue Once a process has consumed its time slice in a given queue–no matter Rule 4 how many times it gives up CPU control–as a whole, its priority is reduced. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester rewrites Rule 4a and 4b prevents a process from gaming the scheduler B PL Redefined Rules Rule 1 If Priority(A) > Priority(B), A runs and B doesn’t. Rule 2 If Priority(A) == Priority(B), A & B run in Round Robin. -U When a process enters the system, it is placed at the highest priority Rule 3 queue. Once a process has consumed its time slice in a given queue–no matter Rule 4 how many times it gives up CPU control–as a whole, its priority is reduced. SS After some time period S, move all processes in the system to the Rule 5 topmost queue (to solve starvation and address CPU-bound processes becoming interactive). AC Summary Multi-Level Feedback Queue (MLFQ) observes the execution of a job and prioritizes it accordingly. MLFQ delivers excellent overall performance for short-running interactive jobs and is fair for longer-running CPU-intensive workloads. There are three problems with the basic/initial rules of MLFQ: (1) starvation, where long-running CPU-intensive jobs could never run due to multiple presences of shorter jobs; (2) change in process behavior, where a CPU-bound process becoming interactive couldn’t run since it is still at the bottom queue; and (3) gaming the Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester scheduler, where a process could stay in the same queue indefinitely by running at 99% of the time slice. B PL -U SS AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 02 Scheduling: Proportional Share Introduction Proportional-share guarantees that each job obtains a certain percentage of CPU time instead of optimizing turnaround or response time. Sometimes referred to as fair-share scheduling B Basic Concept: Hold a Lottery! Tickets represent share of a resource that a process should receive ○ Example: There are two processes, A and B PL Process A has 75 tickets → receive 75% of CPU Process B has 25 tickets → receive 25% of CPU Approaches to Proportional-Share Scheduling -U Approach #1: Lottery Scheduling Steps: 1. Lottery is held every time slice 2. Scheduler picks a winning ticket randomly SS 3. Scheduler runs the process that holds that winning ticket Example: There are 100 tickets. ○ Process A has 75 tickets → 0 - 74 ○ Process B has 25 tickets → 75 - 99 AC Importance of Randomness 1. Avoids strange corner-case behaviors 2. Lightweight 3. Fast Ticket Mechanisms #1: Ticket Currency user (in a multiuser system) allocates tickets among their own process in Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester their own local currency system converts local currency into correct global currency value Example: There are 200 tickets (Global Currency) ○ Process A has 100 tickets. ○ Process B has 100 tickets. ○ With the 200 total global currency, a lottery is then held to B determine which job runs. #2: Ticket Transfer PL A process can temporarily hand off tickets to another process ○ useful in client-server apps with server doing tasks on behalf of clients After the server finishes, it transfers the tickets back to the client. 
#3: Ticket Inflation
A process can temporarily raise or lower the number of tickets it owns if it needs more CPU time.
Assumes that a group of processes trust each other, to prevent abuse.

Implementation
To implement lottery scheduling, the following are needed:
1. A random number generator for picking the winning ticket.
2. A data structure (e.g. a list) to track the processes.
3. The total number of tickets.

Example
Suppose Job A holds 100 tickets, Job B holds 50, and Job C holds 250.
1. Pick a random winning number between 0 and the total number of tickets (say the winner is 300).
2. Traverse the list, with a counter variable initialized to 0.
3. For each process passed, add its ticket count to the counter and check whether the counter now exceeds the winning number. If it does, break from the loop; if not, go to the next job.
a. For Job A, add 100 to the counter (counter == 100). Is the counter greater than 300? No. Go to the next job.
b. For Job B, add 50 to the counter (counter == 150). Is the counter greater than 300? No. Go to the next job.
c. For Job C, add 250 to the counter (counter == 400). The counter is now greater than 300, so break from the loop.
4. Run the job at which the counter first exceeded the winning number (here, Job C).

Assigning Tickets - let users assign tickets.

Approach #2: Stride Scheduling
A deterministic fair-share scheduler, since randomness will not deliver good results for processes with short run times.
Stride of each process = (some large number) / (number of tickets of the process)
Example: Let the large number be 10,000.
Process A has 100 tickets → stride of A = 10000 / 100 = 100
Process B has 50 tickets → stride of B = 10000 / 50 = 200
When a process runs, increment its counter (aka pass value) by its stride.
Pick the process with the lowest pass value to run next.

Example
Say Process A has stride = 100, Process B has stride = 200, and Process C has stride = 40.
1. New processes are pushed into the queue with their pass set to 0.
2. Pop from the queue the process with the lowest pass and schedule it.
3. After running, increment that process' pass by its stride and put it back in the queue.
4. Repeat.

Approach #3: Linux Completely Fair Scheduler (CFS)
Goal: fairly divide the CPU evenly among all competing processes.
Uses a counting-based technique built around virtual runtime (vruntime).
As a process runs, its vruntime increases (at the same rate as physical time).
When scheduling, pick the process with the lowest vruntime.
Uses a periodic timer interrupt, so the scheduler can only make decisions at fixed time intervals.
○ Fairness is not affected even if a time slice is not a multiple of the timer interrupt interval.
To address fairness and performance, it has two control parameters:
1. sched_latency - determines how long one process should run before a switch is considered (typically 48ms).
time slice = sched_latency / n processes
Example: 4 processes with the typical 48ms sched_latency
time slice = 48ms / 4 processes = 12ms per process
2. min_granularity - addresses the case where too many processes are running.
○ When n is very large, the time slice becomes very small.
○ It sets a minimum time slice despite a large n, typically 6ms.
Weighting (Niceness) - allows the user/admin to control process priority.

CMSC 125 Week 02 Multiprocessor Scheduling

Introduction
Multiprocessor systems are becoming common due to multicore processors.
○ Multicore processors have multiple CPU cores on a single chip.
○ Adding more CPUs doesn't make single applications run faster; they need to be rewritten to run in parallel (a minimal threaded sketch follows below).
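As a hedged sketch of what "rewritten to run in parallel" can look like (not part of the original module; the array size, thread count, and function names are illustrative choices), the work of summing an array is split across two POSIX threads that the OS may schedule on different CPUs:

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000000

static int data[N];

struct slice { int start, end; long sum; };

/* Each thread sums its own half of the array; because nothing is shared
 * between the workers, no lock is needed for the partial sums. */
static void *sum_slice(void *arg) {
    struct slice *s = arg;
    s->sum = 0;
    for (int i = s->start; i < s->end; i++)
        s->sum += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++)
        data[i] = 1;

    struct slice halves[2] = { {0, N / 2, 0}, {N / 2, N, 0} };
    pthread_t tid[2];

    for (int t = 0; t < 2; t++)
        pthread_create(&tid[t], NULL, sum_slice, &halves[t]);
    for (int t = 0; t < 2; t++)
        pthread_join(tid[t], NULL);   /* wait for both workers to finish */

    printf("total = %ld\n", halves[0].sum + halves[1].sum);   /* 1000000 */
    return 0;
}
```

Compiled with -pthread, the two workers can run on two cores at once; a single-threaded version of the same loop uses only one core no matter how many are available.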
How to schedule jobs on multiple CPUs in the operating system? B Background: Multiprocessor Architecture and Problems To understand multiprocessor scheduling, it is necessary to know about PL hardware caches and how data is shared in multi-CPU systems. ○ Caches store frequently accessed (or popular) data, making it faster to retrieve than from main memory. Two types of locality: temporal (data accessed frequently) and spatial (data near a location is accessed). -U Problem # 1: Cache Coherence ○ Caching becomes complicated in multi-CPU systems sharing a single memory due to cache coherence issues. ○ Cache coherence ensures all CPUs see the same shared memory view; ○ Solution: Bus Snooping SS Hardware monitors memory access and updates cache accordingly, either invalidating or updating cached data. Problem # 2: Synchronization ○ Despite hardware caches providing coherence, programs and the OS must still consider synchronization when accessing shared data. AC ○ Solution: Mutual exclusion primitives like locks are often necessary to ensure correctness when accessing shared data across CPUs. Locks are essential to atomically update shared data structures, preventing issues like data corruption. Example: Code for removing elements from a shared linked list can lead to problems without locks due to concurrent access by multiple threads. Locking with mutexes is a solution but can impact performance as the number of CPUs increases. Problem # 3: Cache Affinity ○ Cache affinity means that a process running on a specific CPU benefits from having its state in that CPU's caches. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester ○ Running a process on the same CPU where it previously ran can improve performance because some of its state is already cached. ○ A multiprocessor scheduler should consider cache affinity when making scheduling decisions, favoring keeping a process on the same CPU when possible. Approach 1: Single-Queue Scheduling Single-queue multiprocessor scheduling (SQMS) is a basic approach where all jobs are placed in a single queue. ○ It's simple to implement but: has scalability issues due to locks, which can decrease B performance as the number of CPUs increases. lacks cache affinity, causing jobs to bounce between CPUs, which is inefficient for cache performance. PL To address cache affinity problems, some SQMS schedulers include affinity mechanisms to keep some jobs on the same CPU. -U SS Multi-Queue Scheduling AC Multi-queue multiprocessor scheduling (MQMS) uses multiple scheduling queues, one per CPU, to address problems seen in single-queue scheduling. ○ Each job is placed into one of these queues, and each queue operates independently with its own scheduling policy. ○ MQMS is more scalable than single-queue scheduling, as lock and cache contention is reduced. ○ It inherently provides cache affinity as jobs stay on the same CPU. However, it can suffer from load imbalance, where some CPUs have more work than others. ○ To address load imbalance, migration of jobs between CPUs is needed. Techniques like work stealing are used to decide when to migrate jobs. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester ○ Finding the right balance in migration frequency is challenging and often requires trial and error. 
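As a hedged, highly simplified sketch of MQMS with work stealing (the queue layout, names, and stealing rule are illustrative assumptions, not how Linux actually implements it):

```c
#include <stdio.h>

#define NCPU      4
#define QUEUE_MAX 64

/* One run queue per CPU (MQMS): jobs normally stay on their own CPU,
 * which preserves cache affinity. */
struct run_queue {
    int jobs[QUEUE_MAX];   /* job IDs waiting to run on this CPU */
    int length;            /* number of queued jobs */
};

static struct run_queue rq[NCPU];

/* Work stealing: peek at a peer queue and, if it is noticeably longer,
 * migrate one job over to even out the load. */
static void maybe_steal(int my_cpu) {
    int peer = (my_cpu + 1) % NCPU;   /* some other CPU's queue */

    if (rq[peer].length > rq[my_cpu].length + 1) {
        int stolen = rq[peer].jobs[--rq[peer].length];   /* remove from peer */
        rq[my_cpu].jobs[rq[my_cpu].length++] = stolen;   /* add locally */
    }
}

/* Called when a CPU needs something to run: try stealing if the local
 * queue is empty, then pick the next local job (LIFO here for brevity). */
static int pick_next_job(int my_cpu) {
    if (rq[my_cpu].length == 0)
        maybe_steal(my_cpu);
    if (rq[my_cpu].length == 0)
        return -1;                     /* still idle */
    return rq[my_cpu].jobs[--rq[my_cpu].length];
}

int main(void) {
    /* CPU 1 starts overloaded; CPU 0 starts idle. */
    for (int j = 0; j < 6; j++)
        rq[1].jobs[rq[1].length++] = j;

    printf("cpu 0 runs job %d\n", pick_next_job(0));   /* job stolen from cpu 1 */
    printf("cpu 1 runs job %d\n", pick_next_job(1));
    return 0;
}
```

A real scheduler would lock both queues during the migration, weigh the cost of losing cache affinity, and tune how often peers are inspected, which is exactly the balancing act described above.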
Linux Multiprocessor Schedulers Three multiprocessor schedulers have been developed over time: O(1) Completely Fair BF Scheduler (BFS) Scheduler (CFS) Multiple Queues Multiple Queues Single Queue B Priority-based Scheduler Deterministic Proportional-share Proportional-share Scheduler Scheduler PL Based on Earliest Eligible Virtual Deadline First Scheme -U Summary Multiprocessor systems are common due to multicore processors with multiple CPU cores on a chip. Challenges in multiprocessor scheduling include cache coherence, synchronization, SS and cache affinity. Single-queue multiprocessor scheduling (SQMS) is a basic approach where all jobs are placed in a single queue. Simple but lacks scalability and cache affinity. Multi-queue multiprocessor scheduling (MQMS) uses multiple scheduling queues, AC one per CPU, to address problems seen in single-queue scheduling. Linux has multiple multiprocessor schedulers like O(1), Completely Fair Scheduler (CFS), and BF Scheduler (BFS). Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 03 Abstraction: Address Spaces Memory Virtualization A technique where the operating system (OS) creates an illusion of a dedicated memory space for each process, making it appear as if each process has access to the entire memory. ○ This provides benefits like simplified programming, efficient memory utilization, process isolation, and protection against unauthorized B memory access. Early Systems PL Early computers had limited memory abstractions. The OS was just a library, that loads only one process in memory, and only executes until completion. ○ Problem: Poor CPU utilization and memory efficiency CPU is idle when process is doing I/O -U Other processes can fit in memory Multiprogramming and Time Sharing Multiprogramming - loading multiple processes in memory SS Time Sharing - introduces interactivity, addressing limitations of batch computing ○ Execute one for a short while, then switch between processes in memory Result: Increases CPU utilization and memory efficiency Protection Issue Arises: Brought by interactivity, a process shouldn’t be able AC to read or write some other process’s memory. To solve this… The Address Space The OS creates an abstraction called the "address space" for each running program. ○ The address space contains the program's code, heap, and stack. Code - where instructions live Heap - dynamically allocated memory (malloc) Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Stack - return addresses, local variables ○ These components may grow and shrink during program execution. Heaps grow downward, while stacks grow upward OS creates an abstraction of physical memory ○ Virtual Address - every address in a running program is virtual! ○ The OS loads programs at arbitrary physical addresses, but the program believes it's at a specific virtual address. Goals Transparency - the program doesn’t realize memory is virtualized, process behaves as if it has its own private physical memory B Efficiency - making virtualization fast and not using excessive memory, will need to reply on hardware support (TLBs, etc.) Protection and Isolation - ensures processes can’t access or affect each PL other’s memory or the Kernel / OS Summary Memory virtualization creates the illusion of dedicated memory for each process, -U offering benefits like simplified programming, efficient memory use, process isolation, and security. 
Multiprogramming and time sharing improved CPU and memory efficiency by loading multiple processes and switching between them. SS Address space represents a program's memory abstraction, consisting of code, heap, and stack, which can grow or shrink during execution. Memory virtualization involves the OS mapping virtual addresses to physical ones, aiming for transparency, efficiency, and protection. AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 03 Interlude: Memory API Types of Memory Two types of memory allocation: stack memory and heap memory Stack Memory Heap Memory managed implicitly by the compiler, requires explicit management by the hence called automatic memory. programmer. B easy to declare in functions; the compiler memory is allocated using functions like handles allocation and deallocation. malloc() PL short-lived allocations more versatile but requires careful handling grows upward in address space grows downward in address space -U The malloc() Call Allocates memory on the heap The parameter passed to malloc() is of type size_t, specifying the number of SS bytes needed. Returns a pointer to the allocated space or NULL on failure sizeof() operator ○ to determine the size required for a specific type. When allocating space for strings, use malloc(strlen(s) + 1) to account for the end-of-string character. AC The free() Call Free a memory region allocated by a call to malloc. free() does not require the size of the allocated memory to be specified by the user; it is tracked internally by the memory allocation library. Common Errors Forgetting to Allocate Memory - leading to undefined behavior. Not Allocating Enough Memory - may potentially cause buffer overflow Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Forgetting to Initialize Memory - resulting in uninitialized reads Memory Leak - forgetting to free until a process runs out of memory and eventually dies Dangling Pointer - freeing memory before you are done with it Double Free - freeing memory that was already freed Invalid Frees - calling free() incorrectly Underlying OS Support Tools for Checking Memory Errors: Purify, Valgrind malloc() and free() are library calls, not system calls. B These library calls manage memory within a program's virtual address space. They rely on underlying system calls that interact with the operating system to allocate or deallocate memory. PL ○ One such system call is brk/sbrk, which adjusts the program's heap boundary. It's crucial not to directly call brk or sbrk as they are meant to be used by the memory-allocation library. ○ mmap() call - obtain memory by creating anonymous memory regions -U within a program's address space. These regions are not associated with specific files but are managed like a heap. Other Calls SS calloc() - Allocate memory on the heap and zeroes it before returning ○ This helps prevent errors related to uninitialized memory. realloc() - Change the size of memory block. ○ It creates a new larger memory region, copies the old data into it, and returns a pointer to the new region. AC Summary Two types of memory: Stack memory is compiler-managed, short-lived, and grows upward. Heap memory requires manual management, grows downward, and uses malloc(). malloc() allocates heap memory, returns a pointer, and takes a size parameter. free() deallocates heap memory, and size tracking is handled internally. 
Common memory errors include misallocation, wrong initialization, leaks, Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester dangling pointers, double freeing, and invalid frees. malloc() and free() are library calls relying on system calls (e.g., brk/sbrk, mmap) for memory management. B PL -U SS AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 03 Mechanism: Address Translation Introduction Looking Back: Virtualization of CPU focused on limited direct execution (LDE) for efficiency and control. ○ LDE lets programs run directly on hardware but involves the OS at key points (e.g., system calls) to maintain control. In virtualizing memory, there is a similar strategy for efficiency, control, and flexibility. ○ Attained by hardware support (registers, TLBs, page-table support) B used for memory virtualization. ○ Control ensures applications can't access memory outside their own. ○ Need flexibility in address space usage for easier programming. PL Address Translation ○ Mechanism: virtual address -> physical address ○ The OS manages memory, tracks free and in-use locations, and intervenes as needed. Assumptions: -U ○ User's address space is contiguous in physical memory. ○ Address space size is smaller than physical memory. ○ All address spaces are of the same size. ○ Note: Assumptions will be further relaxed to achieve a realistic memory virtualization. SS Example of Address Translation Suppose that a C program loads a value from memory, increments it by three, and stores it back as seen below: AC The process's address space starts at address 0 and grows to a maximum of 16 KB. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester B PL The OS wants to place the process in physical memory at a different address, -Ucreating a relocation problem. SS AC The OS uses the first slot of physical memory, relocates the process to start at physical memory address 32 KB, and keeps other slots free. Dynamic (Hardware-based) Relocation Requires two CPU registers: base and bounds (limit) registers. ○ Base and bounds allow placing a process's address space anywhere in physical memory while ensuring process isolation. When a process runs, memory references are translated by adding the base register value to the virtual address. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester ○ Sample Computation in Example: movl 0x0(%ebx), %eax (at 128 B) PC set to 128, added to base register (32 KB) gives a physical address of 32896. Note: 1 KB = 1024 B (32 X 1024) + 128 = 32896 Load from virtual address 15 KB is added to the base register 32 KB, resulting in a final physical address of 47 KB. B The bounds (limit) register ensures memory references are within legal bounds for protection. PL If a virtual address is out of bounds, the CPU raises an exception, likely terminating the process. -U Example translations for a process with a 4 KB address space loaded at physical address 16 KB: ○ If Virtual Address = 4400 → Fault (out of bounds) SS Hardware Support for Dynamic Relocation (Base and Bounds) Two CPU modes: Kernel mode (for OS) and User Mode (for applications). ○ CPU mode indicated by a bit called Processor Status Word, switching on special occasions (e.g., system calls or exceptions). Base and bounds registers: Each CPU has an additional pair of registers in the AC memory management unit (MMU) for address translation and bounds checking. 
○ Special privileged instructions: OS can modify base and bounds registers, but only in Kernel mode Ability to raise exceptions: CPU generates exceptions when user programs try to access privileged instructions or out-of-bounds memory. ○ Exception handling: CPU runs the appropriate OS exception handler, allowing the OS to react (e.g., process termination). OS Issues needed to be Addressed for Dynamic Relocation Implement Base and bounds Approach: Set base and bounds registers upon context switch between processes. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester ○ When a process: Starts running - find space for address space in physical memory, maintain a free list ( Free list - a list of range of physical memory which are not in use Is Terminated - Reclaiming the memory for use by other processes Context Switch occurs - save and store base-and-bounds pair Stored in process structure or process control block (PCB) Is Blocked - move to a different location Limited Direct Execution with Dynamic Relocation (at Boot Time) B PL -U Limited Direct Execution with Dynamic Relocation (at Run Time) SS AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester B PL -U SS AC Internal Fragmentation - not all allocated space is used Summary Address translation is the mechanism to map virtual addresses to physical addresses. Hardware support (registers, TLBs, page tables) is used for memory virtualization. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Assumptions include contiguous address space, smaller size than physical memory, and uniform size for all address spaces. Dynamic relocation with base and bounds registers allows flexibility in placing address spaces in physical memory. Hardware support includes CPU modes (Kernel and User), privileged instructions, and exception handling. B PL -U SS AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 04 Segmentation Introduction Current memory allocation method uses base and bounds registers to relocate processes in physical memory. Problem: There's a significant "free" space between the stack and heap in address spaces due to internal fragmentation. ○ This free space, although unused, occupies physical memory when relocating address spaces. How to support a large address space with potential wasted space? B ○ Ex: In larger 32-bit address spaces, even if a program uses only megabytes of memory, the entire address space must be in memory. PL Segmentation: Generalized Base / Bounds Segmentation is an approach to memory management. ○ It involves having multiple base and bounds pairs for different segments of the address space. -U Segments are logical and contiguous portions of the address space, such as code, stack, and heap being different segments. ○ Segmentation allows placing each segment in different parts of physical memory to avoid wasting space. SS AC Only used memory is allocated space in physical memory, accommodating large address spaces with unused regions. The hardware structure includes a set of base and bounds register pairs, one pair per segment. Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Base registers store the starting physical address of each segment, and bounds registers specify the size of each segment. When a virtual address is referenced, the hardware calculates the physical address by adding the base to the offset and checks for bounds violations. 
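As a hedged sketch of that per-reference check (the segment placements, sizes, and function names below are illustrative assumptions, not values from the module):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* One base/bounds pair per segment (code, heap, stack); the values here
 * are made-up placements, not the module's figures. */
struct segment { uint32_t base; uint32_t bounds; };

static struct segment seg_regs[3] = {
    { 32 * 1024, 2 * 1024 },   /* code */
    { 34 * 1024, 3 * 1024 },   /* heap */
    { 28 * 1024, 2 * 1024 },   /* stack (growth direction ignored here) */
};

/* Translate an already-decoded (segment, offset) pair. How the hardware
 * extracts the segment and offset from a virtual address is exactly the
 * question taken up next in the text. */
static uint32_t translate(int seg, uint32_t offset) {
    if (offset >= seg_regs[seg].bounds) {
        fprintf(stderr, "segmentation violation\n");   /* hardware raises an exception */
        exit(EXIT_FAILURE);                            /* OS will likely kill the process */
    }
    return seg_regs[seg].base + offset;                /* physical address */
}

int main(void) {
    printf("heap offset 100 -> physical address %u\n", translate(1, 100));
    return 0;
}
```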
Which Segment are we Referring to?
To determine the segment and offset of a virtual address, the hardware uses the segment registers during translation.
Explicit approach - divides the address space into segments based on the top few bits of the virtual address.
○ In the example with three segments (code, heap, stack), two bits are needed for segment selection.
  The top two bits of a 14-bit virtual address select the segment, leaving the bottom 12 bits as the offset into the segment.
  For instance, if the top two bits are "01", the hardware knows the address refers to the heap segment.
  To obtain the physical address, the hardware adds the offset to the base register of the selected segment.
○ The offset simplifies bounds checking: if the offset exceeds the bounds, the hardware raises a protection fault.
○ Disadvantages:
  With only three segments, one of the four two-bit patterns goes unused, wasting part of the address space.
  The approach limits the maximum size of each segment, which can cause problems when a segment needs to grow beyond its limit.
Implicit approach - determines the segment from how the virtual address was generated.
○ For instance, an address generated from the program counter is in the code segment, while an address based on the stack or base pointer is in the stack segment.

What about the Stack?
Unlike the other segments, the stack grows backward, toward lower addresses.
Hardware support is required to track the direction of segment growth, whether positive (forward) or negative (backward).
○ The segment registers indicate the base, size, and growth direction of each segment, including the stack.
When translating virtual addresses in segments that grow negatively (like the stack), the hardware adjusts the offset calculation:
○ Physical Address = Stack Base + Negative Offset, where Negative Offset = Offset - Maximum Segment Size.
○ For example, translate virtual address 15 KB to physical address 27 KB, assuming a maximum segment size of 4 KB:
  15 KB = 11 1100 0000 0000 (binary)
  Segment = 11 (binary) = 3 (decimal), the stack
  Offset = 1100 0000 0000 (binary) = 3 KB
  The hardware computes the negative offset (3 KB - 4 KB = -1 KB) and adds it to the base address (28 KB) to obtain the correct physical address (27 KB).
Bounds checking still ensures that the absolute value of the negative offset is within the segment's current size (e.g., 2 KB).
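Building on the per-segment sketch earlier, here is a hedged C sketch of the explicit approach: the top two bits of a 14-bit virtual address select the segment, the low 12 bits are the offset, and a grows-negative flag handles the stack. The segment table reproduces the 15 KB -> 27 KB example; everything else (names, layout, unused entries) is illustrative.

    /* Sketch: explicit segment selection from the top bits of a 14-bit VA. */
    #include <stdio.h>
    #include <stdlib.h>

    #define KB         1024UL
    #define SEG_SHIFT  12                    /* bottom 12 bits = offset       */
    #define SEG_MASK   0x3                   /* top 2 bits = segment number   */
    #define MAX_SEG    (4 * KB)              /* maximum segment size          */

    struct segment { unsigned long base, size; int grows_negative; };

    static struct segment segtab[4] = {
        [3] = { 28 * KB, 2 * KB, 1 },        /* stack: base 28 KB, 2 KB in use */
        /* entries 0-2 (code, heap, unused) are omitted in this sketch        */
    };

    long translate(unsigned long va) {
        unsigned long seg = (va >> SEG_SHIFT) & SEG_MASK;
        long offset = (long)(va & (MAX_SEG - 1));
        if (segtab[seg].grows_negative)
            offset = offset - (long)MAX_SEG;            /* 3 KB - 4 KB = -1 KB  */
        if ((unsigned long)labs(offset) >= segtab[seg].size)
            return -1;                                  /* protection fault     */
        return (long)segtab[seg].base + offset;
    }

    int main(void) {
        printf("%ld\n", translate(15 * KB));  /* 28 KB + (-1 KB) = 27648 = 27 KB */
        return 0;
    }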
PL OS Support for Segmentation: Managing Fragmentation Segmentation saves physical memory by relocating pieces of the address space as the system runs. -U ○ It allows more address spaces to fit in physical memory and supports large, sparse virtual address spaces per process. OS challenges with segmentation include: ○ Saving and restoring segment registers during context switches. ○ When segments grow (e.g., heap expansion), the OS must allocate more SS physical memory and update segment size registers. ○ Managing free space in physical memory becomes complex with different segment sizes, leading to external fragmentation. External fragmentation - small holes of free space in physical memory, making it challenging to allocate new segments or grow existing ones. AC Solutions to External Fragmentation: ○ Compaction - rearranging existing segments to create contiguous free space; is a solution but is expensive and can disrupt running processes. ○ Free-list management algorithms - attempt to minimize external fragmentation by maintaining large extents of free memory. Various algorithms like best-fit, worst-fit, first-fit, and buddy algorithms are used, but none can completely eliminate external fragmentation. Summary Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester Segmentation is a memory management approach to address the issue of wasted space in address spaces due to internal fragmentation. It divides the address space into segments, each with a base and bounds pair, enabling efficient use of physical memory. Hardware uses segment registers to determine the segment and offset of a virtual address, enforcing bounds checking. It supports sharing of memory segments between address spaces with protection bits and can be coarse-grained or fine-grained. OS challenges with segmentation include context switches, managing growing B segments, and dealing with external fragmentation. PL -U SS AC Alliance of Computer Science Students – UPLB A.Y. 2023-2024, 1st Semester CMSC 125 Week 04 Free-Space Management Introduction Free space management -> avoids fragmentation Easy to achieve if free space are in fixed-sizes (e.g. using Paging) Hard when free space are in variable sizes (e.g. using Segmentation) External fragmentation is when free space is fragmented into different-sized pieces, making it hard to fulfill requests even if the total free space is sufficient. B The figure illustrates this problem, where 20 bytes of free space are split into PL two 10-byte chunks, causing a 15-byte request to fail in this configuration. Assumptions -U Primarily focuses on memory allocators in user-level memory-allocation libraries. They assume a basic interface with functions like malloc() and free(). ○ malloc() takes a size parameter and returns a pointer to a memory region of that size. SS ○ free() takes a pointer and releases the corresponding memory chunk. The main concern is external fragmentation, where free memory is split into smaller pieces. Once memory is allocated to a client, it cannot be moved or compacted until it's returned with free(). The allocator manages a contiguous region of bytes, and its size is fixed AC throughout. Low-level Mechanisms Splitting - involves dividing a free chunk into two when a small request is made. Coalescing - merges adjacent free chunks to prevent fragmentation. 
Low-level Mechanisms
Splitting - dividing a free chunk into two when a request is smaller than the chunk.
Coalescing - merging adjacent free chunks back together to prevent fragmentation of the free list.
Tracking the Size of Allocated Regions
○ To track the size of allocated regions, the allocator stores a small header just before each chunk it hands out; the header records the size and possibly extra information (e.g., a magic number for integrity checking).
○ Because the header sits just before the returned pointer, free() can recover the size of a chunk from the pointer alone.
Embedding a Free List
○ The free list is built inside the managed space itself: it is initialized over the free region and updated as allocations and deallocations occur.
○ When the heap runs out of space, the allocator can request more memory from the OS, typically using a system call like sbrk.
○ This allows the heap to grow and accommodate more allocations.

Basic Strategies
Best Fit - searches for the smallest free chunk that can accommodate the request, aiming to minimize wasted space; it can be slow due to the exhaustive search.
Worst Fit - allocates from the largest chunk, trying to leave large chunks free; it tends to lead to excess fragmentation and has high overhead.
First Fit - returns the first free chunk that can satisfy the request; it is faster but can pollute the beginning of the free list with small objects.
Next Fit - similar to First Fit, but starts each search where the previous one left off, spreading searches more uniformly across the list.
These strategies have their pros and cons, and their effectiveness depends on the specific workload and allocator behavior.

Other Approaches
Segregated Lists
○ Keep dedicated lists for popular request sizes; other requests are served by the general memory allocator.
  Examples: kernel data structures such as locks, inodes, etc.
○ A new complication arises: how much memory should be dedicated to the pool that serves specialized requests of a given size?
○ The slab allocator handles this issue:
  It allocates a number of object caches for objects that are likely to be requested frequently (e.g., locks, file-system inodes).
  It requests memory from a more general allocator when a given cache is running low on free space.
Buddy Allocation
○ Optimized for coalescing.
○ Divides free memory into power-of-two-sized blocks. On a request, the allocator recursively splits blocks until a block of the desired size is found. When a block is freed, the allocator checks whether its "buddy" block is also free and, if so, coalesces them.
○ This approach simplifies coalescing but can suffer from internal fragmentation.
Other approaches use more advanced data structures to address scaling and multiprocessor systems.

Summary
Segmentation, with its variable-sized free spaces, struggles with external fragmentation.
External fragmentation is when free space is split into different-sized pieces, hindering allocations.
Mechanisms include splitting, coalescing, tracking allocation sizes with headers, and embedding a free list in the managed region.
Basic strategies: Best Fit (minimize waste), Worst Fit (leave larger chunks free), First Fit (faster), Next Fit (spread out searches).
Segregated lists keep dedicated lists for popular request sizes; the slab allocator manages such specialized caches.
Buddy allocation divides memory into power-of-two-sized blocks and coalesces buddy blocks.
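To tie the mechanisms and strategies above together, here is a minimal, hedged sketch of a first-fit allocator with a per-chunk header and splitting. Coalescing and a real free() are omitted to keep it short, and all names (header_t, node_t, my_malloc) are illustrative rather than any particular library's.

    /* Sketch: header + embedded free list + first fit with splitting. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct { size_t size; uint32_t magic; } header_t;        /* before each allocation */
    typedef struct node { size_t size; struct node *next; } node_t;  /* free-list node         */

    #define MAGIC 0x1234567

    static node_t *free_head;                 /* head of the embedded free list */

    void init_heap(void *region, size_t bytes) {
        free_head = (node_t *)region;         /* the free list lives inside the region */
        free_head->size = bytes - sizeof(node_t);
        free_head->next = NULL;
    }

    /* First fit: take the first chunk big enough, split it, stamp a header. */
    void *my_malloc(size_t want) {
        want = (want + 15) & ~(size_t)15;     /* round up to keep chunks aligned */
        for (node_t *cur = free_head; cur != NULL; cur = cur->next) {
            if (cur->size >= want + sizeof(header_t)) {
                cur->size -= want + sizeof(header_t);           /* splitting           */
                char *carved = (char *)(cur + 1) + cur->size;   /* carve from the tail */
                header_t *hdr = (header_t *)carved;
                hdr->size = want;                               /* header tracks size  */
                hdr->magic = MAGIC;
                return hdr + 1;                                 /* memory after header */
            }
        }
        return NULL;   /* out of space: a real allocator might call sbrk()/mmap() here */
    }

    int main(void) {
        static long long heap[512];           /* stand-in for memory from the OS */
        init_heap(heap, sizeof(heap));
        int *p = my_malloc(100);
        printf("%p (header says %zu bytes)\n", (void *)p, ((header_t *)p - 1)->size);
        return 0;
    }

A real free() would verify the magic number and link the chunk back into the free list, coalescing it with any adjacent free chunks.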
CMSC 125 Week 04 Paging: Introduction

Introduction
Operating systems use two approaches to the space-management problem: variable-sized pieces and fixed-sized pieces.
○ Segmentation (discussed earlier) uses variable-sized pieces, which can lead to fragmentation and make allocation challenging.
○ Paging uses fixed-sized pieces, where:
  the virtual address space is divided into fixed-sized units called (virtual) pages,
  physical memory is divided into (physical) page frames, and
  each page frame can hold exactly one virtual page.
How do we virtualize memory with pages, avoiding the problems of segmentation? What techniques make paging efficient, with minimal space and time overheads?
Advantages of Paging
○ Flexibility: it supports any use of the address space, without assumptions about how the address space is used.
○ Simplicity: free-space management is simple, since all free units are the same size.

Important Terms
Paging - a memory management technique that divides an address space into fixed-sized pages.
Page - one fixed-sized portion of the divided (virtual) address space.
Page Frame - a slot of physical memory capable of holding one virtual page.
Page Table - a per-process data structure that maps virtual pages to physical page frames for address translation.
○ Page Table Entry (PTE) - one entry in the page table.
Address translation involves splitting the virtual address into a VPN (virtual page number) and an offset.
○ Virtual Page Number (VPN) - the high-order bits of a virtual address, used to index the page table; often large (e.g., 20 bits).
○ Offset - the low-order bits of a virtual address, indicating the byte's position within the page.
○ Physical / Page Frame Number (PFN) - the frame number to which the page table maps the VPN.

Page Table: Challenges and Storage
Page tables are typically stored in memory.
○ Note: there is one page table per process.
Challenge: page tables can become large for address spaces with many pages.
○ For example, a typical 32-bit address space with 4 KB pages has a 20-bit VPN and a 12-bit offset.
○ Each process would then need around 4 MB of memory for its page table (2^20 entries, assuming 4 bytes per page table entry).
○ With 100 processes, the OS would need 400 MB of memory just for address translations.

What's in a Page Table?
The page table is a data structure mapping virtual addresses (VPNs) to physical addresses (PFNs).
○ Linear Page Table - the simplest organization: an array indexed by the virtual page number (VPN).
○ The hardware indexes the array by the VPN to look up the page table entry (PTE).
Contents of a PTE
○ The page frame number (PFN) points to the physical frame associated with the virtual page.
○ Key bits in a PTE:
  Valid bit (V): indicates whether the translation is valid.
  Protection bits (R/W, U/S): define read/write and user/supervisor permissions.
  Present bit (P): shows whether the page is in physical memory or on disk (swapped out).
  Dirty bit (D): indicates whether the page has been modified since being loaded.
  Referenced / Accessed bit (A): tracks whether the page has been accessed; this helps gauge page popularity for page replacement.
  Other bits (e.g., PWT, PCD, PAT, G) determine caching and other hardware-related aspects.

Paging: Also Too Slow
Page tables can also slow down the system: address translation requires fetching the page table entry (PTE) for every virtual address.
The hardware needs to know the location of the page table for the currently-running process.
○ Page Table Base Register (PTBR) - stores the address of the start of the page table.
This extra memory reference for translation slows down every memory access.
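The following C sketch shows the linear page-table lookup just described: split the virtual address into VPN and offset, index the table pointed to by the PTBR, and build the physical address. The sizes and names (PAGE_SHIFT, pte_t, PTBR) are illustrative; note the extra memory reference to the page table on every access.

    /* Sketch: linear page-table translation (no TLB yet). */
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT   12                          /* 4 KB pages                 */
    #define OFFSET_MASK  ((1u << PAGE_SHIFT) - 1)
    #define NUM_PAGES    8                           /* tiny table for the demo    */

    typedef struct { uint32_t valid : 1; uint32_t pfn : 20; } pte_t;

    static pte_t page_table[NUM_PAGES];
    static pte_t *PTBR = page_table;                 /* "page table base register" */

    /* Returns the physical address, or -1 to stand in for a fault. */
    long translate(uint32_t va) {
        uint32_t vpn    = va >> PAGE_SHIFT;          /* high-order bits: VPN          */
        uint32_t offset = va & OFFSET_MASK;          /* low-order bits: byte in page  */
        if (vpn >= NUM_PAGES)
            return -1;
        pte_t pte = PTBR[vpn];                       /* the extra memory reference    */
        if (!pte.valid)
            return -1;                               /* invalid translation: fault    */
        return ((long)pte.pfn << PAGE_SHIFT) | offset;
    }

    int main(void) {
        page_table[2] = (pte_t){ .valid = 1, .pfn = 5 };   /* map VPN 2 -> PFN 5      */
        printf("0x%lx\n", translate(0x2ABC));              /* prints 0x5abc           */
        printf("%ld\n",  translate(0x4000));               /* -1: VPN 4 is not mapped */
        return 0;
    }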
Memory Trace
Consider the memory trace for a simple loop that updates each element of an array (the code and trace figure are not reproduced here). The following patterns can be observed:
○ For each instruction fetch, there are two memory accesses: one to the page table to find the physical frame of the instruction, and one to fetch the instruction itself.
○ The explicit memory reference (the mov instruction) also generates two memory accesses: one to the page table for address translation, and one to access the array element itself.
○ The page-table accesses exist only to translate virtual addresses to physical addresses; in this example, the page table resides in physical memory.
○ Memory references repeat in a regular pattern as the loop iterates: each iteration includes four instruction fetches, one explicit update of memory, and five page-table accesses for translation. As the loop runs beyond the first few iterations, the same pattern simply repeats.

Summary
Paging divides the address space into fixed-sized pages that are mapped to physical page frames by a per-process page table. It avoids external fragmentation, but without further support it costs space (the page tables themselves) and time (an extra memory reference on every access).

CMSC 125 Week 04 Paging: Faster Translations (TLBs)

Introduction
Paging requires a lot of mapping information, which is stored in physical memory.
○ This leads to an extra memory lookup for each virtual address, causing performance overhead.
○ The challenge is to speed up address translation and reduce this overhead.
To do this, hardware support is needed; the solution is the Translation Lookaside Buffer (TLB).
○ The TLB is a hardware cache that stores popular virtual-to-physical address translations.
○ When a virtual memory reference occurs, the hardware checks the TLB first.
○ If the translation is found in the TLB (a TLB hit), it is used immediately without consulting the page table.
TLBs play a crucial role in making virtual memory efficient.

TLB Basic Algorithm
(The control-flow pseudocode this analysis refers to is sketched at the end of this section.)
Analysis:
○ The algorithm starts by extracting the Virtual Page Number (VPN) from the virtual address (Line 1).
○ It checks whether the TLB contains the translation for this VPN (Line 2); if so, it is a TLB hit.
○ On a TLB hit, it extracts the Page Frame Number (PFN), combines it with the offset to form the physical address (PA), and accesses memory (Lines 5-7).
○ This is done as long as the protection checks pass (Line 4).
○ On a TLB miss, the hardware accesses the page table to find the translation (Lines 11-12).
○ Assuming the virtual memory reference is valid and accessible (Lines 13, 15), the TLB is updated with the translation (Line 18).
○ These actions are costly, mainly because of the extra memory reference needed to access the page table (Line 12).
○ After updating the TLB, the hardware retries the instruction; this time the translation is found in the TLB, and the memory access is fast.
The TLB is a cache, and its effectiveness relies on translations usually being found in it (TLB hits).
TLB hits are fast, but TLB misses are costly, as they require accessing the page table and thus extra memory references.
Frequent TLB misses slow down program execution, because memory accesses are relatively expensive.
The goal is therefore to minimize TLB misses.
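For reference, here is a reconstruction of the TLB lookup control flow that the numbered analysis above walks through, written as a small runnable C simulation rather than hardware pseudocode. The line references in the comments follow the analysis; the protection checks (Lines 4 and 15) are omitted for brevity, and all names (TLB_SIZE, tlb[], translate) are illustrative.

    /* Sketch: address translation with a TLB in front of the page table. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT   12                            /* 4 KB pages               */
    #define OFFSET_MASK  ((1u << PAGE_SHIFT) - 1)
    #define NUM_PAGES    16
    #define TLB_SIZE     4

    typedef struct { bool valid; uint32_t vpn, pfn; } tlb_entry_t;
    typedef struct { bool valid; uint32_t pfn; } pte_t;

    static tlb_entry_t tlb[TLB_SIZE];
    static pte_t page_table[NUM_PAGES];                /* the linear page table    */
    static int next_victim;                            /* trivial FIFO replacement */

    /* Returns the physical address, or -1 to stand in for a raised fault. */
    long translate(uint32_t va) {
        uint32_t vpn = va >> PAGE_SHIFT;               /* Line 1: extract the VPN  */
        uint32_t off = va & OFFSET_MASK;
        for (int i = 0; i < TLB_SIZE; i++)             /* Line 2: TLB lookup       */
            if (tlb[i].valid && tlb[i].vpn == vpn)     /* TLB hit                  */
                return ((long)tlb[i].pfn << PAGE_SHIFT) | off;   /* Lines 5-7      */

        /* TLB miss: consult the page table (Lines 11-12) */
        if (vpn >= NUM_PAGES || !page_table[vpn].valid)
            return -1;                                 /* Line 13: invalid -> fault */
        tlb[next_victim] = (tlb_entry_t){ true, vpn, page_table[vpn].pfn };  /* Line 18 */
        next_victim = (next_victim + 1) % TLB_SIZE;
        return translate(va);                          /* retry: now a TLB hit      */
    }

    int main(void) {
        page_table[1] = (pte_t){ true, 7 };            /* map VPN 1 -> PFN 7        */
        printf("0x%lx\n", translate(0x1234));          /* miss, then hit: 0x7234    */
        printf("0x%lx\n", translate(0x1238));          /* TLB hit                   */
        return 0;
    }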
Example: Accessing an Array
In this example, there is a virtual address trace for accessing an array of ten 4-byte integers.
○ The array starts at virtual address 100, and the virtual address space uses 16-byte pages, giving a 4-bit VPN and a 4-bit offset (so there are 16 virtual pages); the 40-byte array spans three of those pages.
○ A loop in C accesses each array element, and the TLB is used for virtual-to-physical address translation.
○ Initially, a TLB miss occurs when accessing the first element, a[0].
○ The subsequent accesses to a[1] and a[2] are TLB hits because they lie on the same page as a[0].
○ A TLB miss happens again when accessing a[3], but the following accesses a[4] through a[6] are TLB hits thanks to spatial locality; the pattern repeats with a miss on a[7] and hits on a[8] and a[9].
The TLB hit rate for this sequence is 70% (3 misses, 7 hits), which already improves performance even though this is the first time the array is accessed.
The page size affects TLB performance: larger pages result in fewer TLB misses.
TLB performance can improve further with temporal locality; re-accessing the array shortly after the loop can lead to a very high TLB hit rate.
Caching is a fundamental technique in computer systems that leverages locality (temporal and spatial) to speed up memory access.
○ Hardware caches store copies of memory in fast on-chip memory to reduce the need to access slower main memory.
○ The size of caches is limited by physical constraints, making them small and fast; simply making them larger leads to diminishing returns.
The key is to use caches effectively to improve performance.
TLBs, like other caches, rely on spatial and temporal locality for success, making them effective for programs that exhibit such behavior.

Who Handles the TLB Miss?
The question of who handles a TLB miss has two possible answers: the hardware or the software (OS).
○ On older complex-instruction-set (CISC) machines, the hardware handles the TLB miss entirely.
○ More modern architectures, like the MIPS R10k or Sun's SPARC v9 (RISC), use a software-managed TLB.
  On a TLB miss, the hardware raises an exception, switching to kernel mode and jumping to a trap handler within the OS.
  The trap handler looks up the translation in the page table, updates the TLB using privileged instructions, and returns from the trap.
  The hardware then retries the instruction, which now results in a TLB hit.

TLB Contents
A typical TLB (Translation Lookaside Buffer) contains 32, 64, or 128 entries.
TLBs are fully associative: a translation can be anywhere in the TLB, and the hardware searches all entries in parallel to find the desired one.
Each TLB entry typically has the following components:
○ VPN (Virtual Page Number): the virtual page number of the translation.
○ PFN (Page Frame Number): the physical page frame where the virtual page is stored.
○ Other bits:
  Valid bit: indicates whether the entry contains a valid translation.
  Protection bits: determine how the page can be accessed, such as read, write, or execute permissions.
  Address-space identifier: helps identify which address space the translation belongs to.
  Dirty bit: indicates whether the page has been modified since it was loaded.
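As a consolidation of the fields listed above, here is a hypothetical C struct for a TLB entry. Real TLB entries are hardware structures whose exact field widths vary by architecture, so treat this only as a mnemonic.

    /* Hypothetical layout of one TLB entry (field widths are illustrative). */
    #include <stdint.h>

    typedef struct {
        uint32_t vpn;            /* virtual page number of the translation      */
        uint32_t pfn;            /* physical page frame holding that page       */
        uint8_t  valid : 1;      /* does this entry hold a valid translation?   */
        uint8_t  prot  : 3;      /* read / write / execute permissions          */
        uint8_t  dirty : 1;      /* modified since it was loaded?               */
        uint8_t  asid;           /* address-space identifier (which process)    */
    } tlb_entry_t;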