Chapter 5: Input-Output Devices and Interrupts

Overview
In this chapter, we will talk about input-output devices and interrupts!
1. I/O devices
2. Interrupts
3. I/O software
4. Storage devices
5. Network devices
6. User interface devices

I/O Devices

What Are I/O Devices
Computer systems would be less useful without the capability to communicate outside the CPU-memory world. Input-output devices (I/O devices) are physical devices that enable interactions with the "outside" world.
From an electrical engineer's perspective, they are a set of chips and mechanical parts that perform operations ⟹ they focus on how devices work.
From a programmer's perspective, they are a set of interfaces used to interact with them ⟹ they focus on how to communicate with devices.
Usually, I/O devices can be classified as:
Block devices, which work with fixed-size blocks of information that are addressable. Operations on these devices are performed at the granularity of a block, and blocks can be accessed independently of each other. Block devices usually store information that can be accessed later on. Examples: hard drives (HDD), solid-state drives (SSD), magnetic tapes.
Character devices, which work with streams of characters. They are not addressable and usually do not keep any information to be accessed later on. Examples: printers, keyboards/mice, terminals.

Device Controllers
Devices usually have a hardware controller that is used to interface with the rest of the computer. This controller is usually located on the motherboard, or may be added with a PCIe card. Controllers usually expose a set of control registers to the OS to check their status and trigger operations.
Controllers usually implement an interface specification (USB, SATA, etc.) so that connected devices can be controlled in a standard way. These specifications detail the communication protocols between the devices and the rest of the system, with varying performance, cost and complexity.
Controllers may also implement various features such as error checking/correction, replication, compression, encryption, wear leveling, virtual address translation, etc.

Table: Bandwidths of various devices
    Device/Specification        Bandwidth
    Keyboard                    10 B/s
    Mouse                       100 B/s
    Bluetooth 5 BLE             256 KB/s
    802.11n Wireless            37.5 MB/s
    USB 2.0                     60 MB/s
    Gigabit Ethernet            125 MB/s
    SATA 3 HDD                  600 MB/s
    USB 3.0                     625 MB/s
    PCIe 3.0 (single lane)      985 MB/s
    802.11ax Wireless           1.25 GB/s
    USB 4.0                     5 GB/s
    PCIe 5.0 NVMe SSD (read)    14 GB/s
    PCIe 6.0                    126 GB/s

Communication With Devices
There are two main communication mechanisms for device controllers: device ports and memory-mapped devices.
Device ports: Some architectures, e.g., x86, have the concept of I/O ports, where each device control register is assigned a port number. Programs can then read from/write to these ports with special instructions, available only in supervisor mode:

    in reg, port    ; read port into reg
    out port, reg   ; write reg into port
    in eax, 12      ; copy port 12 into the eax register

I/O port numbers can be seen as addresses into a device address space, separate from the regular memory address space.
Memory-mapped devices: Most architectures (also) map device control registers into the physical address space, protected as supervisor-only. The OS can then use regular memory operations to interact with devices:

    mov eax, 12     ; copy the content of memory address 12 into eax

Another advantage is that all memory instructions can be used directly on these control registers, instead of having to read the port into a register first; e.g., testing a memory location usually has its own instruction, while testing a port would require reading it into a register and then testing it.
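As a rough illustration of what memory-mapped access looks like from a driver's point of view, here is a minimal C sketch. The register addresses, layout and bit values are invented for the example:

    #include <stdint.h>

    /* Hypothetical memory-mapped registers of an imaginary device
       (addresses and bit meanings are made up for illustration). */
    #define DEV_STATUS   ((volatile uint32_t *)0xFFFF0000)  /* status register */
    #define DEV_DATA     ((volatile uint32_t *)0xFFFF0004)  /* data register   */
    #define STATUS_READY 0x1u                               /* "ready" bit     */

    static void dev_write_word(uint32_t word)
    {
        /* 'volatile' forces the compiler to re-read the register on every
           iteration instead of caching the value in a CPU register. */
        while ((*DEV_STATUS & STATUS_READY) == 0)
            ;                      /* busy-wait until the device is ready      */
        *DEV_DATA = word;          /* a plain store reaches the hardware       */
    }

Since the registers behave like ordinary memory locations, the same loads, stores and tests used for RAM work on the device, which is exactly the advantage mentioned above.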
A CPU-Centric Model
In both communication models, the CPU has to trigger read and write operations to devices, making it the center of communication.
Imagine that you are reading a file from a hard drive to load it into memory. For each block, the OS needs to do the following:
1. Request the device controller to read the block from the hard drive into its internal memory buffer
2. The controller signals to the CPU that the block has been read
3. The CPU copies the controller's internal buffer to main memory. For each byte (or word):
   a. Copy the value from the controller to a CPU register
   b. Copy the register into main memory
   c. Repeat byte by byte (or word by word) until the whole block is in main memory
We are wasting precious CPU cycles on an operation where the CPU doesn't add any value! Thankfully, computer designers came up with a solution: Direct Memory Access controllers.

Direct Memory Access (DMA)
When a device needs to read from/write to memory, it doesn't need the CPU to do it for it: it can use Direct Memory Access (DMA). The system needs to have a DMA controller connected to the bus. The CPU can then configure it to perform the copies between devices and memory by itself. While the DMA controller performs the copies, the CPU is free to do something else.
The DMA process is as follows:
1. The CPU configures the DMA controller, providing the source device, the destination address in main memory, and the size of the data to be copied. The CPU is now free to do something else
2. The DMA controller requests the transfer from the device or main memory (depending on the direction of the transfer)
3. When the transfer is done, the device/main memory sends an ACK (acknowledgment) signal to the DMA controller to inform it that the transfer is done
4. The DMA controller signals to the CPU that the DMA transfer is complete
Note that the CPU still needs to issue commands to device controllers to copy data to/from their internal buffers.
Let's see how hardware devices/controllers can send signals to the CPU!

Interrupts

The Interrupt Mechanism
Definition: An interrupt is a high priority signal sent to the CPU to inform it that it needs to handle an urgent event on the system. Due to the high priority of the event, treatment of this signal interrupts the currently executing program on the CPU to handle the interrupt.
The overall mechanism works as follows:
1. When a device wants to signal an event to the CPU, e.g., an I/O is finished, it sends an interrupt request (IRQ) to the interrupt controller.
2. When the interrupt controller receives an IRQ, it sends a signal to the CPU with an identifier specifying the reason for the interrupt.
3. The signal interrupts the CPU, which automatically switches to supervisor mode and jumps to its interrupt handling code. This code checks the identifier of the IRQ and jumps to the proper interrupt service routine through the interrupt vector.
4. The interrupt service routine (ISR) processes the interrupt.
5. When the interrupt has been processed, the CPU acknowledges to the interrupt controller that it has treated the IRQ, and jumps back to the code it was executing before.

Classes of Interrupts
The word interrupt is used for various types of events treated as described before, interrupting the currently executing code to perform another task. They can usually be classified as:
Hardware interrupts: Sent by hardware devices through the interrupt controller, as shown previously.
Traps or software interrupts: Deliberately triggered by program code, for example to switch to supervisor mode and perform system calls.
Faults or exceptions: Not deliberately triggered by program code, usually due to an illegal operation, e.g., a division by zero or a segmentation fault.

Interrupt Controller
The interrupt controller is an integrated circuit that redirects interrupt requests (IRQs) from devices to the CPU. Each input line is connected to a specific device controller, and the output line is connected to the CPU. When a device sends an IRQ to the interrupt controller, it is forwarded to the CPU with its number.
For example, the Intel 8259 PIC was set up with two cascaded controllers, with the following IRQ assignments:
    IRQ 0: system timer                     IRQ 8:  RTC timer
    IRQ 1: keyboard controller              IRQ 9:  ACPI
    IRQ 2: cascade from slave               IRQ 10: open/SCSI/NIC
    IRQ 3: serial port COM2                 IRQ 11: open/SCSI/NIC
    IRQ 4: serial port COM1                 IRQ 12: mouse controller
    IRQ 5: parallel port 2/3/sound card     IRQ 13: math co-processor
    IRQ 6: floppy controller                IRQ 14: ATA channel 1
    IRQ 7: parallel port 1                  IRQ 15: ATA channel 2
If multiple IRQs are pending simultaneously, the controller decides which one to prioritize before informing the CPU. If the CPU is already handling a higher priority interrupt, the IRQ is withheld for later treatment.
Note: Modern interrupt controllers might use the bus instead of dedicated wires.

Interrupt Vector
When the CPU receives the interrupt signal from the interrupt controller:
1. It immediately halts the current program
2. Switches to supervisor mode
3. Looks up the proper interrupt service routine that can process the incoming interrupt, e.g., if the interrupt comes from the hard drive, we need the code from the corresponding driver to handle it
4. Jumps to the address of the interrupt service routine and executes it
The third step is usually done through an interrupt vector that holds the addresses of each interrupt handler, indexed by their IRQ number:
    0: timer_irq_handler()        8:  rtc_irq_handler()
    1: keyboard_irq_handler()     9:  acpi_irq_handler()
    2: (cascade, unused)          10: nic_irq_handler()
    3: serial_irq_handler()       11: nic_irq_handler()
    4: serial_irq_handler()       12: mouse_irq_handler()
    5: parallel_irq_handler()     13: math_irq_handler()
    6: floppy_irq_handler()       14: ata_irq_handler()
    7: parallel_irq_handler()     15: ata_irq_handler()
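To make the idea concrete, an interrupt vector can be pictured as an array of function pointers indexed by IRQ number. This is only a sketch: the array, the registration helper and dispatch_irq() are invented names, not a real kernel API:

    #define NR_IRQS 16

    typedef void (*irq_handler_t)(int irq);

    /* One entry per IRQ line; 0 (NULL) means "no handler registered". */
    static irq_handler_t interrupt_vector[NR_IRQS];

    /* Hypothetical helper a driver would call at initialization time. */
    void register_irq_handler(int irq, irq_handler_t handler)
    {
        interrupt_vector[irq] = handler;
    }

    /* Called by the low-level interrupt entry code with the IRQ number
       provided by the interrupt controller. */
    void dispatch_irq(int irq)
    {
        if (interrupt_vector[irq] != 0)
            interrupt_vector[irq](irq);   /* jump to the proper ISR */
    }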
Interrupt Service Routines
Now, before starting to execute the interrupt service routine (ISR), or interrupt handler, we first need to save the state of the current program! Indeed, we need to be able to come back to it as if nothing happened when the interrupt is completed. The saved state varies depending on the CPU architecture, but it will usually contain:
The program counter
Some other registers, depending on the OS and architecture, e.g., the stack pointer
The interrupt handler also needs to signal to the interrupt controller that the interrupt has been serviced, enabling other interrupts to be forwarded. This acknowledgment may happen before the handler starts, or after it has completed.
Where do we save the state?
Internal registers: Impractical, as it prevents nested interrupts
User stack: Easy, but if the stack is full, a page fault is triggered, with a need to save a new context again… on a full stack…
Kernel stack: Less chance of faults, but we need to perform a light context switch to another stack, potentially invalidating TLB entries, etc.

Interrupt Masking
An important feature to note is the possibility for the CPU to mask interrupts. A masked interrupt is deferred until it is unmasked, or ignored altogether. CPUs feature an internal interrupt mask register, where each bit represents an IRQ number. If bit X is set, IRQ X is disabled (masked), else it is enabled. Depending on the architecture, it could be the other way around.
Some interrupts are not affected by the interrupt mask: non-maskable interrupts (NMI). They are of such a high priority that they should never be ignored. One example is the watchdog timer interrupt that periodically checks that the system is responsive.
Masking interrupts is sometimes necessary when executing kernel code that cannot be interrupted for safety reasons. The kernel thus frequently becomes uninterruptible to perform such operations. Example: during a context switch, an interrupt would break the system.
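The bit manipulation behind such a mask register can be illustrated with a tiny, self-contained toy model; the real register lives in the CPU or interrupt controller, not in a C variable, so this only shows the bit-level convention described above:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy model of a 16-line interrupt mask register:
       bit X set = IRQ X masked (the convention used above). */
    static uint16_t irq_mask;

    static void mask_irq(int irq)      { irq_mask |=  (uint16_t)(1u << irq); }
    static void unmask_irq(int irq)    { irq_mask &= (uint16_t)~(1u << irq); }
    static int  irq_is_masked(int irq) { return (irq_mask >> irq) & 1u; }

    int main(void)
    {
        mask_irq(14);                                       /* e.g., defer ATA channel 1 interrupts */
        printf("IRQ 14 masked? %d\n", irq_is_masked(14));   /* prints 1 */
        unmask_irq(14);
        printf("IRQ 14 masked? %d\n", irq_is_masked(14));   /* prints 0 */
        return 0;
    }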
Interrupt Handling Recap
So, let's recap all the steps that happen when an interrupt occurs:
1. A device sends an IRQ to the interrupt controller
2. The controller forwards the interrupt to the CPU
3. The CPU is interrupted to handle the IRQ, switching to supervisor mode:
   a. Save the state of the user process
   b. Set up the context for the interrupt service routine (stack, TLB, page table, etc.)
   c. Acknowledge the interrupt controller to allow new interrupts
   d. Find the interrupt service routine in the interrupt vector and execute it
   e. Switch back to a user process, either by restoring the state of the previous one, or by electing a new one and context switching
Note: These steps may vary depending on the architecture, but this is a good general idea of how an interrupt is processed.
Step 3 may be executed in uninterruptible mode for proper operation. However, staying uninterruptible for too long may render the system unresponsive, since other devices cannot interact with the CPU.

Divided Interrupt Handlers
To avoid spending too much time in uninterruptible mode, most OSes have a divided interrupt handler design.
The First-Level Interrupt Handler performs the minimal set of operations that need to happen while uninterruptible, e.g., querying critical information from the interrupt controller and device controller. It should be as quick as possible to avoid making the system unresponsive. When the handler is finished, the interrupt is acknowledged and the CPU becomes interruptible again. Other names: hard interrupt handler, fast interrupt handler, upper half.
The Second-Level Interrupt Handler performs all the other operations that are less critical, such as processing the data recovered from a device. It can be interrupted if needed and continue its work later. Other names: soft interrupt handler, slow interrupt handler, bottom half, deferred procedure call.
Example: When a network packet arrives, the network interface controller (NIC) sends an interrupt. The first-level handler copies the packet from the NIC internal memory to main memory and acknowledges the interrupt. The second-level handler goes through the network protocol layers, forwards the data to the proper user process and wakes it up.

Example: Processing of a Hard Drive Interrupt
A user program wants to read data from a hard drive.
1. The CPU issues a command to the hard drive to read a sector, then executes something else
2. The HDD controller receives the command and copies the sector into its internal memory
3. When the copy is done, the HDD sends an interrupt to the interrupt controller
4. The interrupt controller forwards the IRQ to the CPU
5. The CPU is interrupted and starts the interrupt service routine associated with the HDD controller IRQ
6. The ISR copies the HDD controller's internal memory buffer into main memory (first-level) and queues the second-level handler
7. The CPU acknowledges the IRQ to the interrupt controller
8. The second-level handler is executed, for example performing file system operations (decryption) and copying the requested data to user space

Programmable Interrupt Controller
Modern interrupt controllers are programmable and feature many more interrupt lines. For example, Intel's Advanced Programmable Interrupt Controller (APIC) used on x86 has 255 physical interrupt lines.
While the physical interrupt lines are still usually fixed, the OS can program:
The priority algorithm
The routing strategy for multi-core systems
You can check your interrupt controller's configuration by:
Reading the /proc/interrupts file on Linux
Checking the Device Manager on Windows

Clocks
One important type of hardware interrupt is the clock interrupt. Clock interrupts are driven by hardware clocks, or timers, that trigger interrupts at regular intervals. They can usually be configured to change the period of interrupts.
In hardware, clocks are usually a small component with a counter, an oscillator (quartz), and a register. When a clock is started, the value in the register is copied into the counter. Then, each time the oscillator triggers a signal, the counter is decremented. When the counter reaches 0, the clock triggers an interrupt.
Clocks can run in multiple modes:
One shot: When the interrupt is triggered, the clock is disabled
Periodic or square-wave: When the interrupt is triggered, the value in the register is copied to the counter to restart the timer

Clocks: Usage in the OS
These hardware clock interrupts are crucial for the operating system. They are used for:
Counting real time: Every time a clock interrupt is handled by the OS, it means that a fixed amount of time has passed since the last one. This allows the OS to keep track of the wall clock time, i.e., the time you see on your watch.
Scheduling: The periodic clock tick calls the scheduler at regular intervals to count the amount of time a thread has been running, and to drive preemption if the scheduling algorithm requires it.
User-requested timers: User programs may need to wait for a given amount of time. The OS can expose such timers to allow user programs to periodically wake up and perform some operation. The timers exposed by the OS are not necessarily the hardware ones; they might be a software approximation.
Watchdogs: These timers are used to monitor the proper behavior of the system. They usually trigger non-maskable interrupts that check whether the state of the system (or part of it) is correct, and try to fix it if not. For example, Linux has a watchdog that checks that the kernel doesn't spend too much time in uninterruptible mode.

Soft Timers
If the system has too many timers set, it might perform a lot of costly switches to supervisor mode to handle them. This is especially true if user programs are allowed to set such timers too frequently, triggering interrupt storms.
To avoid such issues, OSes may implement so-called soft timers. These timers are not directly backed by a physical hardware clock, but managed in software by the OS. When a user program requests a timer:
The OS queues the request in a sorted list of timer requests (instead of setting up a hardware clock)
Whenever kernel code is being executed, it may check if any of these requests have expired, and handle them. Most OSes do this before exiting a system call or an interrupt handler, after a page fault, or when the CPU goes idle.
With soft timers, latency might increase compared to regular timers, as we need to wait for the kernel to check if any soft timer has expired. However, in practice, the events that trigger the kernel happen so frequently that the latency is acceptable.

I/O Software

Overview
Let's have a look at how the software I/O stack works in most OSes:
Design goals
Programmed I/Os
Interrupt-driven I/Os
DMA I/Os
Software layers
Device drivers
Generic I/O software

I/O Software Design
When designing the OS software that manages I/Os, some rules/goals need to be kept in mind. Let's have a look at four design goals and properties that most I/O systems need to consider:
Generality
Synchronicity
Error handling
Buffering

Design Goal: Generality
User programs should be able to perform I/Os without having inside knowledge of the exact device in use. The OS can make I/Os generic by providing device independence and uniform naming.
Device independence: Programs are able to read from/write to any device without knowing what it exactly is. If a program reads from a regular file, it should not need to know whether the file is stored on an HDD, SSD or any other type of storage device.
Uniform naming: The naming of the objects, i.e., files, should not depend on the device itself. The OS should provide a uniform naming scheme for files. In UNIX, all devices are integrated in the file system directory hierarchy, with every partition being mounted in a directory.

Design Goal: Synchronicity
Another characteristic of I/O operations is their synchronicity. An I/O operation can be:
Synchronous (or blocking): the program waits until the I/O is completed to continue its code.
Asynchronous: the program starts the operation and is then free to do something else. It then needs to either query the state of the operation for completion or expect an interrupt (see the sketch below).
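As a concrete illustration of the two styles, here is a minimal POSIX-flavored sketch; the device path is made up and error handling is omitted for brevity:

    #include <fcntl.h>
    #include <poll.h>
    #include <unistd.h>

    char buf[512];

    /* Synchronous (blocking): read() only returns once data is available. */
    void read_blocking(void)
    {
        int fd = open("/dev/some_device", O_RDONLY);   /* hypothetical device file */
        read(fd, buf, sizeof(buf));                    /* the thread sleeps until the I/O completes */
        close(fd);
    }

    /* Asynchronous flavor: open in non-blocking mode, do other work,
       and use poll() to learn when data is ready. */
    void read_nonblocking(void)
    {
        int fd = open("/dev/some_device", O_RDONLY | O_NONBLOCK);
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        while (poll(&pfd, 1, 0) == 0) {   /* zero timeout: just check readiness */
            /* ... do something useful in the meantime ... */
        }
        read(fd, buf, sizeof(buf));       /* data is available, so this returns immediately */
        close(fd);
    }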
The OS must make these abstractions possible whatever the underlying devices offer. As seen previously, most devices perform operations asynchronously and signal the completion of an operation with interrupts. When a synchronous operation is requested by a user program, the OS puts the requesting thread in the blocked state. In the interrupt handler processing the end of the I/O, the OS makes the requesting thread ready again.

Design Goal: Error Handling
The I/O software also needs to be able to handle errors that may occur on I/O devices. These errors could be mechanical, electrical or logical. Error handling happens at multiple layers:
The hardware, i.e., device controllers, tries to fix errors whenever possible, e.g., disk controllers usually have error correcting algorithms, or can retry the operation. If that is not possible, the error is sent back to the OS through an interrupt.
The interrupt service routine then decides how to handle the error, e.g., retry the operation on the device, or just pass it back to the user program.
If the error goes all the way back up to user space, the program should check for errors and act accordingly, e.g., retry, crash, etc.

Design Goal: Buffering
Definition: Buffering is a technique that makes use of intermediate memory buffers between the device and the user program. There can be one or more buffers, located in libraries or in the kernel.
Buffering can be necessary for various reasons:
The OS might not know where to copy the data from a device beforehand. For example, when a network packet arrives, the OS does not know which application it should be sent to. The packet needs to be copied from the network device controller to kernel memory, parsed to find out the destination, and then copied to the appropriate process, before signaling the arrival of the packet to the process.
Some libraries/OSes might introduce buffers between user programs and the actual device to avoid sending too many small requests, by bundling multiple requests in a buffer first. For example, the printf() function from the C library only performs the system call to trigger the I/O if the string contains a newline (\n) or if it is long enough. Thus, if you use this function with 1-character strings, you will not trigger as many I/O operations.
Buffering can be detrimental to I/O performance as it incurs additional data copies.

I/O Methods
There are three methods to perform an I/O operation:
Programmed I/O
Interrupt-driven I/O
DMA I/O
Let's have a look at them!

Programmed I/Os
With programmed I/Os, the CPU (usually in supervisor mode) directly controls the device by reading from/writing to its registers, and actively waits for operations to be completed before starting the next one. With this type of I/O, devices usually expose a status register that the CPU can read to see if the device is ready for a new operation. The CPU issues an operation, then actively polls the status register until the device is ready again before sending the next operation (busy waiting).
Example: A sound card. The device is a character device that expects values in a register that represent an audio signal. The device checks its register at a given sampling frequency, e.g., 96 kHz, and emits the sound accordingly.
With a programmed I/O, the user program builds a buffer containing a given duration of the audio to play, e.g., 1 second, and performs a system call to ask the OS to output this buffer to the sound card. The OS locks the audio device (we're assuming only one output at a time can happen, no audio multiplexing), and runs the following loop:

    copy_from_user(kbuf, buffer, count);         // copy the user buffer into the kernel buffer kbuf
    for (int i = 0; i < count; i++) {            // for each element in the buffer (each audio sample)
        while (*sound_card_status_reg != READY); // busy loop on the status register until READY
        *sound_card_data_reg = kbuf[i];          // copy the i-th sample to the sound card data register
    }

Simple to implement, but the CPU cannot do something else while waiting for the next sample. Polling is great latency-wise (the new sample will be written as soon as possible), but it can affect the system reactivity (the CPU is busy with this I/O for one full second!) by wasting a lot of CPU cycles.

Interrupt-Driven I/Os
Programmed I/Os and busy polling can waste a lot of CPU cycles! With a 96 kHz sound card, a 1 GHz CPU, and 100 cycles per memory access, the CPU would spend 99% of the time polling the status register! One way to avoid this waste is to use interrupts.
Instead of busy waiting, the CPU will schedule another thread after starting the I/O operation. When the device is ready for the next operation, it will raise an interrupt, and the interrupt service routine will perform the next I/O operation.
Starting the I/O (executed once when the user performs the system call):

    copy_from_user(kbuf, buffer, count);         // copy the user buffer into the kernel buffer kbuf
    while (*sound_card_status_reg != READY);     // busy loop on the status register until READY (only for the first I/O)
    *sound_card_data_reg = kbuf[0];              // copy the 0-th sample to the sound card data register
    schedule();                                  // make the calling thread 'blocked' and context switch to a new thread

Interrupt service routine (triggered for each sample written to the audio device):

    *sound_card_data_reg = kbuf[i];              // copy the i-th sample to the sound card data register
    i++;                                         // increment i for the next interrupt
    count--;                                     // decrement the remaining number of elements to send to the device
    if (count == 0)                              // if we are done, we can wake up the thread that requested the I/O
        wakeup_thread();
    ack_interrupt();                             // acknowledge the interrupt to the interrupt controller
    return_from_interrupt();                     // return from the interrupt context to the user process context

No more wasted CPU cycles compared to programmed I/Os, but we sacrifice a little bit of latency for the interrupt handling, and we risk interrupt storms.

DMA-Driven I/Os
To avoid having the CPU handle all these interrupts, we can use Direct Memory Access (DMA) I/Os! If the system contains a DMA controller, we can offload the whole copying process to it.
Starting the I/O (executed once when the user performs the system call):

    copy_from_user(kbuf, buffer, count);         // copy the user buffer into the kernel buffer kbuf
    configure_DMA(sound_card, kbuf, count);      // configure the DMA controller to copy 'count' elements from 'kbuf' to the device
    schedule();                                  // make the calling thread 'blocked' and context switch to a new thread

Interrupt service routine (triggered once the DMA controller is done with the whole copy):

    ack_interrupt();                             // acknowledge the interrupt to the interrupt controller
    wakeup_thread();                             // wake up the thread that requested the I/O
    return_from_interrupt();                     // return from the interrupt context to the user process context

With DMA-driven I/Os, the CPU only starts and finishes the I/O operation; all the real work is offloaded to the DMA controller.
DMA performance: Note that DMA controllers are usually slower than a CPU, e.g., some systems have the DMA run at 1/4th of the CPU clock speed. Thus, for high-performance applications with very fast devices, it may still be beneficial to use programmed I/Os or interrupt-driven I/Os.

I/O Software Layers
Now, let's have a look at the software organization of the I/O software stack. We can split this software into multiple layers:
User space
▪ User programs: Applications that perform I/O operations directly (syscall) or through libraries
▪ Libraries: Offer an API for I/Os, usually with additional features, e.g., buffering
Kernel space
▪ Generic I/O layer: Provides a single API for all I/Os (syscalls) + common features
▪ Device drivers: Provide the interface between the generic I/O layer and each specific hardware device
▪ Interrupt service routines: Provide the response to hardware events, may be part of device drivers
Hardware device controllers

Device Drivers
A device driver is a piece of code that interacts directly with a specific device. It knows the set of registers exposed by the device controller and how to manipulate them. It serves as an adapter between a specific device and the generic API exposed by the OS.
Device drivers are usually device-specific, i.e., one driver per device, or device class-specific, i.e., keyboards, joysticks, etc. A single device driver may manage multiple devices of the same type or class, e.g., the SATA disk driver is used by all connected SATA hard drives. Multiple drivers may be stacked and depend on each other, particularly for some standard specifications, e.g., USB (Universal Serial Bus) devices all need the USB driver, but also a more device-specific driver, since they do completely different things (printers, keyboards, thumb drives, etc.).
Drivers are usually a part of the kernel, since they need (at least partially) to run in supervisor mode to access I/O ports or memory-mapped device addresses. Depending on the kernel architecture, they can be moved to user space for all non-supervisor operations; see microkernels.
Drivers are usually provided by the device manufacturer, for each supported OS. They may also be reverse engineered by developers when the specification is not open. One current example is Asahi Linux, which is reverse engineering the device drivers of Apple M series based systems.

Device Driver Interfaces
Device drivers act as an adapter between a standard generic OS interface and the device controller interface. Most OSes have a standard interface for different classes of devices, usually block devices and character devices. Each interface contains a set of functions the driver needs to implement. For example, block device drivers need to implement a read_block() function.
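As a sketch of what such an interface could look like, an OS might define a table of function pointers that each block driver fills in. The structure, function names, major number and registration call below are all invented for illustration; the stub stands in for the generic layer so the sketch compiles on its own:

    #include <stdint.h>

    /* Hypothetical interface that the generic I/O layer expects every
       block device driver to implement. */
    struct block_driver_ops {
        int (*read_block)(int minor, uint64_t block, void *buf);
        int (*write_block)(int minor, uint64_t block, const void *buf);
    };

    /* Stub standing in for the generic layer: associate a major number
       with a driver's operations. */
    static int register_block_driver(int major, const struct block_driver_ops *ops)
    {
        (void)major; (void)ops;
        return 0;
    }

    /* A fictional disk driver implements the interface... */
    static int mydisk_read_block(int minor, uint64_t block, void *buf)
    {
        /* translate 'block' into controller commands, wait for completion... */
        (void)minor; (void)block; (void)buf;
        return 0;
    }

    static int mydisk_write_block(int minor, uint64_t block, const void *buf)
    {
        (void)minor; (void)block; (void)buf;
        return 0;
    }

    static const struct block_driver_ops mydisk_ops = {
        .read_block  = mydisk_read_block,
        .write_block = mydisk_write_block,
    };

    /* ...and registers it at initialization time. */
    int mydisk_init(void)
    {
        return register_block_driver(42, &mydisk_ops);  /* 42: made-up major number */
    }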
The job of the driver is to translate generic requests such as read or write into device-specific requests. This translation process usually requires the driver to:
Translate a generic function call into a device-specific set of commands on the controller's registers to trigger the requested operation
Convert abstractions into concrete values, e.g., a disk block address must be converted into a hardware sector address
The driver also needs to know how to initialize the device, e.g., power management, and to contain some clean-up code, e.g., for when a device is unplugged.

Device Driver Operation
Device drivers issue commands to hardware device controllers. Depending on the requested operation from the generic I/O layer and on the device, the driver decides on a sequence of commands to send to the device.
1. The driver may need to prepare the data first, e.g., copy it to a buffer, or do some pre-processing
2. The driver writes to the device's control registers to issue the commands
3. (Optional) The driver may check for an acknowledgment from the device, e.g., by polling a status register
4. When the sequence of commands has been issued, the driver must wait for their completion:
   With short commands, e.g., fully electrical, a few bytes, the driver may use polling
   With longer commands, the driver will most likely block and wait for an interrupt
5. When the command has been completed, the driver must check for errors:
   If an error occurred, the driver tries to fix it or returns an error back to the generic layer
   If everything worked fine, the driver may do some device-specific post-processing and then give the data back to the generic layer
Reentrancy: Drivers may be interrupted by the device they manage, e.g., a network packet may arrive while the driver is already handling another packet. Thus, drivers must be written to be reentrant, meaning that they should expect to be called while they are already being executed.
Device hot-plug: Drivers must also be resilient to the fact that devices may be plugged or unplugged at any moment while they are operating. This may require clean-up code to put the system back into a coherent state, e.g., properly cancel pending requests.

Generic I/O Layer
The generic I/O layer is the device-independent part of the I/O software stack. It lies between user programs and device drivers.
This layer has fuzzy boundaries both on the user side and on the device driver side. Since we cannot expect every application to have specific code for every type of device, this generic layer provides a single interface to I/O devices. In addition to providing a uniform interface, this layer also contains features that are useful for all (or many) drivers:
Buffering
Error management
Device allocation

Uniform Interface
As we've seen with the design goals earlier, users need a uniform interface to access devices. The generic I/O layer provides this by enforcing an interface on device drivers, depending on their class, e.g., block or character.
On the driver side, this interface enables OSes to have a large number of different drivers (and thus supported devices) without the need to modify the core of the OS code. The interface will usually contain:
Initialization functions
Communication mechanisms, e.g., request queues
On the user side, it provides a simple and uniform way of accessing any device. The interface will contain a standard API, e.g., POSIX uses file manipulation primitives (open, read, write, etc.).
In addition to the interface itself, an OS needs to provide a uniform naming scheme for devices (to ease access from user code), as well as a way to match which device needs which driver. For example, UNIX systems represent devices by a file, e.g., /dev/sda, and give each of them:
A major number that is associated with a given driver
A minor number to differentiate devices using the same driver
Access control: Both UNIX and Windows represent devices by files. They can therefore use their access control mechanisms to limit access to devices for some users, with specific permissions.

I/O Buffering
Buffering may be required for both block and character devices. Let's take the example of a program recording audio from a microphone, with different buffering schemes.
1. No buffering: The user program does blocking read operations on the device file, sample by sample. Each arriving sample causes an interrupt that will wake up the user process. Overheads due to the large number of interrupts and context switches.
2. User buffering: Same design, but the user program reads multiple samples at a time, storing them in a user buffer. The interrupt wakes up the user process when the buffer is full. More efficient because of fewer system calls and context switches, but writing across the user-kernel boundary may be costly.
3. Kernel buffering: Same as 2., but the buffering is done in kernel space. When the buffer is full, it is copied to user space with a single copy. More efficient, with less copy overhead, but new samples are ignored during the buffer copy until the buffer is emptied.
4. Double buffering: Same as 3., but the kernel now has two buffers. When the first buffer is full, it is copied to user space, and the interrupt handler switches to the second buffer for new samples. By alternating between two buffers, we maintain performance while avoiding the full-buffer issue.
5. Ring buffering: Same as 3., but the buffer is large and has two pointers: producer and consumer. The interrupt writes at the producer address and increments it, while the driver copies data to user space from the consumer pointer and increments it. Pointers wrap around when reaching the end of the buffer, hence the ring in the name. As long as the driver consumes data fast enough, this scheme works well (see the sketch below).
Memory buffers in device controllers are also another layer of buffering.
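Here is a minimal, self-contained sketch of the ring buffer scheme from item 5, assuming a single producer (the interrupt handler) and a single consumer (the driver); the size and element type are arbitrary:

    #include <stdint.h>

    #define RING_SIZE 1024              /* arbitrary capacity; one slot is kept empty */

    static uint8_t  ring[RING_SIZE];
    static unsigned producer;           /* next slot the interrupt handler writes */
    static unsigned consumer;           /* next slot the driver reads             */

    /* Called from the interrupt handler: returns 0 if the buffer is full. */
    int ring_put(uint8_t sample)
    {
        unsigned next = (producer + 1) % RING_SIZE;   /* wrap around: the "ring" */
        if (next == consumer)
            return 0;                   /* consumer too slow: drop or report an overrun */
        ring[producer] = sample;
        producer = next;
        return 1;
    }

    /* Called from the driver when copying data toward user space. */
    int ring_get(uint8_t *sample)
    {
        if (consumer == producer)
            return 0;                   /* empty */
        *sample = ring[consumer];
        consumer = (consumer + 1) % RING_SIZE;
        return 1;
    }

A real kernel implementation would additionally have to deal with concurrency between the interrupt context and the driver, e.g., with memory barriers or by briefly masking interrupts around the pointer updates.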
Error Management and Sanity Checks
I/O software has to properly report errors back to user space, or even fix them transparently when possible. Errors can be of two different classes:
Programming errors come from the user program. They can be detected with sanity checks in the generic layer. For example, a program trying to write to a read-only device, e.g., a microphone, or providing an invalid buffer.
I/O errors come from the device itself, independently of the software. This may be due to a mechanical problem or a configuration issue. These can usually not be solved in software, but the generic I/O layer may perform a retry, or try an alternate command to perform the same operation.
In any case, errors that cannot be handled are passed back to higher layers. System-critical errors might cause the system to terminate, e.g., if the file system root is corrupted.

Device Allocation
Some devices may need to be allocated to a process for various reasons:
The device cannot be used in parallel. For example, a printer cannot print two different documents at the same time. It has to be allocated to one process at a time, and the OS must serialize accesses to the device to prevent wrong use. In UNIX systems, this can be done by having processes open the device file. Failing to acquire the device could block the process, or return an error and expect the process to retry later.
Some devices support parallelism by exposing multiple communication channels. The OS must allocate channels to user processes to avoid concurrency problems. For example, modern network cards expose multiple queues that can be mapped to processes, with the network controller performing the routing in hardware, without a need for the OS to do it.

The Lifecycle of an I/O
As an example, let's see what happens when a C program makes a call to the printf() function. Buffering can happen at multiple layers along the way.
1. User program/library: printf() parses the format string, replaces the values and builds the resulting string in a buffer, then calls write(1, buf, count).
2. Generic I/O layer (kernel side of the write() system call): permission/error checking, copy_from_user(), lookup of the device driver to call, and, on the way back, error handling, clean up and the return value.
3. Device driver: tty_write() sends commands to the TTY hardware, then either blocks the calling thread (schedule()) or actively polls the status register.
4. Hardware: the TTY device controller performs the I/O and triggers an interrupt when it is done.
5. Interrupt handler: tty_isr() finishes the driver part of the I/O, wakes up the requesting thread and acknowledges the interrupt.
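As a small experiment showing the library-level buffering from step 1, consider the following program. On typical Linux systems, where stdout is line-buffered when attached to a terminal, nothing appears on screen until the newline is printed:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        printf("buffered in the C library...");  /* no newline: stays in the stdio buffer */
        sleep(2);                                /* nothing is displayed during these 2 seconds */
        printf(" flushed now!\n");               /* the newline triggers a single write() system call */
        return 0;
    }

Calling fflush(stdout) after the first printf() would force the write() system call immediately.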
Storage Devices

Overview
Let's have a look at a subclass of devices: block devices used for storage. We will discuss hard drives (magnetic disks) and SSDs. More specifically, we'll have a look at their specificities and at how the OS manages requests to block device controllers.

Hard Drives: Overview
Magnetic disks, also called hard drives, are mechanical storage devices. A hard drive is composed of a set of metallic platters (or disks) that contain data, and a set of heads that access the data on the platters.
Some definitions:
Cylinder: The set of concentric strips at the same position on all platters
Track: A concentric strip on one platter, defined by a head and a cylinder
Sector: A fixed-size slice of a track, traditionally 512 bytes
A motor makes all platters spin permanently, while an actuator moves the heads in order to read a sector. The controller takes commands from the CPU requesting to read/write a sector following a given addressing scheme, and translates them into commands to the actuator and heads to perform the operation on the right sector. The controller is usually a microcontroller or a small CPU, coupled with some memory (tens to hundreds of MiB).

Hard Drives: Addressing Schemes
There are two main ways to address sectors on a hard drive. These addressing schemes are used in the interface with the device drivers of the OS.
Cylinder-Head-Sector (CHS): Sectors are addressed by their coordinates in the hard drive, identified by the cylinder, the head (or platter) and the sector. The coordinates may differ from the real hardware topology: hard drives may expose to the OS a different layout than the real one to simplify the interface. CHS addressing is now deprecated on modern drives and UEFI-based systems, even if disks use it internally.
Logical Block Addressing (LBA): The hard drive exposes a linear address space of blocks to the system. Blocks have addresses ranging from 0 to the maximum number of blocks minus 1. The physical topology is thus hidden from the OS. This is the addressing scheme used by all HDDs now.

Hard Drives: Sectors
A physical sector contains more than just the data seen by the OS. Each physical sector contains:
A preamble: A given pattern to recognize the sector start, and the coordinates of the sector
The data: The actual content of the sector, traditionally 512 bytes long
An error correcting code: Information used to check that the data has been written/read correctly
Disks may also have spare sectors to replace defective ones (easy with LBA).
The arrangement of sectors on disk (the layout) can also vary to improve performance.
Cylinder skew: Sector 0 of each track may be offset from one cylinder to the next. Since the platters never stop spinning and moving the head to a different cylinder takes time, this offset allows switching tracks without missing sectors, i.e., without waiting for a full revolution to continue reading.
Example: We have a cylinder skew of 3 sectors, which is enough if moving the head between two tracks takes less time than it takes for 3 sectors to pass under the head.

Hard Drives: Properties
Hard drives have some important physical properties that impact their performance.
Seek time: Time for the head to move between cylinders. This is relatively long, from hundreds of microseconds between adjacent cylinders up to several milliseconds across the disk.
Rotational delay: Delay for the wanted sector to arrive under the head. This depends on the rotational speed of the disk, i.e., modern disks can rotate at thousands of revolutions per minute (RPM), and on the number of sectors per track.
Transfer time: Time to transfer data from the sector to the controller through the head.
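To get an idea of the orders of magnitude involved, take a hypothetical 7200 RPM drive with an average seek time of 5 ms (made-up but typical numbers): one revolution takes 60/7200 s ≈ 8.3 ms, so the average rotational delay is about 4.2 ms; transferring a 4 KiB block at 200 MB/s adds only about 0.02 ms. A random access therefore costs roughly 9 ms, almost entirely spent on mechanical movement, which is why the ordering of requests matters so much.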
That’s the job of the I/O scheduler. Sector Heads Cylinder Operating Systems and System So ware 261 I/O Scheduling The operating system can re-order the I/O requests to a storage device in order to optimize performance and/or reduce access latency. The ordering algorithm is called the I/O scheduler. It can additionally optimize other metrics such as the fairness between processes. Various algorithms exist, but let’s focus on three simple I/O schedulers that optimize performance and latency: First-Come First-Served Shortest Seek First The Elevator Algorithm Operating Systems and System So ware 262 I/O Scheduler: First-Come First-Served Requests are served in the order they arrive. Example: We have 10 requests queued (request number in the blue square), and the initial position of the head is X. We need 108 cylinder changes to perform all operations. Cylinder 0 8 16 24 31 3 8 1 2 6 7 10 4 9 5 1 8 8 19 9 Head 4 movements 11 16 22 10 Pros Cons Easy to implement Doesn’t account for the disk physical properties High amount of head movement, especially with random accesses Optimization: The OS can maintain a per-cylinder list of requests, and make all requests on the way of the next one to serve. In our example, when going to serve request 1, the I/O scheduler would also request 6 and 2 on the way. Operating Systems and System So ware 263 I/O Scheduler: Shortest Seek First Requests are ordered by their distance from the current position of the head. The goal is to minimize head seek times, as this is the most costly operation. Example: We have 10 requests queued (request number in the blue square), and the initial position of the head is X. We need 52 cylinder changes to perform all operations. Cylinder 0 8 16 24 31 3 8 1 2 6 7 10 4 9 5 1 1 2 4 7 Head 1 movements 2 5 28 1 Pros Cons Relatively easy to implement Possible starvation: if new requests close to the head keep coming Head movements are drastically reduced, latency between served in, distant requests may never be served requests is low Central cylinders have a higher chance of being served, impacting fairness Optimization: The OS can maintain an expiration date for each request, and Operating Systems serve and System them whatever the head position if they have waited for too long. So ware 264 I/O Scheduler: Elevator Algorithm Same algorithm as elevators in tall buildings. The I/O scheduler serves all the requests in one direction, e.g., towards the center, then changes direction, and serves all requests, repeatedly. Also called SCAN algorithm. Example: We have 10 requests queued (request number in the blue square), and the initial position of the head is X. We need 41 cylinder changes to perform all operations. Cylinder 0 8 16 24 31 3 8 1 2 6 7 10 4 9 5 1 1 2 8 Head 1 16 movements 4 1 2 5 Pros Cons Reduced number of head movements Usually worse than SSF (not in this example) Requests have an upper bound for seek time, at most twice the number of cylinders The SCAN algorithm has some variations, depending on the limitsOperating of the head movements. 
The SCAN algorithm has some variations, depending on the limits of the head movements.

I/O Scheduler: Elevator Algorithm Variants
There are four main versions of the elevator algorithm:
SCAN: The head moves from the first to the last cylinder before changing direction
LOOK: The head changes direction if there are no more requests to serve in the current direction
C-SCAN: The head serves requests from the first to the last track, then goes back to the first track and starts again
C-LOOK: Same as LOOK, but the head jumps back to the request in the lowest cylinder after reaching the last request

I/O Scheduler and Logical Block Addressing
All the I/O scheduling algorithms we just described require the OS to have knowledge of the hard drive topology: we need to know which cylinder contains which sector to re-order requests. However, with Logical Block Addressing (LBA), the logical topology exposed to the OS might be different from the actual physical topology! If these topologies differ, I/O scheduling by the OS is useless, and could even be detrimental to performance.
Modern disks often have a very complex physical topology that they hide from the OS with LBA. With these disks, I/O scheduling algorithms are implemented directly in the controller, which has knowledge of the physical topology. Controllers expose request queues, which means that the OS can make multiple requests without waiting for completion. The OS may either just use FCFS and let the disk controller do the scheduling, or have its own I/O scheduler that doesn't optimize seek times, but software metrics such as fairness between processes. For example, Linux does this with the Budget Fair Queueing (BFQ) I/O scheduler.

I/O Schedulers: Targeting Software Metrics
Most OS-level I/O schedulers don't try to optimize low-level metrics anymore, since the hardware will do better itself. They try to optimize metrics such as:
Fairness: Each process gets a fair amount of I/O bandwidth. Fair doesn't mean equal; processes may have different priorities, e.g., see the ionice command in Linux to manipulate a process' I/O priority.
Latency: Requests may also be sorted by some latency priority, and submitted in that order. That doesn't offer full guarantees, since the disk controller might still do some re-ordering afterwards.
I/O schedulers in the wild:
Linux offers the Budget Fair Queueing (BFQ) I/O scheduler, aiming for fairness
Windows (until Vista) treated all I/Os equally
Windows (since 7) has priority queues, i.e., all critical I/Os are processed before all high priority I/Os, then normal priority, then low priority
Mac OS X (at least until 10.4) has no I/O scheduler; some requests may be deferred to be merged together

Error Correcting Codes
Earlier, we saw that sectors contain some bytes for error correcting codes (ECC). Manufacturing processes are not perfect, which means that hardware has defects. This is especially true with high density hard drives that store a lot of data per square centimeter on the platters. These defects can cause errors when reading from or writing to a disk.
To detect and fix these errors, hard drives use error correcting codes (ECC) on each sector. These algorithms are able to fix small errors (a few bits) by storing a few bytes of ECC. It is not uncommon to have 16 bytes of ECC per sector on recent disks. We won't go into details on how these algorithms work exactly; this is an operating systems course, not an algorithms course.
If the error cannot be fixed, the sector might have a permanent defect and cannot be used anymore.

Faulty Sector Replacement
When a sector is defective, the disk controller can mark it as a bad sector and stop using it. With CHS addressing, the controller needs to inform the OS to stop using these bad sectors. With logical block addressing, devices can hide bad sectors from the OS by replacing them with working ones: the disk contains a set of spare sectors that can be used to this end, and the controller updates the logical-to-physical block mapping.
There are two main remapping schemes:
Shifting sectors: the sectors following the bad one are all shifted by one position towards the spare area, preserving the sequential layout
Remapping sectors: the bad sector alone is remapped to one of the spare sectors

Solid State Drives
Solid state drives (SSDs) are flash-based storage devices that store data in semiconductor cells, with no moving parts. They are much faster than HDDs and support parallelism by having multiple request queues that can work in parallel. SSDs have very smart controllers that perform a lot of pre- and post-processing; thus, OSes usually do not do any low-level I/O scheduling for them.
Note: Multiple queue support is available when using the NVMe interface. The SATA interface only supports a single queue, since it was designed with HDDs in mind, which are limited by the head.
SSD controllers usually support:
Wear leveling: SSD cells support a limited number of write operations. The controller remaps cells to avoid future access issues.
Smart caching and prefetching: SSDs feature large caches to avoid performing operations on the actual cells
Encryption
A note on network devices: Network devices show some similarities with SSDs, as they also expose multiple queues and perform some operations on-device.

User Interface Devices

User Facing Devices
Most general purpose computer systems have user facing devices for human interaction: usually a monitor and a keyboard (and a mouse). We will have a look at input devices first (keyboard and mouse), and then at output devices (monitors).

Input Devices: PS/2 Interface
Up until a couple of decades ago (and the arrival of USB), keyboards and mice were connected through a PS/2 port. If you have a desktop computer, you most likely still have these ports but never use them. The PS/2 interface supports serial communication protocols.
Device drivers for PS/2 devices can use one of these two models:
Programmed I/Os: the driver actively polls the status register of the device until a value is available, then reads the data register
Interrupt-driven I/Os: the device sends an IRQ when data is available, and the interrupt service routine reads the data from the data register of the device
Most PS/2 drivers use the latter model: the driver reacts to key presses or mouse movements through interrupts. For keyboards, every modification of the state of a key, e.g., pressing or releasing, sends an interrupt. Mice periodically send the movement (Δx, Δy) and clicks since the last interrupt, at a given frequency, e.g., 1000 Hz.
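As an illustration of the interrupt-driven model, a legacy x86 PS/2 keyboard handler might look roughly like the sketch below. The port numbers (0x60 for data, 0x64 for status) are the traditional PS/2 controller ports on PCs; the acknowledgment step is simplified and kbd_buffer_put() is a hypothetical helper:

    #include <stdint.h>

    /* Read one byte from an x86 I/O port (GCC/Clang inline assembly). */
    static inline uint8_t inb(uint16_t port)
    {
        uint8_t value;
        __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
        return value;
    }

    #define PS2_DATA    0x60   /* data register of the PS/2 controller  */
    #define PS2_STATUS  0x64   /* status register                       */
    #define OUTPUT_FULL 0x01   /* bit 0: a byte is waiting to be read   */

    void kbd_buffer_put(uint8_t scancode);   /* hypothetical: hand the scan code to the rest of the driver */

    /* Interrupt service routine for IRQ 1 (keyboard). */
    void keyboard_isr(void)
    {
        if (inb(PS2_STATUS) & OUTPUT_FULL) {
            uint8_t scancode = inb(PS2_DATA);   /* one byte per key press/release event */
            kbd_buffer_put(scancode);
        }
        /* ...then acknowledge the interrupt to the interrupt controller. */
    }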
Input Devices: USB Interfaces
Most keyboards and mice now use a USB (Universal Serial Bus) interface. USB communication happens through logical pipes. These pipes can be of different types:
Message pipes: Bidirectional, used for control messages, with a limited bandwidth (max 64 bytes/packet)
Stream pipes: Unidirectional, used for data transfers. They support three modes:
▪ Isochronous transfers: For real-time data; provide a guaranteed, fixed bandwidth
▪ Bulk transfers: For regular data; no guarantees, but maximum bandwidth
▪ Interrupt transfers: For periodic data transfers, with latency guarantees
Input devices with USB usually set up stream pipes in interrupt transfer mode. In this mode, the host USB controller periodically queries the state of the device and triggers an interrupt to inform the CPU. The period is decided between the host controller and the device.
Let's have a look at how USB keyboards work.

Input Devices: USB Keyboards
The communication protocol is the following:
1. The host USB controller periodically polls the USB keyboard for a report that contains a set of key presses/releases
   If the report is different from the previous one, the USB controller triggers an interrupt and acknowledges the keyboard
   Else, the USB controller just acknowledges the keyboard until the next report
2. The driver, in the interrupt service routine, reads the report and performs the necessary actions, e.g., copies it into a buffer, triggers some application shortcut, etc.
3. The driver acknowledges the interrupt and switches back to the previous user thread
Keyboard report: Reports are a well-formed set of up to 8 bytes:
Byte 0: Information about modifier keys (shift, ctrl, etc.)
Byte 1: Reserved
Bytes 2–7: Each byte contains the scan code of a pressed key, ordered by time of arrival; a report supports at most 6 simultaneous keystrokes
Scan codes are hexadecimal values uniquely assigned to each key.

Input Devices: Keyboard Drivers
Keyboard drivers have two main jobs:
Implement the interrupt service routine that reads the report from the USB controller. This is pretty straightforward, as it only requires copying the data from the proper USB communication pipe.
Process the information from the report and pass it back to the proper process(es):
▪ Scan codes are mapped to keys, not ASCII characters, which means they are not necessarily usable as-is by programs. POSIX provides two input modes:
◦ Raw mode (or noncanonical mode): Keystrokes are directly passed back to applications, which decide what to do with each one. Used by applications that implement low-level commands and editing facilities.
◦ Cooked mode (or canonical mode): The driver accumulates keystrokes in a buffer and returns a full string to applications. The driver therefore needs to convert scan codes into ASCII characters, as well as do some inline processing, e.g., if you press backspace, the driver will edit the buffer directly. Used by most applications that just process text.
▪ People are used to having visual feedback of what they are typing on the monitor. The driver needs to output the current state of the buffer, figuring out where to display it, how to interleave it with other outputs, etc.
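On POSIX systems, a program can choose between the cooked (canonical) and raw (noncanonical) modes mentioned above through the termios interface. For example, this snippet switches the terminal to noncanonical mode without echo, so each keystroke is delivered immediately:

    #include <stdio.h>
    #include <termios.h>
    #include <unistd.h>

    int main(void)
    {
        struct termios cooked, raw;

        tcgetattr(STDIN_FILENO, &cooked);       /* save the current (canonical) settings */
        raw = cooked;
        raw.c_lflag &= ~(ICANON | ECHO);        /* no line buffering, no automatic echo  */
        raw.c_cc[VMIN]  = 1;                    /* read() returns after 1 byte           */
        raw.c_cc[VTIME] = 0;
        tcsetattr(STDIN_FILENO, TCSANOW, &raw);

        char c;
        read(STDIN_FILENO, &c, 1);              /* returns as soon as one key is pressed */
        printf("got key code %d\n", c);

        tcsetattr(STDIN_FILENO, TCSANOW, &cooked);  /* restore canonical mode */
        return 0;
    }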
Output Devices: Terminals
Terminal devices provide text-only output. They are relatively simple from a driver's perspective, as the driver only needs to output ASCII characters on screen. Terminals usually use a serial communication protocol, where the driver just writes characters and the device displays them on screen.
In practice, we also need some special commands to be able to overwrite characters, erase lines, etc. The ANSI standard defines a set of special commands, represented by escape sequences, that can be sent to the terminal controller. An escape sequence starts with a special ASCII character (ESC → 0x1B) and is followed by a sequence of characters.
Examples of ANSI commands:
ESC [nA : Move cursor up n lines
ESC [nB : Move cursor down n lines
ESC [nC : Move cursor right n characters
ESC [nD : Move cursor left n characters
ESC [nM : Delete n lines at cursor
ESC [nP : Delete n characters at cursor
ESC [sK : Clear line from cursor according to s (0: to end, 1: from start, 2: all)
ESC [nm : Enable style n (0: normal, 1: bold, 4: underline, 5: blink, 7: reverse)

Output Devices: Graphical Interfaces
Most modern computers do not use terminals as output devices, but offer full graphical interfaces. They support arbitrary types of visual information, not only characters and text. Such Graphical User Interfaces (GUIs) need special hardware to be displayed: graphics cards.
Powerful graphical processing units (GPUs) might have more advanced interfaces, but generally, graphical rendering is done through a frame buffer. The frame buffer can be seen as a rectangular area (representing the screen), where each pixel is represented.
1. The OS driver/graphical stack "draws" whatever needs to be displayed into the frame buffer
2. The device controller periodically copies the content of the frame buffer onto the monitor. The frequency depends on the monitor, i.e., a 60 Hz monitor will perform 60 copies per second.
Screen tearing: Screen tearing happens when the frame buffer is being updated at the same time as it is copied to the monitor. Thus, the monitor may show parts of two different frames at the same time. This can be avoided by using multiple buffers as well as device synchronization.
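To make the frame buffer idea concrete, here is a sketch that fills a rectangle in a linear 32-bit-per-pixel frame buffer. The pointer and stride would come from the graphics driver (on Linux, for instance, user space can obtain them by mmap()ing /dev/fb0); the screen geometry below is an assumption:

    #include <stdint.h>

    /* Hypothetical screen geometry; a real program would query the driver for it. */
    #define WIDTH   1920
    #define HEIGHT  1080

    /* Fill a rectangle in a linear frame buffer: one 32-bit pixel per entry,
       'stride' pixels per line, color in 0x00RRGGBB form. */
    void fill_rect(volatile uint32_t *fb, int stride,
                   int x, int y, int w, int h, uint32_t color)
    {
        for (int row = y; row < y + h && row < HEIGHT; row++)
            for (int col = x; col < x + w && col < WIDTH; col++)
                fb[row * stride + col] = color;   /* the controller will copy this to the monitor */
    }

Double buffering, mentioned above for screen tearing, simply means drawing into a second buffer like this one and asking the controller to switch buffers between two refreshes.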