Full Transcript


Chapter 8: Virtualization
Operating Systems and System Software

Overview

Let’s assume I am a student at RWTH, and I am taking the (really nice) Operating Systems course. I regularly have to complete programming assignments that expect me to use Linux. However, I run Windows on my machine because I still want to be able to play video games! So, I want to keep using Windows, but I need Linux if I want to pass the course. Help!

No worries, this is perfectly possible with virtualization! In this chapter, we’ll have a look at:
- What is virtualization?
- How is physical hardware virtualized?
- Why is this relevant in cloud architectures?
- Lightweight virtualization with containers

Background: Definitions

Virtualization is a technique that gives the illusion of running multiple (virtual) machines on a single (physical) machine. The Virtual Machine Monitor (VMM), or hypervisor, is the software component that creates this illusion, enabling multiple virtual machines (VMs), potentially running very different operating systems, to share the same machine at the same time.

Note: The hypervisor is also called the host, while VMs are called guests.

[Diagram: VMs running on top of the hypervisor, which runs on the hardware.]

Advantages of virtualization:
- Safety: if a VM crashes, the others are not affected.
- Consolidation: multiple VMs can be co-located on the same machine, reducing resource waste and energy consumption.
- Migration: VMs can easily be moved from one machine to another (simply copy the memory).

Requirements

Virtual machines should be able to boot and run transparently, as if alone on a physical machine, with their own operating system. To enable this, the hypervisor must perform well in three dimensions:

1. Safety: The hypervisor has full control over virtualized resources. This enables strong isolation between VMs, preventing them from performing privileged operations that could affect other VMs on the system.
This is analogous to how operating systems must enforce isolation and safety between processes.

2. Fidelity: A program executed in a VM should have the exact same behavior as on a non-virtualized system.

3. Efficiency: Most of the code executed in the VM should run without the hypervisor intervening, as on bare hardware.

These requirements can be fulfilled with software-based or hardware-based solutions.

Hypervisor Technologies

Software-based techniques:
- Interpretation: Similar to a shell interpreter, every instruction executed by the VM is read by the hypervisor, which performs the operation and then moves to the next instruction. This is obviously not efficient, as every instruction in the VM might require multiple instructions in the hypervisor to be interpreted.
- Emulation: When starting a VM, the hypervisor rewrites privileged instructions in the VM code into safe code sequences. The other instructions are executed natively. This is also called binary translation. Note that this technique can also be used to emulate a different CPU architecture, e.g., an Android emulator used to develop mobile applications.

Hardware-based techniques:
- Trap-and-emulate: VMs execute normally, except when they try to use privileged instructions that could break the safety property. If they do so, a trap, i.e., a software interrupt, is triggered, allowing the hypervisor to perform the operation safely by emulating its behavior.
- Paravirtualization: The hypervisor exposes to VMs the fact that they run in a virtualized system. This way, it can also provide an API to perform privileged operations. This interface is composed of hypercalls, i.e., the virtualized counterpart of system calls, and potentially shared memory and data structures.

Both techniques usually rely on an additional processor privilege level, i.e., user, supervisor and hypervisor, to enforce protection across domains and trigger traps or hypercalls when needed.
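The trap-and-emulate flow can be sketched as a small toy model. All names here (`Trap`, `run_guest`, the instruction set) are invented for illustration; on real hardware it is the CPU, not the program, that raises the trap when a privileged instruction executes in guest mode.

```python
# Toy model of trap-and-emulate: unprivileged instructions run "natively",
# while privileged ones raise a trap that the hypervisor handles.

PRIVILEGED = {"out", "hlt", "lgdt"}  # instructions the guest may not run directly

class Trap(Exception):
    """Software interrupt raised when a guest issues a privileged instruction."""
    def __init__(self, instr):
        self.instr = instr

def execute(instr, log):
    if instr in PRIVILEGED:
        raise Trap(instr)          # hardware would trap to the hypervisor here
    log.append(f"native:{instr}")  # unprivileged code runs at full speed

def run_guest(program):
    log = []
    for instr in program:
        try:
            execute(instr, log)
        except Trap as t:
            # The hypervisor emulates the privileged instruction safely,
            # then resumes the guest at the next instruction.
            log.append(f"emulated:{t.instr}")
    return log

print(run_guest(["add", "out", "mov", "hlt"]))
# -> ['native:add', 'emulated:out', 'native:mov', 'emulated:hlt']
```

Note how the efficiency requirement is preserved: only the two privileged instructions involve the hypervisor, everything else runs untouched.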
[Diagram: x86 protection rings 3, 2, 1 and 0.]

Hypervisor Types

There are two main approaches for virtualization:

Type 1 (bare metal): The hypervisor executes on the bare hardware, in the most privileged protection domain, while VMs run on top of the hypervisor, in less privileged domains (usually two domains: one for the guest kernel, and one for guest applications). Examples: Xen, VMware ESXi, Hyper-V, PikeOS.

Type 2 (hosted): The hypervisor is a program running on a regular operating system, e.g., Linux or Windows, with VMs spawned as regular user processes. The hypervisor still needs to run at a privileged level, while VMs run as regular user processes. Examples: QEMU, VirtualBox, Parallels, VMware, KVM in Linux.

[Diagram: Type 1 has VMs (Ring 3/0) on the hypervisor (Ring -1) on hardware; Type 2 has guest and host processes (Ring 3) on the hypervisor and host OS (Ring 0) on hardware.]

Hardware Virtualization

In both types of hypervisors, VMs should execute as if they were on bare-metal hardware. They shouldn’t even notice that they are running as a VM, except when using paravirtualization. Since hardware resources must be shared between VMs, we need the hypervisor to allocate them. Note that this is a similar problem as with any operating system and its user processes. The slight difference, however, is that hypervisors need to expose hardware to VMs as if it were real hardware. Thus, hypervisors usually virtualize resources, partitioning them temporally or spatially. Let’s have a look at three resources: CPU, memory and devices.

CPU Virtualization

In addition to the handling of privileged instructions we’ve discussed before, CPU time must also be allocated to VMs by the hypervisor. There are two main strategies:

Static allocation: Each VM is allocated a set of CPU cores, and can use them as it wishes.
The total number of CPU cores used by the VMs cannot exceed the total number of CPU cores available on the physical machine.

[Diagram: each VM pinned to its own subset of physical CPUs.]

Virtual CPUs: The hypervisor exposes a set of virtual CPUs (vCPUs) that are then scheduled on physical CPUs (pCPUs). There can be more vCPUs than pCPUs, with vCPUs being scheduled in turn, similar to processes in an OS. The hypervisor’s scheduler decides which vCPU runs on which pCPU, when, and for how long.

[Diagram: vCPUs of several VMs multiplexed onto the pCPUs by the hypervisor’s scheduler.]

Memory Virtualization

Guest VMs run as if they were directly running on bare hardware. Thus, they usually implement virtual memory through paging, translating virtual addresses into physical addresses. However, with virtualization, we may now have multiple VMs, each managing its page tables independently. This means we may get conflicting decisions, e.g., two VMs mapping the same page frame.

To prevent mapping issues and preserve isolation, hypervisors need to manage physical memory and mediate between VMs. To do so, the hypervisor must be able to track guest VM allocations. This is done by adding, in the hypervisor, a shadow page table for each guest that maps guest pages to host pages. The hypervisor needs to keep the shadow page table of a guest synchronized with the actual page table of the guest!

To synchronize both page tables, hypervisors can use different techniques:
- Protect the guest page table as read-only, so that a page fault is triggered whenever the guest page table is modified. The hypervisor can then update the shadow page table and resume the guest execution.
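This write-protect technique can be sketched as a toy model with dictionaries standing in for page tables. All class and method names are invented for illustration; a real hypervisor works at page granularity and is invoked through hardware faults, not a method call.

```python
# Toy model of shadow page table synchronization via write-protection.
# The guest page table maps guest-virtual -> guest-physical pages; the
# hypervisor keeps its own guest-physical -> host-physical map and derives
# the shadow table (guest-virtual -> host-physical) from the two.

class Hypervisor:
    def __init__(self, gpa_to_hpa):
        self.gpa_to_hpa = gpa_to_hpa   # the hypervisor's own allocation decisions
        self.shadow = {}               # shadow table: guest-virtual -> host-physical

    def on_guest_pt_write(self, gva, gpa):
        # Runs in the "page fault" handler taken when the write-protected
        # guest page table is modified: resynchronize the shadow entry.
        self.shadow[gva] = self.gpa_to_hpa[gpa]

class GuestPageTable:
    def __init__(self, hypervisor):
        self.entries = {}   # guest-virtual -> guest-physical
        self.hv = hypervisor

    def map(self, gva, gpa):
        # The guest just writes a page table entry; because the page table
        # page is read-only, the write first faults into the hypervisor.
        self.hv.on_guest_pt_write(gva, gpa)
        self.entries[gva] = gpa

hv = Hypervisor(gpa_to_hpa={0: 7, 1: 3})   # guest frame 0 lives in host frame 7, etc.
gpt = GuestPageTable(hv)
gpt.map(gva=0x1000, gpa=0)
gpt.map(gva=0x2000, gpa=1)
print(hv.shadow)  # -> {4096: 7, 8192: 3}
```

The MMU would then walk only the shadow table, so the guest’s mappings never bypass the hypervisor’s allocation decisions.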
- Leave the guest page table unprotected, and:
  - intercept page faults on newly allocated pages to update the shadow page table (by inspecting the guest page table),
  - intercept instructions that invalidate TLB entries, in order to catch mappings being removed from the guest page table.

We won’t go into more technical detail, as that would warrant at least one hour of complex explanations.

Memory Virtualization: Hardware Support

Managing a shadow page table means triggering a large number of costly page faults and software page table walks. Faults are expensive in general, but in a virtualized environment it is even more costly to perform a VM exit, i.e., trapping from the guest to the hypervisor. To reduce this cost, CPU vendors added hardware support for virtualization to accelerate these management operations.

The support for nested page tables enables hypervisors to manage guest memory allocations without systematic faults and software page table walks. With this technology, whenever a guest accesses a guest virtual address, the MMU walks the guest page table to translate it into a guest physical address. The MMU then walks the second-level page table managed by the hypervisor to find the host physical address. During VM switching, i.e., a context switch between VMs, the hypervisor just needs to switch the guest page table pointer, similar to how an OS changes the page table pointer when switching between processes. Again, we won’t delve into more details, as the technical implementation is very complex.

Device Virtualization

Devices have sharing constraints similar to those in traditional operating systems. There are various techniques to manage devices in hypervisors, and they can be combined on a per-device basis.

Emulation: The guest operates normally, through a device driver.
Every access to the real device is then trapped by the hypervisor (through faults) to emulate the behavior of the device and to communicate with the actual device through another driver. Note that the driver in the guest interfaces with a virtual device interface provided by the host, not the real device, which can have a completely different interface!

[Diagram: guest drivers trap into the hypervisor, which emulates the device and talks to the real one through its own driver.]

Paravirtualization: The driver in the guest is aware of being virtualized, and communicates with the host driver using optimized commands. This drastically reduces the number of traps and the cost of emulation.

[Diagram: paravirtualized guest drivers communicating directly with the host driver.]

Device Virtualization (2)

Passthrough: The hypervisor maps a device to a single guest, which can then directly interact with it through its driver. Aside from the initial setup, the hypervisor is not involved in the communication between the guest and the device. This scheme allows reaching native device performance, but prevents sharing between guests.

[Diagram: the guest driver accesses the device directly, bypassing the hypervisor.]

Mediation: This approach combines emulation and passthrough:
- Emulation is used for device control and configuration, through the trap-and-emulate method, e.g., mapping a network queue in memory.
- Passthrough is used for the data flow, e.g., copying the content of network packets.

The hypervisor can then allocate the device context to different VMs when necessary, while maintaining near-native performance for the actual data-related operations. This is particularly efficient when device control and configuration are rare compared to data operations.

[Diagram: control operations trap into the hypervisor’s control driver, data operations go directly to the device.]

Virtualization and Deployment Models

Virtual machines are extensively used to ease the deployment of applications:
- In public cloud infrastructures, e.g., Amazon Web Services, Google Cloud, Microsoft Azure, etc.
- In on-premise infrastructures, i.e., private clouds.

Thanks to the isolation and compartmentalization offered by VMs:
- VMs can be deployed on any machine on top of a hypervisor,
- VMs can be moved around and migrated between machines/cloud providers, etc.,
- Resources, e.g., CPU, memory, devices, can be increased/reduced for cost/performance reasons,
- A system composed of multiple VMs can be scaled up/down easily,
- Cloud providers can overcommit resources, e.g., CPU, memory, to avoid waste and pool resources between tenants more efficiently.

Containers

A slightly different virtualization technique is containerization. Containers, or jails, are similar to VMs, except that the illusion of a machine is replaced by an isolated user space environment. Containers are isolated from the rest of the system in terms of resource access. For example:
- They can use only a subset of CPU time, memory, etc.
- They only see a subset of the file system, i.e., only part of the directory tree.
- They only see the processes inside the container, e.g., processes in a container cannot send a signal to a process outside.

The Linux kernel enables the implementation of containers through its cgroup subsystem.

[Diagram: containers running on a container engine (Ring 3) on top of the host OS (Ring 0) and hardware.]
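The two kinds of isolation above can be sketched in a toy model: a per-container memory limit (in the spirit of a cgroup controller) and a per-container process view (in the spirit of a PID namespace). The `Container` class and its methods are invented for illustration; on Linux this enforcement happens in the kernel, not in user code.

```python
# Toy model of container isolation: each container sees only its own
# processes (like a PID namespace) and is charged against its own memory
# limit (like a cgroup memory controller).

class Container:
    def __init__(self, name, mem_limit):
        self.name = name
        self.mem_limit = mem_limit
        self.mem_used = 0
        self.pids = []

    def spawn(self, pid, mem):
        # cgroup-style check: refuse allocations that exceed the limit.
        if self.mem_used + mem > self.mem_limit:
            raise MemoryError(f"{self.name}: memory limit exceeded")
        self.mem_used += mem
        self.pids.append(pid)

    def visible_pids(self):
        # Namespace-style view: only processes inside this container.
        return list(self.pids)

a = Container("web", mem_limit=100)
b = Container("db", mem_limit=200)
a.spawn(pid=1, mem=40)
b.spawn(pid=1, mem=150)       # PIDs can even collide across containers
print(a.visible_pids())       # -> [1]; "web" cannot see "db"'s processes
try:
    a.spawn(pid=2, mem=80)    # 40 + 80 > 100: rejected by the "cgroup"
except MemoryError as e:
    print(e)                  # -> web: memory limit exceeded
```

Because the isolation boundary is a user space environment rather than a whole machine, containers avoid the overhead of running a second kernel, which is why they are considered lightweight virtualization.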
