Fundamentals of Cloud Computing Lecture 2 PDF
Raja Appuswamy
Summary
This document is a lecture covering the fundamentals of cloud computing, including resource sharing and virtualization. It explains the different types of hypervisors and the implementation of virtualization through interpretation, dynamic binary translation, and hardware-assisted methods, then traces the evolution from virtual machines to containers and serverless functions.
Full Transcript
Fundamentals of Cloud Computing, Lecture 2
Raja Appuswamy (Eurecom)
Cloud Computing and Distributed Systems

Sharing Resources
- The economics of cloud computing requires resource sharing.
- How do we share a physical computer among multiple applications?
- [Diagram: several application/web-server/OS stacks, first one per physical machine, then consolidated onto shared hardware]

Virtualization: An Abstraction
- Introduce an abstract model of what a generic computing resource should look like -- a "virtual machine" (VM):
  - CPU -> virtual CPU
  - Disk -> virtual disk
  - NIC -> virtual NIC
- Provide one such "virtual machine" per tenant, and host multiple virtual machines on the same physical machine.

Virtual Machine and Hypervisor
- Virtual machine: "A fully protected and isolated copy of the underlying physical machine's hardware." -- definition by IBM
- Virtual machine monitor (aka VMM, aka hypervisor): "A thin layer of software that's between the hardware and the operating system, virtualizing and managing all hardware resources."
- Two types of hypervisors:
  - Type 1: the VMM runs directly on physical hardware.
  - Type 2: the VMM is built on top of a host OS.

Type 1 Hypervisor
- The VMM is implemented directly on physical hardware.
- The VMM performs scheduling and allocation of resources.
- Example: VMware ESX Server.

Hyper-V Example
- Security: isolation between VMs is useful if VMs don't trust each other, and/or the host doesn't trust the VMs.
- Ex: in Windows 10 Enterprise, navigating to an untrusted website opens the web browser inside a Hyper-V VM:
  "When an employee browses to a site that is not recognized or trusted by the network administrator, Application Guard steps in to isolate the potential threat... Application Guard creates a new instance of Windows at the hardware layer, with an entirely separate copy of the kernel and the minimum Windows Platform Services required to run Microsoft Edge. The underlying hardware enforces that this separate copy of Windows has no access to the user's normal operating environment."

Type 2 Hypervisor
- The VMM is built entirely on top of a host OS.
- The host OS provides resource allocation and a standard execution environment to each "guest OS".
- Examples: User-mode Linux (UML), QEMU.

How Can We Implement Virtualization?
- Interpretation
- Popek-Goldberg VMMs
- Dynamic Binary Translation
- Hardware-assisted Virtualization

CPU Organization Basics
- The Instruction Set Architecture (ISA) defines:
  - the state visible to the programmer: registers and memory
  - the instructions that operate on that state
- The ISA is typically divided into two parts:
  - User ISA: primarily for computation
  - System ISA: primarily for system resource management

User ISA: State
[Figure]

User ISA: Instructions
[Figure]

System ISA
- Privilege levels
- Control registers
- Traps and interrupts
- MMU: page tables, translation lookaside buffer (TLB)
- I/O device access

Virtualization with Interpretation
- Run the fetch/decode/execute pipeline in software.
- Example CISC instruction: incl (%eax) -- increment the value at the memory address held in EAX, roughly ++(*ip) for an int *ip in C.
- Interpreting it as a C function:
      r = GPR[EAX];
      tmp = read_mem(r);
      tmp++;
      write_mem(r, tmp);

Virtualization with Interpretation: Pros and Cons
- Good: easy to interpose on sensitive operations.
  - The guest OS will want to read and write sensitive registers (e.g., %gs, %cr3), send commands to I/O devices (e.g., inb), etc.
  - The interpreter can handle sensitive operations according to a policy.
  - Ex: all VM disk I/O is redirected to backing files in the host OS.
  - Ex: the VM cannot access the network at all, or can only access a predefined set of remote IP addresses.
- Good: provides "complete" isolation (no guest instruction is directly executed on host hardware).
- Good: the virtual ISA can be different from the physical ISA.
- Bad: accurately emulating a modern processor is difficult.
- Bad: interpretation is slow! [Ex: Bochs is 2x-3x slower than direct execution.]

What If We Assume That the Virtual ISA Is the Same as the Physical ISA?

How Can We Implement Virtualization?
- Interpretation
- Popek-Goldberg VMMs
- Dynamic Binary Translation
- Hardware-assisted Virtualization

Virtualization: A Bit of History (1960s-70s)
- IBM sold big mainframe computers; companies could afford one.
- Companies wanted to run apps designed for different OSes.
- Idea: add a level of indirection!
- IBM System/370: CP/CMS (Control Program / Cambridge Monitor System), 1968.
  - Time sharing: provided each user with a single-user OS.
  - Allowed users to concurrently share a computer.
- Hot research topic in the 60s and 70s; entire conferences were devoted to VMs.
- "Formal Requirements for Virtualizable Third Generation Architectures", Popek & Goldberg, 1974: a set of conditions sufficient for a computer architecture to support system virtualization efficiently.

Popek and Goldberg: Terminology
- Privileged instructions cause a trap if the processor isn't in privileged mode. Examples:
  - invlpg: invalidate a TLB entry
  - mov %crN: write to a control register
  - inb: read a byte of data from an input port
- Sensitive instructions access low-level machine state (e.g., page tables, I/O devices, privilege bits) that should be managed by control software (i.e., an OS or a VMM).
- Safe instructions are not sensitive.
- A VMM is a control program that is:
  - Efficient: all safe guest instructions run directly on hardware.
  - Omnipotent: only the VMM can manipulate sensitive state.
  - Undetectable: a guest cannot determine that it is running atop a VMM.

Trap & Emulate Approach
- "A processor or mode of a processor is strictly virtualizable if, when executed in a lesser privileged mode: all sensitive instructions that modify system state are privileged, and all instructions that access privileged state trap."
- Since all sensitive instructions behave nicely, the VMM can trap and emulate every one of the sensitive instructions.
Virtualization Fades Away (1980s)
- Interest died out in the 80s.
- More powerful, cheaper machines: one could simply deploy a new OS on a different machine.
- More powerful OSes (UNIX, BSD, MINIX, Linux): no need to use VMs to provide multi-user support.
- [Photos: Ken Thompson and Dennis Ritchie; Andy Tanenbaum]

New Beginnings (1990s)
- Multiprocessors arrive in the market; innovative hardware.
- Hardware development was faster than system software: customized OSes were late, incompatible, and possibly buggy.
- Commodity OSes were not suited for multiprocessors:
  - They did not scale, due to lock contention and memory architecture.
  - They did not isolate/contain faults: more processors, more failures.

It's Disco Time
- "Disco: Running Commodity Operating Systems on Scalable Multiprocessors", Edouard Bugnion, Scott Devine, and Mendel Rosenblum, SOSP '97.
- Idea: insert a software layer -- a virtual machine monitor -- between the hardware and commodity operating systems.
- Virtualization, repurposed:
  - It used to be: make a single resource appear as multiple resources.
  - Disco: make multiple resources appear like a single resource.
- Central problem: not all architectures are strictly virtualizable -- and x86 had a big problem.

Theorem (Popek & Goldberg): a virtual machine monitor can be constructed for architectures in which every sensitive instruction is privileged.
- For many years, x86 had sensitive instructions that weren't privileged!

The x86 push Instruction Was Not PG-Virtualizable!
- push can push a register value onto the top of the stack.
- The %cs register contains (among other things) 2 bits that represent the current privilege level.
- A guest OS running in Ring 3 could push %cs and see that the privilege level isn't Ring 0!
- To be virtualizable, push should cause a trap when invoked from Ring 3, allowing the VMM to push a fake %cs value indicating that the guest OS is running in Ring 0.

The x86 pushf/popf Instructions Weren't PG-Virtualizable!
- pushf/popf read/write the %eflags register using the value on the top of the stack.
- Bit 9 of %eflags enables interrupts.
- In Ring 0, popf can set bit 9, but in Ring 3 the CPU silently ignores the change!
- To be virtualizable, pushf/popf should cause traps in Ring 3. This would allow the VMM to detect when the guest OS wants to change its interrupt level (meaning that the VMM should change which interrupts it forwards to the guest OS).
- In total, x86 had 17 non-PG-virtualizable instructions!

How Can We Handle Non-virtualizable Processors?

How Can We Implement Virtualization?
- Interpretation
- Popek-Goldberg VMMs
- Dynamic Binary Translation
- Hardware-assisted Virtualization

Dynamic Binary Translation
- Suppose that: the VMM runs in Ring 0; the guest OS and guest apps run in Ring 3.
- Allow guest user-level programs to run directly atop hardware... but the VMM rewrites the guest OS's binary on the fly!
- Translate sensitive-but-unprivileged instructions into ones that trap to the VMM.
- Translation starts when the VMM initially loads the guest OS binary.
- The translator handles a few instructions at a time (no bigger than a basic block).
- The VMM caches frequently used translations to amortize translation costs.
- Original guest OS code:
      mov 32(%rbp), %rbx
      sub %rbx, %rcx
      push %rcx
      popf      # Sensitive instruction, but not privileged!
                # Won't trap when called directly in Ring 3.
- Translated guest OS code:
      mov 32(%rbp), %rbx
      sub %rbx, %rcx
      push %rcx
      int3      # A special one-byte instruction that traps.
                # Originally intended for use by debuggers to set breakpoints.
Disco
- Goal: extend modern OSes to run efficiently on shared-memory multiprocessors with minimal OS changes.
- A VMM built to run multiple copies of the Silicon Graphics IRIX operating system on the Stanford FLASH multiprocessor.
  - IRIX: a Unix-based OS.
  - Stanford FLASH: a cache-coherent NUMA machine.

Disco to VMware
- Started by the creators of Disco (Stanford University), including Mendel Rosenblum.
- Initial product: provide VMs for developers to aid with development and testing -- develop and test for multiple OSes on the same box.
- Actual killer product: server consolidation.
  - Enabled enterprises to consolidate many lightly used services/systems.
  - Cost reduction; easier to manage.
  - Eventually over 90% of VMware's revenue.

Virtualization with Dynamic Binary Translation
- Translate each guest instruction to the minimal set of host instructions required to emulate it.
- Advantages:
  - Avoids the function-call overhead of the interpreter-based approach.
  - Can reuse translations by maintaining a translation cache.
- Disadvantage: still slower than direct execution.
- Done by QEMU.

How Can We Implement Virtualization?
- Interpretation
- Popek-Goldberg VMMs
- Dynamic Binary Translation
- Hardware-assisted Virtualization

Intel VT-x
- Hardware support for virtualization on x86: Intel VT-x and AMD-V.
- VT-x creates two new privilege modes, root and non-root, that are orthogonal to ring privileges:
  - VMMs run in root mode.
  - VMs run in non-root mode.
- Defines a VMCS (VM Control Structure): a 4 KB chunk of memory that contains a VM's register state.
- Gives each CPU a VMCS pointer register (configurable only by root-mode software).

VT-x: Launching and Exiting a VM
- The VMM launches a VM by:
  - Configuring the VM's VMCS page in memory.
  - Setting a core's VMCS pointer to point to the VMCS page.
  - Calling vmlaunch/vmresume, which context switches to the VM by copying the VM's register state from the VMCS page into the CPU's registers.
- A VM relinquishes control to the VMM:
  - By voluntarily invoking vmcall (a hypercall that forces a VM exit).
  - By invoking a sensitive instruction.
  - When an interrupt occurs.
- In all cases, the VM's register state is saved in the VM's VMCS page.
- During a mode switch, the hardware automatically and atomically saves register state!
VT-x: Mode Switches
- VMM -> VM: the VM's register state is loaded from the VMCS onto the CPU, and the VMM's state is saved in the VMCS.
- VM -> VMM: the VMM's register state is loaded from the VMCS onto the CPU, and the VM's state is saved in the VMCS.

Virtualizing Other Aspects
- Not covered here: memory, I/O, and storage virtualization; para-virtualization; nested virtualization; ...
- See the book "Hardware and Software Support for Virtualization", and:
  - https://web.archive.org/web/20180525151921/https://saferwall.com/blog/virtualization-internals-part-1-intro-to-virtualization
  - https://cseweb.ucsd.edu/~jfisherogden/hardwareVirt.pdf

From VMs to Containers

Is Full Hardware Emulation the Only Option for Virtualization?
- [Diagram: three VM stacks (app/OS per VM over a VMM) vs. three "virtual environments" (VE1-VE3) sharing a single OS]
- Can we raise the abstraction one level? Can we support OS-level virtualization to create "virtual environments"?

Use Case for OS-Level Virtualization
- Hosting providers and PaaS providers need to host multiple applications/tenants on a single server.
- Full hardware virtualization is still expensive:
  - A full OS + libraries + application per tenant => reduced multitenancy.
  - Use of VMMs can mean license fees.
- OSes have always supported multitenancy:
  - Multiple processes share the same OS.
  - Virtual memory and file systems provide sharing of memory and storage.
- Can we extend the OS to securely isolate multiple applications?
- Observe and control resource allocation across groups of processes.
- Limit visibility and communication across groups of processes.

Linux Kernel Functionality to Support Virtualization
- Cgroups (control groups): metering and limiting the resources used by a group of processes.
  - Ex: partitioning resources in a server so that:
    - Memory: researchers 40%, professors 40%, students 20%
    - CPU: researchers 50%, professors 30%, students 20%
    - Network: WWW browsing (20%), NFS (60%), others (10%)
- Namespaces: limiting what processes can view.
  - Abstract a global system resource and make it appear as a separate instance to processes within a namespace -- just like what a process within a VM can "see".
  - Example namespaces:
    - Net: each process group gets its own network interfaces, routing tables, ...
    - Pid: processes can only see other processes in the same PID namespace.
    - Mount, IPC, ...

From Virtual Machines to Linux Containers
- Cgroups + namespaces provide (for the most part) everything needed to create "containerized" processes; SELinux and a few other pieces add security.
- LXC (Linux Containers): user-land tools that allow the creation and running of multiple isolated Linux virtual environments (VEs) on a single host.
- Apart from the Linux kernel, everything can be isolated: userland, libraries, runtimes, and applications.
- Ex: can be used to run multiple Linux distros as containers.
- [Diagram: Debian, RHEL, and Ubuntu containers sharing a single Linux kernel on one machine]

From LXC to Docker
- Early LXC was built for sysadmins: no support for moving images, copy-on-write, or sharing previously created images.
- But developers have different needs: an application is developed on dev servers, tested on build servers, and deployed across production servers.
- How do we make sure the configuration is the same and the dependencies are met?
- Docker was developed by Solomon Hykes and others at dotCloud in 2013 to solve this problem with containers:
  - A package system that can pack an application and all its dependencies into a container image after development.
  - A transport system that ensures the application image runs identically on test and production systems.

More on Docker
- Docker was originally built on LXC and provided:
  - A container image format.
  - A method for building container images (Dockerfile / docker build).
  - A way to manage container images (docker images, docker rmi, etc.).
  - A way to manage instances of containers (docker ps, docker rm, etc.).
  - A way to share container images (docker push/pull).
  - A way to run containers (docker run).

Container Ecosystem Today
- Low-level container runtimes: set up and manage namespaces, cgroups, and container execution. Ex: runC (Open Container Initiative), lxc, rkt.
- High-level container runtimes: image format, image management, sharing. Ex: cri-o; containerd from Docker (builds on runC).
- Docker today is a collection of components:
  - Docker engine: user-facing daemon, REST API, CLI.
  - "Runtime": containerd, containerd-shim, runC.

Advantages of Containers
- Abstraction levels: hypervisors work at the hardware abstraction level; containers work at the OS abstraction level.
- Containers offer higher density: VMs need O(GB) vs. containers that need O(MB), so you can pack many more containers per server.
- Containers improve elasticity: it is easier to "scale up" a container than a VM -- the reason for container adoption in hosting and PaaS environments. Ex: everything in Google, from Gmail to search, is containerized.
- Native CPU performance: no virtualization overhead.
- Dramatically improved software development lifecycle: easy to build, test, and deploy software without worrying about portability.
- [Diagrams: VM stacks over a VMM vs. containers over a shared Linux kernel]

From Containers to Serverless Functions

Container Case Study: Google Borg
- Borg is the internal container platform at Google (http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf).
- 25-second median startup! 80% of that time is spent on package installation.
- What if we have a lightweight HTTP app -- say, < 1,000 LoC that runs for 200 ms on each HTTP request?
  - In a container, the app would take too long to start.
  - Containers cannot deal with flash crowds, load balancing, interactive development, etc.

Three Generations of Virtualization
[Figure]

Serverless Functions: Model
- Run user handlers in response to events:
  - Web requests (RPC handlers)
  - Database updates (triggers)
  - Scheduled events (cron jobs)
- Pay per function invocation:
  - Actually pay-as-you-go: no charge for idle time between calls.
  - E.g., charge actual_time * memory_cap.
- Share a server pool between customers:
  - Any worker can execute any handler: no spin-up time, less switching.
- Encourage specific runtimes (C#, Node.js, Python).
- Minimize network copying: code will be resident in memory.

Architecture
[Figures: a sequence of architecture slides]

Functions vs. Containers
- "Serverless Computation with OpenLambda", Scott Hendrickson, Stephen Sturdevant, Tyler Harter, Venkateshwaran Venkataramani, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau.
- Experimental setup:
  - Amazon Elastic Beanstalk: an autoscaling cloud service; applications are built as containerized servers that service RPCs; rules dictate when to start/stop instances (based on various factors).
  - AWS Lambda: serverless functions.
- Workload: simulate a small short burst -- maintain 100 concurrent requests, use 200 ms of compute per request, run for 1 minute.

Scalability Result
- An AWS Lambda RPC has a median response time of only 1.6 s: Lambda was able to start 100 unique worker instances within 1.6 s.
- An RPC in Elastic Beanstalk often takes 20 s: all Elastic Beanstalk requests were served by the same instance, so each request had to wait behind 99 other 200 ms requests.

Functions vs. Explicit Provisioning
- With VMs or containers, we need to decide: What type of instances? How many to spin up? What base image? What spot price? And then wait for them to start...
- Functions truly deliver the promise of the cloud:
  - finally pay-as-you-go
  - finally elastic
  - fundamentally changes how people build scalable applications