Virtualization - Definition
- Virtualization is a computer architecture technology by which multiple virtual machines are multiplexed on the same hardware.
- It is the art of creating virtual resources (servers, storage, networks, and so forth) in a layer abstracted from the real hardware.
- Virtualization is an old concept: the idea of the VM dates from the 1960s (IBM S/370), which used a control program (similar to today's VMMs). It is now applied in clouds.
www.sbenedictglobal.com

Objectives of Virtualization
- To enhance resource sharing among many users at a time.
- To replace and upgrade hardware on the fly (a sort of isolation among guests).
- To add new devices, such as network cards, virtual sockets, and processors, without reboots.
- To reduce downtime.
- To perform administrative tasks at runtime, such as installing software, planning new VMs, and optimizing the number of VMs.
- To provision multiple machines faster.

Basics – Modes of Operation
Generally, the operating system operates in two modes: kernel mode and user mode.
Kernel mode
- In this mode, the OS allows all CPU instructions to execute on the underlying hardware.
- Kernel code does not execute in user mode.
User mode
- The OS allows only a subset of instructions to be executed (for example, the instructions that process data in user applications).
- If a user application has to execute a privileged instruction, it asks the kernel to do the work (for example, via a system call).
- User applications cannot directly open files, send network packets, print to the screen, or allocate memory; they must request these services from the kernel.
NOTE:
- Kernel processes run in kernel mode with supervisor (superuser) privilege.
- User processes run in user mode with user privilege.
- The OS manages processes and threads.
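To make the user-mode/kernel-mode split concrete, here is a minimal Python sketch (not part of the slides) in which an ordinary user-mode program asks the kernel for services. It only uses thin system-call wrappers from the standard `os` module; the function names `write_via_syscall` and `demo` and the pipe itself are illustrative assumptions.

```python
import os


def write_via_syscall(fd: int, text: str) -> int:
    """Ask the kernel to write bytes to a file descriptor (write system call)."""
    return os.write(fd, text.encode("utf-8"))


def demo() -> dict:
    # os.getpid() wraps the getpid system call: the kernel reports our process ID.
    pid = os.getpid()
    # os.pipe() asks the kernel to create a pipe; user code cannot build one itself.
    read_fd, write_fd = os.pipe()
    try:
        written = write_via_syscall(write_fd, "hello kernel\n")
        data = os.read(read_fd, 1024)  # read system call
    finally:
        # Closing descriptors is also a request to the kernel (close system call).
        os.close(read_fd)
        os.close(write_fd)
    return {"pid": pid, "bytes_written": written, "data": data}


if __name__ == "__main__":
    print(demo())
```

Every step that touches a kernel-managed resource (process ID, pipe, I/O) goes through a system call, while the surrounding data handling runs entirely in user mode.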
System Calls and CPUs
The OS performs:
- Process management (start, run, stop processes)
- Memory management (allocate, deallocate memory)
- File management (open, close, modify, read, rename, create)
- Network management, scheduling, timing, etc.
In user mode, applications initiate a system call to obtain OS services. A system call is a user-space request for a kernel service.
- There are hundreds of system calls in a 64-bit OS (e.g., sys_enter and sys_exit).
- x86_64 provides >322 system calls and x86 provides >358 different system calls.
- Typically, a system call takes around 242 CPU cycles.

Bare-Metal Virtualization (Type-I)
Privilege rings, physical vs. virtual (typical):
  Ring     | Physical     | Virtual (typical)
  Ring 3   | Applications | Guest applications
  Ring 1   | -            | Guest OSs
  Ring 0   | OS           | VMM
  Hardware | Hardware     | Hardware
- As multiple OSs must function on a single machine, the hypervisor acts at Ring 0.
- Guest OSs operate at Ring 1.
- Applications operate at Ring 3.
- This type of hypervisor is called a "bare-metal hypervisor".
- It is usually harder to set up (typically using some console manager software).
- E.g., VMware ESX/ESXi, Xen, Oracle VM Server.

Hosted Virtualization (Type-II)
- Here, the hypervisor is loaded on top of an OS.
- It is easy to set up.
- The guest OSs run on the hosted hypervisor.
- Virtualization based on this technology is also called hosted virtualization (Type II).
- E.g., VMware Workstation, VMware Fusion, Oracle VirtualBox, Parallels on Mac.
[Figure: normal applications and the guest OSs run side by side; the guest OSs sit on the VMM, which runs on the host OS, which runs on the hardware.]

Full Virtualization
- Here, the guest OS is not modified.
- The guest OS works in Ring 1; the VMM works in Ring 0.
What happens when privileged instructions are executed?
- Every privileged instruction is trapped (i.e., it raises a software interrupt) because it executes in a less privileged ring.
- The VMM intercepts such traps and emulates the instruction on the fly (as on the IBM S/370).
Challenge!
- This approach was not successful on Intel x86 machines: over 17 sensitive instructions did not trap to the VMM when executed outside Ring 0. A failure!
- In the meantime, VMware successfully implemented a binary translator (i.e., an x86-to-x86 translator that rewrites such instructions).
- It translates the binary code and places the result in a translation cache.

Impact of Memory Virtualization
- In VMs, we have hypervisors. If so, how does address mapping happen?
- Here, a program's memory addresses (virtual addresses of the VM) are mapped to virtual physical memory and then to physical (machine) memory.
- It is a 2-stage mapping process for any guest OS. Thus, the guest OS cannot directly access the machine memory.
- The VMM does the mapping of addresses. The page table in the VMM is called a shadow page table.
- The shadow paging process takes 3 to 400 times more cycles than the native case.
[Figure: under the guest OS's control, a program's virtual memory is mapped to dynamically allocated virtual physical memory; under the VMM's control, virtual physical memory is mapped to dynamically allocated machine memory.]

Para Virtualization
- Here, the guest OS needs to be modified at the source code level, so there is no need for trapping and binary rewriting.
- Runtime changes are avoided: it avoids on-the-fly code transformation and eliminates traps.
- The hypervisor provides interfaces for critical kernel operations such as memory management and interrupt handling.
- Performance is comparatively good because paravirtualization avoids unnecessary trapping of critical instructions. Thus, its advantage is a lower virtualization overhead.
- The challenge is that you need a modified guest OS.
- Xen is the classic paravirtualization example. KVM mainly relies on hardware-assisted virtualization, although it offers paravirtualized (virtio) device drivers.

Hardware-Assisted Virtualization
- Hardware-assisted virtualization support is available on Intel and AMD processors.
- E.g., Intel VT-x (Virtualization Technology, formerly Vanderpool Technology)
- E.g., Intel VT-i (Vanderpool Technology for Itanium)
- E.g., AMD-V
- Hardware virtualization support must be enabled in the BIOS setup.

Purpose – Hardware-Assisted Virtualization
- The purpose of hardware-assisted virtualization is not just to add hardware for doing binary translation.
- The main idea is to quickly identify privileged instructions and to execute them efficiently.
- To do so, one more high-priority layer was introduced at the hardware level. The VMM works at this level, and the guest OS can operate at Ring 0.

OS-Level Light-weight Virtual Machines – Containers
- Containers share the underlying OS kernel and can thus run only flavors of the same OS.

Building Blocks of OS-Level Virtualization
- Namespaces and cgroups are the two building blocks of OS-level virtualization.
Linux namespaces – a feature of the Linux kernel
- Namespaces are used in Linux to limit what a process can see. A namespace wraps a group of resources.
- Several kinds of namespaces exist.
- A PID namespace allows us to create another set of PIDs, starting from PID 1, for that specific namespace.
- Without namespaces, all processes descend from the init process, i.e., PID 1.
- Processes within the new namespace cannot view the parent process, but the parent processes can view the child processes.

Memory cgroup
- Generally, it is a memory resource controller.
- It isolates the memory behavior of a group of tasks from the rest of the system.
- It creates a cgroup with a limited amount of memory.
- It separates memory-hungry applications from the other applications.
- A cgroup with the memory controller is called a "memory cgroup" (lwn.net).
Features of a memory cgroup
Accounting
- How many memory pages are used by a specific group of running processes?
- In file pages (pages backed by disk),
- In anonymous pages (pages not backed by disk; they come from heaps, stacks, ...).
Limiting
- Soft limit – memory is allotted if available.
- Hard limit – memory beyond the limit is not allotted to the group of tasks.

Containers – A Global Perspective
- Containers use kernel features (cgroups and namespaces).
- Thus, container technology provides an environment where the hardware is shared among multiple users.
- A container behaves like a lightweight VM: it takes less space, and you can get a shell on it via SSH.
[Figure: "Anywhere, any app, any language". Container 1 ... Container n run on the shared kernel of the OS, on top of the machine.]
- Container solutions allow multiple isolated Linux systems of the same kind on a single host. Thus, a container is called a self-contained execution environment.
- Containers cannot boot different operating systems and cannot have their own kernel modules (unlike Type-I VMs).
- E.g., LXC, OpenVZ, BSD Jails.

Docker
- Docker utilizes container technology (cgroups and namespaces).
- It easily ports containers.
- It replicates containers across environments. Thus, it reduces the time between writing code and running it in production.
- It removes unnecessary configuration hurdles for applications.
- Since 2014...

Docker vs. Virtual Machines
- Problems with VMs: size, memory, integration.
- Docker utilizes a union filesystem. Unionfs is a filesystem service for Linux, FreeBSD and NetBSD which implements a union mount for other file systems. It allows files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system. (wiki...)

Docker Containers – Operations
- Docker caches the layers the first time it builds them.
- For example, on the initial build, the Ubuntu (or golang) base layer is cached. On the second build, only Apache or MySQL is built rather than restarting the build process from the Ubuntu base.
- Thus, deployment is faster using Docker.
[Figure: Process 0, Process 1, ... Process N run on the Docker Engine, which runs on the operating system, which runs on a machine or VM.]

Docker Architecture
- It follows the client-server architecture.
- The Docker client talks to the Docker daemon.
- The Docker daemon builds, runs, and distributes Docker containers.
- The Docker registry stores Docker images, either in a local registry or in public registries (such as Docker Hub or Docker Cloud).
- Docker containers are based on images (a collection of layers).
https://docs.docker.com/get-started/overview/

Docker Container States
- Created, Running, Paused, Restarting, Exited, Dead

Docker Tools
Docker Desktop
- A bundled package which contains all Docker components: Docker Engine, CLI, credential helper, and so forth.
Docker Compose
- It is a tool for defining instances specific to certain applications.
- It is able to build and run multi-container Docker applications.
- Its configuration is written as YAML files.
Docker Swarm
- It is a tool to manage Docker containers hosted on clusters.
- It has features such as scaling, multi-host orchestration, service discovery, load balancing, and so forth.
- It follows an init and join approach (as in Kubernetes).

VMware – vSphere Product
What is it?
- It virtualizes and aggregates the underlying physical hardware resources of a datacenter.
- It promotes the private (or hybrid) cloud rather than the public cloud concept (different from Amazon or similar clouds).
- VMware extended its previous virtualization products, such as VMware ESX, Workstation, and so forth.
- vSphere is called a cloud OS: it aggregates the infrastructure of a datacenter.
- vSphere is a Type-I virtualization.
- It enables us to manage IT resources.

VMware Topology
With this architecture, we can create a virtual datacenter.
vCenter Server
- Provides a central point of control for the datacenter.
- It is responsible for performance monitoring, management, and configuration.
Server groups with computing servers
- They run ESX or ESXi on bare-metal x86 servers.
- These servers can also be grouped as clusters.
Storage networks/arrays (3 types)
- Fibre Channel SAN
- Internet Small Computer Systems Interface (iSCSI)
- NAS

VMware vMotion
- It enables the migration of virtual machines from one host's memory to another host's memory without service interruption (with no downtime).
- The disk and the other files are kept on shared storage.
- This allows administrators:
  - to off-load virtual machines from one storage array to another to perform maintenance,
  - to resolve out-of-space issues, and
  - to upgrade VMFS.
https://www.vmware.com/products/vsphere/vmotion.html

Storage vMotion (SVMotion)
- It does the same as vMotion, but it moves storage information in addition to moving guests, i.e., snapshots can also be moved.
- This enables datacenter-level VM migration.
- Note: vMotion cannot do a datacenter-level VM migration.

VMware – Insights
Dynamic Resource Scheduler
- It schedules resources based on load or capacity requirements.
- It has features for automatic scheduling of cluster configurations.
Consolidated Backup
- It is used for the backup of ESX servers.

Infrastructure as a Service - AWS
Michael Gerndt
Technische Universität München

AWS – Resource Distribution

AWS - Regions
- A region is a geographic cluster of availability zones.
- Currently 33 regions.
- An account has one or more available regions.
  - AWS GovCloud (US) accounts are limited to the AWS GovCloud (US-East and US-West) regions.
  - AWS (China) accounts can only use AWS (China) Beijing and Ningxia.
- The user can control where resources are allocated:
  - to meet legal requirements, such as in Europe,
  - to have short-latency access for customers.
- Regions are isolated for fault tolerance and stability.
- You see only your VMs in the current region.
- Communication among regions is not free.
AWS – Availability Zones
- Availability zone: think of it as a data center.
- 33 regions and 105 availability zones.
- Two availability zones have no common points of failure, thus servers in two zones gain infrastructural redundancy.
- Naming: region code + letter, e.g., us-east-1a.
- The mapping of names to zones might be different for different accounts, for load balancing.
- The user can control the zone in which a VM is started for fault tolerance reasons; otherwise AWS will select a zone.
- The number of zones in a region might be different for different accounts.
- Communication within a zone is free; communication between zones has to be paid for.
- Amazon's data centers are connected via the AWS backbone network.

Infrastructure for the Edge
AWS provides infrastructure near the clients for reduced latency and bandwidth.
AWS Local Zones
- Zones offering a limited set of services with single-digit-millisecond latency network access.
- Deploy your application with the standard APIs.
AWS Wavelength Zones
- Compute, storage, and networking services within 5G networks.
- Ultra-low-latency (single-digit-millisecond) mobile edge computing.
- Key services like Amazon EC2 instances, EBS (Elastic Block Store), VPC (Virtual Private Cloud), and IAM (Identity and Access Management) are available within Wavelength Zones.
- Users deploy into Wavelength Zones using the same AWS Management Console and APIs they use for other AWS services.
AWS Outposts
- On-premises infrastructure integrated into the AWS Cloud and managed as cloud resources with the AWS API and tools.

Amazon CloudFront
- Cloud-based content distribution network.
- Allows you to place your online content on a global network of edge locations.
- Content will be delivered from a location close to the requestor.
- To use Amazon CloudFront, you:
  - Store the original versions of your files in an Amazon S3 bucket.
  - Create a distribution to register that bucket with Amazon CloudFront through a simple API call.
  - Use your distribution's domain name in your web pages or application.
- Pay only for the data transfer and requests that you actually use.
- Protection from DDoS attacks by AWS Shield.

CloudFront Content Delivery Service and the Edge
AWS Edge Location
- Only used for AWS managed services (CDN, firewall and DDoS protection, routing to the AWS backbone network).
- AWS data centers keeping cached copies closest to the end user.
- Point of Presence (POP) serving content directly to your viewers.
AWS Regional Edge Caches
- Caches of Amazon CloudFront (the low-latency content delivery network (CDN) service) that sit between your origin server and the related POPs.
600+ edge locations and 13 regional edge caches.

Amazon Elastic Compute Cloud - EC2
- Provides virtual machines running inside the Amazon Cloud.
- Instance storage tied to the hosting server.
- Network-accessible block storage that persists across time and can be mounted in the VM.
- Virtual Private Cloud (VPC) to secure your network in the cloud.
- Based on the Xen hypervisor.
- At the end of 2017, AWS announced a switch to its own KVM-based hypervisor for new high-end Intel processors.

Xen Hypervisor
Three levels of virtualization:
- Bare metal: the hypervisor sits between the hardware and the host OS/VMs.
- Hosted virtualization: the hypervisor runs on top of the host operating system.
- OS-level virtualization: containers running on top of the OS kernel.
Xen is a bare-metal hypervisor.
- One VM is called Domain 0 (Dom0) and runs the host OS.
- It starts first and runs the Xen management software, manages other VMs, has drivers for hardware, and provides virtual disks and network access to unprivileged VMs.
Nitro Hypervisor
- Special interface cards for network, interrupt handling, and block storage.
- Management happens in hardware instead of in software in Dom0.
- Offers limiters to guarantee resource distribution, e.g., network bandwidth.
- Hardware-based security support for Nitro Enclaves to protect sensitive data.
- Hardware is faster.
- The entire Dom0 can be removed.
- No cores are reserved for Dom0.

Amazon Machine Image
- An Amazon Machine Image (AMI) is also called a VM template.
- It is a copy of a server with OS and preinstalled software.
- Predefined AMIs are available from Amazon and third parties; user-defined AMIs are possible.
- AMIs are stored in S3.
- It is difficult to select an AMI: they could even include Trojans or backdoors.
- Amazon provides reviews and ratings: http://aws.amazon.com/amis.

AWS Storage
- Amazon Elastic Block Storage
- Amazon EC2 Instance Storage
- Amazon Elastic File System (Amazon EFS)
- Amazon Simple Storage Service (Amazon S3)

Amazon Block Storage
- Block storage volume: block-level storage which can be mounted.
- It can be formatted as appropriate.
- Multiple volumes can be combined into a virtual RAID.
- Snapshots of a block storage volume are stored in S3 for backup or replication.

Amazon Instance Storage
- Disks attached to the physical host.
- If you stop or terminate an instance, any data on instance store volumes is lost.
- Some instance types use NVMe or SATA-based solid state drives (SSD) to deliver high random I/O performance.

Amazon Elastic File System
- Scalable file storage.
- Can be created and mounted into instances.
- Files can be shared among instances.
- The file system has to be explicitly created and destroyed.
Amazon Simple Storage Service (S3)
- Reliable and inexpensive data storage infrastructure.
- Supports objects from 1 byte to 5 TB.
- Two-level namespace:
  - Buckets: flat collection of buckets; the namespace is shared across all Amazon customers.
  - Objects: files in the buckets.
- Slow compared to local disks or EBS.
- Access: from within EC2 and from the web.
- High durability but low availability.
- Most users use S3 for short-term or long-term backup.

Amazon EC2 Instance
- Instance: a running VM which is based on an AMI.
- Instance type: VMs with different compute and memory capabilities.
- Storage:
  - Boot device volume: Elastic Block Storage or instance storage.
  - Instance store volumes: local disks of the server.
  - Both are lost when the instance is terminated. For persistence use EFS or EBS.
- List of instance types: https://aws.amazon.com/ec2/instance-types
  - M7(g/i/a): AWS Graviton, Intel, AMD CPUs.
- Elastic IP address:
  - A static IP address is required if you want to use an instance that must always be accessible by the same IP address.
  - You pay for the address independent of the usage.
- Account limit on the number of VMs of a certain type.

AWS Instance Lifecycle
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-lifecycle.html

AWS EC2 Instance
Characteristic: Amazon EBS-backed vs. Amazon instance store-backed
Boot time for an instance
  EBS-backed: Usually less than 1 minute
  Instance store-backed: Usually less than 5 minutes
Root device volume
  EBS-backed: Amazon EBS volume
  Instance store-backed: Instance store volume
Data persistence
  EBS-backed: By default, the root volume is deleted when the instance terminates.* Data on any other Amazon EBS volumes persists after instance termination by default.
  Instance store-backed: Data on any instance store volumes persists only during the life of the instance.
Modifications
  EBS-backed: The instance type, kernel, RAM disk, and user data can be changed while the instance is stopped.
  Instance store-backed: Instance attributes are fixed for the life of an instance.
Charges
  EBS-backed: You're charged for instance usage, Amazon EBS volume usage, and storing your AMI as an Amazon EBS snapshot.
  Instance store-backed: You're charged for instance usage and storing your AMI in Amazon S3.
AMI creation/bundling
  EBS-backed: Uses a single command/call
  Instance store-backed: Requires installation and use of AMI tools
Stopped state
  EBS-backed: Can be placed in stopped state where the instance is not running, but the root volume is persisted in Amazon EBS
  Instance store-backed: Cannot be in stopped state; instances are running or terminated

Lifecycle
Instance state: description (instance usage billing)
pending: The instance is preparing to enter the running state. An instance enters the pending state when it launches for the first time, or when it is restarted after being in the stopped state. (Not billed)
running: The instance is running and ready for use. (Billed)
stopping: The instance is preparing to be stopped or stop-hibernated. (Not billed if preparing to stop; billed if preparing to hibernate)
stopped: The instance is shut down and cannot be used. The instance can be restarted at any time. (Not billed)
shutting-down: The instance is preparing to be terminated. (Not billed)
terminated: The instance has been permanently deleted and cannot be restarted. (Not billed)

Characteristic: Reboot vs. Stop/start (Amazon EBS-backed instances only) vs. Hibernate (Amazon EBS-backed instances only) vs. Terminate
Host computer
  Reboot: The instance stays on the same host computer.
  Stop/start: In most cases, we move the instance to a new host computer. Your instance may stay on the same host computer if there are no problems with the host computer.
  Hibernate: Same as stop/start
  Terminate: None
Private and public IPv4 addresses
  Reboot: These addresses stay the same.
  Stop/start: The instance keeps its private IPv4 address. The instance gets a new public IPv4 address, unless it has an Elastic IP address, which doesn't change during a stop/start.
  Hibernate: Same as stop/start
  Terminate: None
Elastic IP addresses (IPv4)
  Reboot: The Elastic IP address remains associated with the instance.
  Stop/start: The Elastic IP address remains associated with the instance.
  Hibernate: Same as stop/start
  Terminate: The Elastic IP address is disassociated from the instance.
IPv6 address
  Reboot: The address stays the same.
  Stop/start: The instance keeps its IPv6 address.
  Hibernate: Same as stop/start
  Terminate: None
Instance store volumes
  Reboot: The data is preserved.
  Stop/start: The data is erased.
  Hibernate: Same as stop/start
  Terminate: The data is erased.
Root device volume
  Reboot: The volume is preserved.
  Stop/start: The volume is preserved.
  Hibernate: Same as stop/start
  Terminate: The volume is deleted by default.
RAM (contents of memory)
  Reboot: The RAM is erased.
  Stop/start: The RAM is erased.
  Hibernate: The RAM is saved to a file on the root volume.
  Terminate: The RAM is erased.
Billing
  Reboot: The instance billing hour doesn't change.
  Stop/start: You stop incurring charges for an instance as soon as its state changes to stopping. Each time an instance transitions from stopped to running, we start a new instance billing period, billing a minimum of one minute every time you restart your instance.
  Hibernate: You incur charges while the instance is in the stopping state, but stop incurring charges when the instance is in the stopped state. Each time...
  Terminate: You stop incurring charges for an instance as soon as its state changes to shutting-down.

Instance Placement Groups
Cluster placement group
- Logical grouping of instances.
- Instances are packed closely in an availability zone to increase network performance.
Partition placement group
- Spreads instances across partitions such that different partitions do not share the underlying hardware.
- Each partition gets its own rack.
- Partitions can be placed in different availability zones.
- Reduces the likelihood of correlated hardware failures and improves performance within a partition.
Spread placement group
- Spreads instances across distinct underlying hardware.
- Reduces correlated hardware failures.
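Returning to the instance lifecycle table above, its billing column can be captured as a small lookup. The following Python sketch is illustrative only: the names `BILLED_BY_STATE`, `is_billed`, and `billed_seconds` are hypothetical, and the per-second metering beyond the one-minute minimum is an assumption that the slides do not state.

```python
# Billing status per instance state, transcribed from the lifecycle table.
BILLED_BY_STATE = {
    "pending": False,
    "running": True,
    "stopping": False,       # exception: billed when preparing to hibernate
    "stopped": False,
    "shutting-down": False,
    "terminated": False,
}

MIN_BILLED_SECONDS = 60  # "a minimum of one minute" per (re)start


def is_billed(state: str, hibernating: bool = False) -> bool:
    """Return True if instance usage is billed in the given state."""
    if state not in BILLED_BY_STATE:
        raise ValueError(f"unknown instance state: {state!r}")
    if state == "stopping":
        return hibernating
    return BILLED_BY_STATE[state]


def billed_seconds(run_periods_s: list[int]) -> int:
    """Sum billed seconds over separate running periods.

    Assumption (not stated in the slides): usage is metered per second
    once the one-minute minimum of a billing period is exceeded.
    """
    return sum(max(MIN_BILLED_SECONDS, seconds) for seconds in run_periods_s)


if __name__ == "__main__":
    print(is_billed("running"))
    print(is_billed("stopping", hibernating=True))
    print(billed_seconds([10, 90]))
```

Under these assumptions, an instance that runs for 10 seconds, is stopped, and then runs for 90 seconds is billed for 60 + 90 seconds, which shows why frequent short restarts cost more than the raw running time suggests.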
Security
- Accounts have their own Virtual Private Cloud (VPC).
- Your resources are launched into your VPC.
- A VPC resembles the network in your own data center.
- Configuration:
  - Define the IP address range, create subnets, and configure route tables, network gateways, and security settings.
  - Connect instances to the internet.
  - Connect your VPC to your data center.
- Amazon creates a default VPC, but additional VPCs can be created by the user.

EC2 Access
- The primary means is a web services API.
- Interactive tools on top of the API:
  - Amazon Web Services Console
  - Amazon command line tools
- Access to your server is by private/public key pair.

AWS CloudFormation
- Model your infrastructure.
- Infrastructure as Code:
  - Specify all resources textually as a JSON template.
  - Allows you to standardize components across your institution.
  - Automatic deployment of all resources, controlled and predictable.
  - Use code editors and versioning tools.

Terraform
- HashiCorp.
- Multicloud infrastructure management.
- (Graphical) specification of infrastructure.
- Translation into a textual specification (TF config).
- API checking of the deployed infrastructure.
- Determine and execute a plan to go from the current state to the desired infrastructure.
- Terraform Enterprise:
  - Collaboration in a team.
  - Configuration is on a central server.

Third Party Cloud Management Platforms
Cloud management platforms
- Management of a whole infrastructure with multiple servers, accounts, reports, etc.
- Multi-cloud management, automation and orchestration, cost optimization, security and compliance, performance monitoring and resource optimization.
- Examples: flexera.com, Scalr, Morpheus Data, IBM Cloud Manager for Kubernetes clusters.

Pricing
- On-demand pricing: https://aws.amazon.com/ec2/pricing/on-demand/
- Reserved instances pricing: https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/
- Spot market pricing: https://aws.amazon.com/ec2/spot/pricing/

Pricing for Data Transfer
- Internet: IN free; OUT < $0.09 per GB
- Inside an availability zone (private IP address): none
- Regional transfer (private IP address), between different availability zones in the same region: $0.01 per GB in/out
- Public and Elastic IP address inside EC2: $0.01 per GB in/out

Pricing Block Storage and Elastic IP Addresses
Block storage
- $0.08 per GB-month of provisioned storage on SSD
- $0.045 per GB-month of provisioned storage on HDD
- $0.0005 per provisioned IOPS-month (some SSD storage)
Object store S3
- $0.023 per GB-month of data stored
- $0.005 per 1,000 PUT requests
- $0.0004 per 1,000 GET requests
Public IP addresses
- $0.005 per complete hour

Amazon EC2
- Infrastructure as a service.
- AWS also offers platform as a service, Lambda, IoT, ...
- Flexible instance types.
- Large variety of Amazon Machine Images.
- Pricing: on-demand, reserved, and spot market pricing.

Base Technologies for Accessing the Cloud
- Mobile processors
- Wi-Fi
- DSL

Processors for Mobile Devices
ARM
- British company, acquired by SoftBank (Japan) in 2016.
- Develops processor designs.
- Designs are integrated into SoCs for mobile devices.
- ARM big.LITTLE.

WLAN
- Wireless Local Area Network.
- Based on the IEEE 802.11 standards and marketed as Wi-Fi.
- The network consists of clients and access points acting as routers.
- Two modes:
  - Infrastructure: clients connect to an access point.
  - Ad hoc: clients communicate among each other.
- Securing the net: Wi-Fi Protected Access (WPA, WPA2).

DSL
Digital Subscriber Line (DSL)
- Last-mile connection.
- Transmission of digital data over telephone lines.
- Can share the same line with telephone service because different frequencies are used.
Performance
- Downstream between 256 Kbit/s and 100 Mbit/s.
Asymmetric DSL
- Upstream bandwidth is much lower.
- The pool of usable frequency channels is split between downstream and upstream.

VDSL with Vectoring
- Vectoring is a technique to reduce crosstalk between different lines.
- Special encoding of neighbouring lines.
- Similar to noise-cancelling headphones.
- The provider needs to have access to all lines in a bundle. Therefore this might lead to cancellations of contracts in order to give a whole bundle to a single provider.

Providers of IaaS
- Amazon Web Services
- MS Azure
- Google Cloud
- Alibaba
- Telecom providers, private companies
- OpenStack (open source)
- OpenShift

Lightweight Sandboxes in Function-as-a-Service (FaaS)
Cloud Computing SoSe 2024
15th May 2024
Mohak Chadha (M.Sc. Informatics)
[email protected]
Chair of Computer Architecture and Parallel Systems (CAPS)
Technical University of Munich
Germany

Cloud Native Computing Foundation (CNCF)
Mission: making cloud native computing ubiquitous
- Founded in 2015.
- Part of the Linux Foundation.
- Non-profit organization.
- Open-source.
Cloud Native?
Mohak Chadha | Lightweight Sandboxes in Function-as-a-Service (FaaS) | Cloud Computing SoSe 24

Projects: Cloud Native Computing Foundation (CNCF)
- Project maturity levels: Sandbox, Incubating, Graduated.

Introduction: Serverless Computing
Virtualization stack abstraction, from server-based to serverless:
- Bare metal. Unit of scale: physical servers. Deploy in hours/days. Live for years.
- IaaS. Unit of scale: VMs. Deploy in minutes. Live for weeks.
- PaaS. Unit of scale: containers. Deploy in seconds. Live for minutes/hours.
- Serverless. Unit of scale: functions. Deploy in milliseconds/seconds. Live for seconds.
Moving towards serverless shifts the focus to application/business logic.

Introduction: How Function-as-a-Service works?
[Figure: FaaS compute platform. (1) Invocations arrive. (2) Is a free function instance available? No: cold start of a new instance. Yes: (3) execute the handler method in the instance (execution environment, runtime, function handler). (4) Return the response.]
[Figure: a cold start consists of code download, start of a new instance, and bootstrapping the runtime (the target of platform optimization), followed by the function handler method execution; a warm start consists only of the function handler method execution.]
Munns, C. (2020, January 20). End Cold Starts in Your Serverless Apps with AWS Lambda Provisioned Concurrency. AWS Online Tech Talks. https://pages.awscloud.com/End-Cold-Starts-in-Your-Serverless-Apps-with-AWS-Lambda-Provisioned-Concurrency_2020_0101-SRV_OD.html.
Serverless Compute Platforms
- Public cloud offerings: AWS Lambda, AWS Fargate, Google Cloud Functions, Google Cloud Run, Azure Functions, Scaleway Functions, IBM Cloud Functions.
- Open-source platforms.

Example: Writing a FaaS function
Steps: (1) write the handler method, (2) build the container image, (3) deploy the function, (4) invoke the function, which returns "Hello world!".
Knative, based on SparkJava (the Spark Java web framework, not Apache Spark):

    package com.example.helloworld;

    import static spark.Spark.*;

    public class HelloWorldApplication {
        public static void main(String args[]) {
            // Change the default port in SparkJava to env. variable port or 8080.
            port(Integer.valueOf(System.getenv().getOrDefault("PORT", "8080")));
            get("/", (req, res) -> "Hello world!");
        }
    }

Overview: Kubernetes Architecture
- Control plane components
- Worker node components

Introduction: Containers
An isolated and restricted box for running processes.
namespaces
1. mnt (Mount)
2. uts (Unix Time-sharing)
3. ipc (Interprocess Communication)
4. pid (Process ID)
5. net (Network)
6. cgroup (Cgroup)
cgroups
- Memory, CPU, ...
capabilities
- cat /proc/{PID}/status
    CapPrm: 00000000aa0435fb
    CapEff: 00000000aa0435fb
seccomp
- Filtering system calls.

Introduction: Open Container Initiative (OCI)
- Established in 2015.
1. Runtime Specification (runtime-spec)
2. Image Specification (image-spec)
3. Distribution Specification (distribution-spec)
OCI-compliant

What happens when you do docker run?
docker dockerd containerd Containerd-shim runc container (crun) Container Runtime kubelet Interface (CRI) gRPC based API Mohak Chadha | Lightweight Sandboxes in Function-as-a-Service (FaaS) | Cloud Computing SoSe 24 Introduction: Containerd q Industry-standard container manager. q Available as a daemon on Linux. q Manages complete container lifecycle of its host system, e.g., image transfer and storage, container execution and supervision. q OCI-compliant. q Supports snapshotting. q Supports abstraction of lower-level container runtimes through containerd-shims. Mohak Chadha | Lightweight Sandboxes in Function-as-a-Service (FaaS) | Cloud Computing SoSe 24 Introduction: Containerd-shim q Piece of software that resides between containerd and a low-level container runtime (runc, crun). syntax = "proto3"; q Abstract low-level runtimes. package containerd.task.v2; service Task { q Lives as long as the container process. rpc State(StateRequest) returns (StateResponse); rpc Create(CreateTaskRequest) returns (CreateTaskResponse); q In contrast, OCI runtimes just start a rpc Start(StartRequest) returns (StartResponse); rpc Resume(google.protobuf.Empty) returns (google.protobuf.Empty); fork/exec container process and then exits. rpc Checkpoint(CheckpointTaskRequest) returns (google.protobuf.Empty); rpc Kill(KillRequest) returns (google.protobuf.Empty); q Intercepts container’s stdin, stdout and rpc Exec(ExecProcessRequest) returns (google.protobuf.Empty); stderr streams and redirects them to logs. … q Keeps track of container exit code. gRPC/ttRPC based API Mohak Chadha | Lightweight Sandboxes in Function-as-a-Service (FaaS) | Cloud Computing SoSe 24 Introduction: runc q Command-line tool q Implements the OCI-runtime specification. q Prepares the runtime for the container, including the creation of namespaces. q Starts the container process and then exits (detached mode). 
- Support for rootless containers.
- Input: a filesystem bundle containing a mandatory config.json.
- The config file contains the data necessary to implement standard container operations.

Containers vs Virtual Machines
(Figure-only slide.)

The need for more isolation
- The Linux kernel has 27.8 million lines of code.
- There are 319 native 64-bit syscalls in Linux x86_64.
- Goal: prevent untrusted userspace code from exploiting kernel bugs.
Types of exploits:
- System API: the interface for interaction with the host kernel or hypervisor through system calls and traps. Bugs within the kernel/hypervisor can be exploited via the API. Example: Dirty COW (CVE-2016-5195).
- System ABI: hardware and software exploits targeting the execution path in response to events. Example: glibc (CVE-2017-1000366).
- Side channels: exploiting indirect effects of the system or hardware. Example: Spectre and Meltdown (CVE-2017-5754, CVE-2017-5753, CVE-2017-5715).
Takeaway: minimize the attack vectors.

gVisor
- Open-source, secure container runtime developed by Google.
- Written in Go.
- Used in production in Google Cloud Functions and Google Cloud Run.
- Minimizes the system API attack vector.
- Sandboxed host system API.
- Independent user-space application kernel.
- Support for docker/containerd.
- Two platforms: systrap and KVM.
  - systrap: relies on the SECCOMP_RET_TRAP feature to intercept system calls and transfers control to gVisor for handling them. Performs better with nested virtualization.
  - KVM: uses KVM to allow the Sentry to act as both guest OS and VMM. Leverages virtualization extensions to improve performance.
- Support for amd64 and arm64.
- Support for Nvidia GPUs with nvproxy.

gVisor Architecture
- The Sentry intercepts the syscalls made by the application.
- Only a few syscalls are made by the Sentry to the host Linux kernel.
- A seccomp profile filters the syscalls the Sentry is allowed to make.
- Access to the filesystem goes through a separate process called the Gofer.
- Linux Sandbox Filesystem Protocol: reduces the number of operations for filesystem calls, reduces lock contention, and is memory efficient.
- Uses the containerd-shim-v1 API.
Drawbacks:
- Not well suited for syscall-heavy workloads.
- Not all syscalls are implemented.
- 15-20% average overhead compared to runc.

AWS Firecracker
- VMM that uses KVM to create and manage microVMs.
- Specifically designed for serverless computing.
- Implemented in Rust.
- Minimalist design.
- Enhanced security and workload isolation over traditional VMs.
- Reduced startup time and memory footprint.
- Open-source.
- Integration with the container ecosystem: firecracker-containerd, Kata Containers, Weaveworks flintlock.
- Used in AWS Lambda and AWS Fargate.

Firecracker Architecture
(Figure with the components Client, RESTful API, Jailer, Rate Limiting, Metadata Service, Networks, Storage, and the microVM running the guest OS and container workload.)
- Isolation: the Jailer is a 2nd layer of protection against unwanted VMM behaviour. A restrictive seccomp profile enables only 24 system calls.
- RESTful API: start/stop microVMs, configure vCPUs/memory for microVMs.
- Rate limiting: rate limiters granularly control the network and storage resources used by microVMs (IOPS for disk, packets per second for network, bandwidth for devices).
- Metadata service: shares information between the guest and the host OS.

Kata Containers
- OpenStack Foundation project.
- Secure and isolated containers with a separate kernel.
- OCI-compliant.
- Kata Agent: implemented in Rust. One per VM. Manages user containers and their workloads.
- Cloud Hypervisor: lightweight VMM based on the Rust VMM project. Minimal design.
- Implements containerd-shim-v2: one shim daemon per pod.
- Supports pulling container images directly in the VM.

WebAssembly
- Binary format, with an alternative human-readable text representation
- Virtual ISA
- Linear 32-bit memory space
- Isolated from the host by default
- Import/export system for granting capabilities
Figure: Native (left) vs. WASM (right) code generation flow for an LLVM-based compiler and WASM embedder. Native: source -> compiler (LLVM IR) -> x86 ISA. WASM: source -> compiler (LLVM IR) -> WASM ISA -> embedder (LLVM IR) -> x86 ISA.

WebAssembly Module
- Application code as WASM ISA
- Defines functions, globals, memories, imports, exports, static data
- Instantiated by an embedder
- Portable
- WASM equivalent to ELF / Mach-O / PE
Figure: Software Layers for WASM Execution (top to bottom): WASM Module, WASM Embedder, native libraries (glibc, OpenMPI, …), Linux Kernel.

WebAssembly Embedder
- Parses WASM modules and executes the application code
- Execution strategies: interpreter (Wasm3), JIT, AOT
- Provides implementations for module imports
- Manages the module memory space
- Not portable
- Links against native libraries
- System interactions through WASI
- Isolation: Software Fault Isolation, Control Flow Integrity

WASI: Wasm System Interface
- Standardized non-Web, system-oriented API for Wasm
- Capability-oriented
- Portable
- Custom libc implementation integrated into WASI-SDK

Configuring Containerd
Config file: /etc/containerd/config.toml

    version = 2
    [plugins."io.containerd.runtime.v1.linux"]
      shim_debug = true
    # runc
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
    # gVisor
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
      runtime_type = "io.containerd.runsc.v1"
    # kata
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
      runtime_type = "io.containerd.kata.v2"

Runtime Classes in Kubernetes
- Feature for selecting the container runtime configuration.
- The container runtime configuration is used to run a Pod's containers.
- Through containerd there is out-of-the-box support for gVisor, Firecracker, Kata Containers, …

Knative
- Platform that deploys on top of K8s.
- Three main components:
  - Build provides K8s-style resources for declaring CI/CD-style pipelines (deprecated).
  - Serving: K8s + Knative Serving -> Serverless.
  - Eventing enables event-driven architectures for applications, e.g., producer/consumer.

Security in AWS
Tobias Dümmling & Lisa Martiny - Netlight Consultants
Munich, 22.05.2024

Shared Responsibility Model
- AWS is responsible for protecting the infrastructure that runs the services offered in the AWS Cloud. This infrastructure is composed of the hardware, software, networking, and facilities that run AWS Cloud services.
- The customer is responsible for everything they decide to do with the services offered by AWS.
- Among similar services, some take more responsibility away from you than others (e.g., Serverless vs. VMs, S3 vs. EFS).
- Consider the total cost of ownership when looking for solutions.

Security Certifications & Attestations
- Independent assessment and attestation of a provider's capacity to serve industry-specific workloads.
- A certification is issued by an accredited specialized company and often lasts between one and three years. A certification is a snapshot in time.
- An attestation focuses on the continuous implementation aspect (e.g., re-audits every 6 months) and attests evidence of appropriateness and effectiveness over a past range of time.
Examples:
- HIPAA/HITECH: Health Insurance Portability and Accountability Act of 1996 and the Health Information Technology for Economic and Clinical Health Act of 2009. Certification for handling medical data.
- ISO/IEC 27001:2022: a security management standard specifying best practices with regard to security controls and Information Security Management Systems.
- FIPS 140-2 Level 3: Federal Information Processing Standard 140-2, a U.S. government security standard for approving cryptographic modules at different levels.
- Cloud Computing Compliance Controls Catalog (C5): German government-backed attestation scheme by the BSI.
Each service meets different compliance criteria, e.g., AWS Secrets Manager (ISO and HIPAA) vs. CloudHSM (FIPS 140-2 Level 3).

Implications for us as cloud engineers
- AWS offers different services for the same tasks, e.g., Secrets Manager and CloudHSM (FIPS 140-2 Level 3).
- You as a cloud engineer have to ensure that the entire application adheres to the standards you must comply with.
- No free lunch: CloudHSM does not offer high availability, so you must design for it yourself.

AWS Identity and Access Management (IAM)
- AWS IAM allows you to define who can access what in one central place.
- It does this by providing roles, users, groups, and policies (and resource policies).
- A user can assume a role for temporary credentials.
- It is best practice to manage access with temporary credentials.
- Aim for the Principle of Least Privilege.
- Permissions can be iterated on with AWS Policy Analyzer.
(Figure labels: User, Group, Role, Policy, Assume Role, Access, EC2.)

AWS Identity and Access Management (IAM), continued
- User pools can be federated with existing AD providers.
- AWS CloudTrail allows you to monitor who has made requests for resources in your account.
(Figure labels: User, Access, EC2, Identity Federation, Existing AD Solution, Access Logs, CloudTrail.)

AWS Multi Account Setup
- Splitting workloads into multiple accounts provides security, access, and billing boundaries.
- There are many ways to split accounts.
  Common ways are by environment or by workload (a workload identifies a set of components that deliver business value together).
- You can provide sandbox accounts for experimentation.
- Guardrails are governance rules which you can roll out on your accounts and thereby enforce compliance. There are preventive and detective guardrails.
  - Preventive: implemented with Service Control Policies (SCPs).
  - Detective: AWS Config rules and AWS Lambda.
(Figure labels: Dev and Prod accounts, each with Script, User, EC2, RDS. A Service Control Policy (SCP) applies to the accounts.)

AWS Organizations, AWS SSO and Control Tower
- AWS Organizations allows you to consolidate multiple accounts into organizational units (OUs) and manage them centrally.
- IAM Identity Center allows you to centrally manage access to your AWS accounts with temporary credentials.
- AWS Control Tower combines AWS Organizations with various other AWS services to manage access, orchestration, and compliance of a multi-account setup according to best practices.
- Enable MFA for your users.

Secrets Management - Requirements
(Figure labels: User, Access, Debug, EC2, RDS database, DB_CONNECTION_STRING.)
- Centralized storage
- Fine-grained access control
- Encryption at rest and in transit
- Secret rotation
- Integration with other services

Secrets Management
(Figure as above, extended with CloudTrail access logs.)
- Centralized storage
- Fine-grained access control with IAM
- Encryption at rest and in transit
- Secret rotation
- Integration with other services
Important: establish a process for secrets management.
- Who is allowed to access/change which secrets?
- How do you always ensure encryption?

OAuth2.0 Implicit Flow
Implicit flow is no longer considered safe. There are other flows that extend the implicit flow to mitigate security risks (https://oauth.net/2/grant-types/).
(Figure participants: User, WebApp1, Backend2, Login View, Authentication/Authorization Server, User Pool.)
4. The authentication server authenticates the user against the user pool.
5. The authentication server redirects the user to the frontend with an access token.
6. WebApp1 fetches the image from Backend2.
7. Backend2 verifies the token with the Authentication/Authorization Server and returns the image.

OAuth2.0 Authorization Code Flow
(Figure participants as above.)
4. The authentication server authenticates the user against the user pool.
5. The authentication server redirects the user to the frontend with an authorization code.
6. The backend server of the frontend swaps the code for an access token.
7. WebApp1 fetches the image from Backend2.
8. Backend2 verifies the token with the Authentication/Authorization Server and returns the image.

Authentication Provider
- Cognito is an authentication/authorization service that covers:
  - Authentication
  - Authorization
  - Standards (OAuth 2.0, OIDC, SAML)
  - Integration within the AWS ecosystem
- There are other solutions (such as auth0).
  Which solution works best depends on factors such as project requirements and maturity (the slide links to a blog post on the topic).

Patch Management & Configuration Management
- Patching is part of the lifecycle management of an application.
  - Find and install patches (fixes) to software, drivers, and firmware to protect against vulnerabilities.
- Configuration management is responsible for ensuring that the configuration of managed systems conforms to security requirements.
- Both go hand in hand.
  - A properly configured system is still vulnerable if it is running an outdated version.
  - An up-to-date system is still vulnerable if it is running with an overly loose configuration (or "passwd" as the admin password).
- Configuration should be managed as code and deployed together with the application or the specific infrastructure.
- Patching should be done on a regular basis. Our advice: automate it!

Patch Management & Configuration Management, continued
- Services for patch and configuration management provided by AWS:
  - Systems Manager
    - Define configurations for managed nodes once and reapply them automatically.
    - Ensures conformity and compliance without operational overhead.
    - Eliminates configuration drift. Drift should either be overwritten or incorporated into the centralized config management.
  - Parameter Store -> to store the configuration.
  - Patch Manager -> to roll out patches at scale (e.g., installing Service Packs on Windows Servers, minor version upgrades on Linux nodes).
- More generally, keep the configuration close to the resources it applies to, ideally within the source code.

Integrating patch management into the development workflow
- Automated patch management as part of the software development life cycle.
- Tools such as Dependabot or Renovate scan a project's dependencies and open PRs.
- They can be configured to auto-merge non-breaking changes.
- Beware of blindly merging. Ensure that you have tests in your pipeline to catch breaking changes.
- A version increase can itself introduce a new security flaw or break the application.

Encrypted communication
- Data in transit refers to any data that is sent from A to B.
  - Includes communication between users and applications, but also between AWS S3 and the application, or between backend and database.
- Use TLS.
- Ensure keys and certificates are managed securely in a PKI.
  - Automatic renewal on a periodic basis.
  - AWS Certificate Manager can be used to provision/manage certificates for AWS workloads.
  - AWS Private Certificate Authority can be used to issue and sign those certificates.
- TLS usage should be enforced and considered a must-do.
  - Communication between AWS services uses TLS by default.
  - Use automatic redirects on load balancers to enforce TLS (route from port 80 to port 443).

Data Protection - Encryption at Rest
- Encryption at rest means that you store data encrypted both at A and B (RDS, DynamoDB, ElastiCache, EBS, Redshift, …).
- Encryption at rest uses AWS KMS as the central key management service. KMS provides integrations for most services (S3, EBS, RDS, DDB, Redshift, …).
- Example S3: SSE-S3, SSE-KMS, SSE-C, client-side encryption.
- Audit key usage with CloudTrail and implement automated anomaly detection -> audit, regulatory, and compliance needs.
- Leverage IAM for fine-grained permission control to KMS keys.

Data Protection - Data Lifecycle
- Example data requirements:
  - The EU data protection regulation requires data minimization ("Datensparsamkeit") by default.
  - Archive data for 10 years for compliance reasons.
  - Archive data which is not required anymore to save cost.
  - Delete data which is not needed anymore.
- AWS offers solutions such as S3 IA, S3 Glacier, lifecycle policies, DDB TTL, Amazon Data Lifecycle Manager, …
(Figure: S3 Lifecycle Policy across S3 Standard, S3 IA, S3 Glacier Deep Archive, and Delete, with transitions labelled 30 days (with tag short-data), 30 days (without tag short-data), 90 days, and 10 years.)

Distributed Denial of Service Attack (DDoS)
- Attacks may be executed on different layers of the OSI model.
- Layer 7 attacks try to exhaust the target's resources (HTTP floods).
- Layer 3 or 4 attacks try to exhaust the state of the resources or the network equipment (SYN floods).
- Volumetric attacks try to consume all available bandwidth (DNS amplification).

DDoS Mitigation Options
(Options shown on the slide: blackhole routing, rate limiting, IP blocking, overprovisioning, web application firewall, scalable application design, content delivery networks, edge location serving.)
- Attacks cannot be prevented, but you can improve your application's setup to deal with these attacks in a smart way.
- Blackhole routing and rate limiting effectively reduce the availability of the service.
- Overprovisioning and unlimited scaling are costly.
- Building resilient applications with firewalls, sensible scaling, and content delivery networks in place is the recommended approach.
- With AWS WAF, AWS Shield, and Shield Advanced, attacks can be detected early and mitigated.

Network Layer Security - VPCs and subnets
- Apply Zero Trust: consider each application and component as discrete from each other, and don't trust anyone on the network.
- Use a multi-tier network setup with different layers.
- You use VPCs to model networks in AWS.
  - A VPC is a logically separated virtual network with a specific IP address range (CIDR block).
  - VPCs can contain subnets, which further isolate the network.
- You define how VPCs and subnets connect to each other, the internet, or local networks, and how the routing works, using route tables.
  - Each route has a destination (the IP range to which the traffic should go) and a target (the gateway, network interface, or connection through which the traffic passes).
- Traffic to AWS services is routed through the public network. Private connectivity can be set up using interface VPC endpoints (AWS PrivateLink), which place a network interface with a private IP inside the VPC through which to connect to a specific service.

Network Layer Security - Recommendations
- Have one subnet per availability zone for redundancy and fault resilience.
- Move applications without an explicit need for internet access (e.g., DB clusters) into private subnets.
- Separate the applications into separate subnets for even more fine-grained segmentation.

What's good about this setup?
- Multiple (public and private) subnets, spread across AZs.
- Different private subnets for different application tiers.
- NAT Gateways to allow subnets to access the internet, instead of allowing it all via route tables.
- VPC endpoints to ensure private traffic to AWS resources.
- Flow logs enabled to capture/monitor IP traffic.

Network Layer Security - NACL
- Network Access Control Lists are a control layer on the subnet/VPC level.
- Stateless firewall with coarse-grained inbound and outbound rules.
- Supports "allow" and "deny" rules.
  - Inbound rule with number, source, protocol, port: which traffic can pass through TO the resource.
  - Outbound rule with number, destination, protocol, port: which traffic can pass out FROM the resource.
- Rules are evaluated in order, starting from the lowest-numbered rule.
- Only the first rule that matches the traffic is applied.
- Evaluated when traffic enters and leaves the network, but not applied to in-network traffic.

Network Layer Security - Security Groups
- Security groups are an additional layer to provide fine-grained access to resources.
- Assigned to resources, not networks (e.g., EC2 instance, Lambda, RDS, …).
- Responses from a resource to allowed inbound traffic are allowed to leave, ignoring the outbound rules.
- Inbound and outbound rules define which traffic is allowed.
  - No "deny" rule action.
  - No priority numbers, all rules are evaluated.
  - Source/destination may be an IP range or another security group.
- A security group without any rules is "deny-all". The default outbound rule, however, is "allow-all".
- Stateful: responses to traffic that was allowed to reach the resource are always allowed to return, ignoring the outbound rules.

Subnets, NACLs and Security Groups combined
(Figure-only slide.)

Network Layer Security - how to verify
- Use Reachability Analyzer to test whether an application is accessible or not.
- Use VPC Network Access Analyzer to see if any unintended access can occur.
  - Identifies potential network paths.
- Use VPC Flow Logs to capture IP traffic.
  - The logs can then be analyzed by other AWS tools such as CloudWatch Logs.
- Set up additional inspection and protection.
  - Amazon GuardDuty for potential threat detection. It monitors VPC flow logs automatically for malicious activity and works well with automated alerts.
  - AWS Network Firewall, a managed network firewall and intrusion detection and prevention service. It can filter traffic at the perimeter (even before NACLs apply).

Summary
- Think of security on all levels (infra, network, application, users) and keep in mind that keeping your systems secure is a continuous process.
- In the end, it is important to combine the separation of networks with your separation of accounts.
- There are many ways to design an architecture that solves a business problem. Your decisions impact the effort and cost you must sustain to keep the system secure.
- Remember the total cost of ownership for your system. Maintenance can become unexpectedly expensive.
- Prefer managed services if possible.
- Automate as much as possible.
- Ensure that the entire technology stack adheres to your compliance requirements.

Cloud Monitoring
Michael Gerndt
Technische Universität München

Why Monitoring?
Make best use of your rented resources to reduce your costs and increase the satisfaction of the users of your services.

Monitoring
- Definition: Monitoring. Monitoring in the cloud is the process of collecting status information of applications and resources. The data can be used to observe application and infrastructure.
- Definition: Monitoring System. It consists of all components for gathering monitoring data at runtime.
- Definition: Monitoring Data. All (raw) data captured by the monitoring system.

Information
- Definition: Information is gained by processing, interpreting, organizing, and visualizing raw data. It increases knowledge about the observed system.
- Example: raw data are CPU and memory utilization. Information is that there is a trend towards an overload or a memory leak.
- The required information is not always clear in the cloud, so collect any data available.
- Proactively creating information: continuous analysis for triggering alarms or to give an overview of the status of the system.
- Reactively creating information: triggered through events such as an incident, e.g., root cause analysis and autoscaling.
- Information is produced by observability frontends.

Purposes
- Infrastructure level: resource management, incident detection, root cause analysis, accounting or metering for payment, intrusion detection, auditing.
- Application level: performance analysis, resource management (e.g., scaling decisions), failure detection and resolution, SLA verification, auditing.

Three pillars of monitoring
- Metrics
- Logs
- Traces

Monitoring Metrics
- Metric: e.g., execution time
- Semantics
- Unit
- Context: server, application service, …
- Representation
- Aggregation: sum, min, max, mean, percentiles, histogram
- Measurement frequency: every second, minute, 5 minutes

Important Metrics
- Latency: the time it takes to service a request. Measure successful and failed requests separately.
- Throughput or traffic:
  - Web service: requests/second
  - Streaming system: network I/O rate or concurrent sessions
  - Database: transactions/second or retrievals per second
- Error rate: rate of requests that fail, explicitly (HTTP 500), implicitly (wrong reply contents), or by violating an SLA.
- Utilization or saturation: percentage of capacity (CPU, memory, I/O).

Monitoring and Cloud Layers
Aggregation: none, min, max, mean, percentiles.
- Client (requests)
  - Context: request type
  - Metric: #requests, latency, availability
  - Purpose: SLA check, alerting
- Application (microservices)
  - Context: service name, service id
  - Metric: #requests, request rate, latency, #replicas, CPU time, memory usage
  - Purpose: autoscaling, performance tuning
- Platform (Kubernetes, Docker)
  - Context: container id, VM cluster
  - Metric: CPU & memory quota, utilization, incoming & outgoing bytes
  - Purpose: container distribution, autoscaling
- Infrastructure (VMs, volumes, queueing services)
  - Context: VM id, volume id, service name
  - Metric: CPU & memory, #read/write, I/O latency, #requests, size of requests
  - Purpose: root cause analysis of infrastructure service
- Hardware (servers, network, SAN, disks)
  - Context: server id, switch id
  - Metric: disk utilization, traffic
  - Purpose: management of VMs

Monitoring System Requirements
Comprehensive, low intrusion, extensibility, scalability, elasticity, accuracy, resilience.

Blackbox and Whitebox Monitoring
- Blackbox monitoring: the monitored system is handled as a black box. No data are gained from the inside of the system. E.g., only the request interface of a service is visible, nothing about the internal structure.
- Whitebox monitoring: data also comes from the inside of the system. This gives more context and more detailed insights. E.g., the internal organization of a service is visible, such as asynchronous internal handling of requests, load balancing, backend services.

Overheads
- Overheads lead to intrusion.
- There are many reasons for overheads:
  - Instrumentation
  - Computation for aggregations
  - Memory overhead for buffering
  - Time to push to disk or transfer to the collector
  - Storage overhead for long-term storage
- Reduction techniques:
  - Number of metrics
  - Measurement frequency
  - Representation
  - Batching
  - Sampling
  - Long-term coarsening

Amazon CloudWatch
- Monitoring and management service
- Collects metrics and logs (CloudWatch Logs Insights)
- CloudWatch Management Console

Amazon CloudWatch Metrics
- Preselected set of metrics per service, e.g.:
  - CPU utilization, data transfer, disk usage for EC2 instances
  - Read/write latency for EBS volumes
  - Request counts and latency for load balancers
  - Number of messages sent for Simple Queue Service (SQS) queues
  - Freely available memory and storage for RDS DB instances
- Custom application and system metrics

Amazon CloudWatch
- Data are provided online.
  - Frequency from every second to every 5 minutes, depending on the data and your account.
  - Data with < 1 minute granularity are stored for three hours.
  - Data with 1 minute granularity are stored for two weeks.
  - Data with 1 hour granularity are stored for 15 months.
- Actions: view graphs and statistics, set alarms.
- Access:
  - Management Console web interface
  - Command line interface
  - Libraries for Java, scripting languages, Windows .NET
  - Web Service API

Amazon CloudWatch
View graph and create alarm (screenshot slide).

Amazon CloudWatch Command line interface
- Provides commands to manage monitoring, e.g., list metrics, get statistics, define user metrics, add data for user metrics.
- Usage:
  - Install the command line interface.
  - Provide your Amazon credentials.
  - Use the provided commands.

Amazon CloudWatch Example: user-defined metrics
- Define a user metric by specifying data points with mon-put-data:

    mon-put-data -m RequestLatency -n "GetStarted" -t 2010-10-29T20:30:00Z -v 87 -u Milliseconds

  You can also specify multiple aggregated data points in the form of sum, minimum, maximum, SampleCount.
- Get statistics:

    mon-get-stats -n GetStarted -m RequestLatency -s "Average" --start-time 2010-10-29T00:00:00Z --headers

    Time                 Average  Unit
    2010-10-29 20:30:00  24.5     Milliseconds
    2010-10-29 21:30:00  15.4     Milliseconds
    2010-10-29 22:17:00  134.333  Milliseconds

Amazon CloudWatch Pricing
- Basic monitoring is free: standard metrics every 5 minutes.
- Charged: detailed monitoring of EC2 instances every minute, each alarm, each custom metric, CloudWatch API calls.
- Free Tier (2022):
  - Metrics: Basic Monitoring Metrics (at 5-minute frequency), 10 Detailed Monitoring Metrics (at 1-minute frequency), 1 million API requests (not applicable to GetMetricData and GetMetricWidgetImage)
  - Dashboard: 3 dashboards for up to 50 metrics per month
  - Alarms: 10 alarm metrics (not applicable to high-resolution alarms)
  - Logs: 5 GB data (ingestion, archive storage, and data scanned by Logs Insights queries)
  - Events: all events except custom events are included

Prometheus for Metrics
- Prometheus is an open-source monitoring system (https://prometheus.io).
- Initially built by soundcloud.com, now a Cloud Native Computing Foundation project.
- Features:
  - Metric collection in the form of time series
  - Storage by a time series database
  - Query language for accessing the time series
  - Alerting
  - Visualization

Cloud Native Computing Foundation
- Promotes the concept of Cloud Native Computing.
- Pushes for a sustainable ecosystem for Cloud Native Computing.
- Hosts several fast-growing open-source projects including Kubernetes, Prometheus, and Envoy.
- Sandbox, Incubating, and Graduated projects, depending on adoption and stability.
- Runs KubeCon and CloudNativeCon (2024 in Paris).
- CNCF.io

Logs
- Definition: a log is a sequence of immutable records of discrete events.
- Generated by applications, the system level, infrastructure, any devices, ...
- Event logs come in two forms:
  - Plaintext: the most common format of logs.
  - Structured: much evangelized, typically JSON.
- Typically a huge amount of log data.
  - Logging can be configured to levels.
  - Allows drilling down.
  - Difficult to analyze.
- Representation:
  - ASCII: easily readable but inefficient with respect to space and time.
  - Binary: more efficient. Example: protobuf from Google.

Protobuf
What is it?