AWS Certified Solutions Architect Slides v38.pdf
Document Details
Uploaded by EasyGulf
Related
- PCSII Depression/Anxiety/Strong Emotions 2024 Document
- A Concise History of the World: A New World of Connections (1500-1800)
- Human Bio Test PDF
- University of Santo Tomas Pre-Laboratory Discussion of LA No. 1 PDF
- Vertebrate Pest Management PDF
- Lg 5 International Environmental Laws, Treaties, Protocols, and Conventions
Full Transcript
NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Certified Solutions Architect Associate By Stéphane Maarek https://links.data https://links.da cumulus.com/aw...
NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Certified Solutions Architect Associate By Stéphane Maarek https://links.data https://links.da cumulus.com/aw s-certified-sa- associate-coupon https://links.dat tacumulus.com acumulus.com/ /aws-certified- aws-cert- sa-associate- solution- coupon architect-pt- https://links.datacumulus.com/aws coupon https://links.datacumulus.com/aw COURSE s-certified-sa-associate-coupon EXTRA PRACTICE EXAMS -cert-solution-architect-pt-coupon © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Disclaimer: These slides are copyrighted and strictly for personal use only This document is reserved for people enrolled into the Ultimate AWS Solutions Architect Associate Course Please do not share this document, it is intended for personal use and exam preparation only, thank you. If you’ve obtained these slides for free on a website that is not the course’s website, please reach out to [email protected]. Thanks! Best of luck for the exam and happy learning! © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Table of Contents Getting Started with AWS AWS Identity & Access Management (AWS IAM) Amazon EC2 – Basics Amazon EC2 – Associate Amazon EC2 – Instance Storage High Availability & Scalability RDS, Aurora & ElastiCache Amazon Route 53 Classic Solutions Architecture Amazon S3 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Table of Contents Amazon S3 – Advanced Amazon S3 – Security CloudFront & Global Accelerator AWS Storage Extras AWS Integration & Messaging Containers on AWS Serverless Overview Serverless Architectures Databases in AWS Data & Analytics © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Table of Contents Machine Learning AWS Monitoring, Audit & Performance Advanced Identity in AWS AWS Security & Encryption Amazon VPC Disaster Recovery & Migrations More Solutions Architecture Other Services White Papers & Architectures Exam Preparation Congratulations © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Certified Solutions Architect Associate Course SAA-C03 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Welcome! We’re starting in 5 minutes We’re going to prepare for the Solutions Architect exam - SAA-C03 It’s a challenging certification, so this course will be long and interesting Basic IT knowledge is necessary This course contains videos… From the Cloud Practitioner, Developer and SysOps course - shared knowledge Specific to the Solutions Architect exam - exciting ones on architecture! We will cover over 30 AWS services AWS / IT Beginners welcome! (but take your time, it’s not a race) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com My SAA-C03 certification: 96.1% © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com About me I’m Stephane! Worked as in IT consultant and AWS Solutions Architect, Developer & SysOps Worked with AWS many years: built websites, apps, streaming platforms Veteran Instructor on AWS (Certifications, CloudFormation, Lambda, EC2…) You can find me on GitHub: https://github.com/simplesteph LinkedIn: https://www.linkedin.com/in/stephanemaarek Medium: https://medium.com/@stephane.maarek Twitter: https://twitter.com/stephanemaarek © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com What’s AWS? AWS (Amazon Web Services) is a Cloud Provider They provide you with servers and services that you can use on demand and scale easily AWS has revolutionized IT over time AWS powers some of the biggest websites in the world Amazon.com Netflix © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com What we’ll learn in this course (and more!) Amazon Amazon ECR Amazon ECS AWS Elastic AWS IAM AWS KMS Amazon Auto Scaling EC2 Beanstalk Lambda S3 Amazon Amazon Amazon Amazon Amazon Amazon Amazon AWS Step Functions SES RDS Aurora DynamoDB ElastiCache SQS SNS Amazon AWS AWS Amazon API Elastic Load Amazon Amazon Amazon CloudWatch CloudFormation CloudTrail Gateway Balancing CloudFront Kinesis Route 53 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Navigating the AWS spaghetti bowl © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Udemy Tips © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Getting started with AWS © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Cloud History 2002: 2004: 2007: Internally Launched publicly Launched in launched with SQS Europe 2003: 2006: Amazon infrastructure is Re-launched one of their core strength. publicly with Idea to market SQS, S3 & EC2 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Cloud Number Facts In 2019, AWS had $35.02 billion in annual revenue AWS accounts for 47% of the market in 2019 (Microsoft is 2nd with 22%) Pioneer and Leader of the AWS Cloud Market for the 9th consecutive year Over 1,000,000 active users Gartner Magic Quadrant © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Cloud Use Cases AWS enables you to build sophisticated, scalable applications Applicable to a diverse set of industries Use cases include Enterprise IT, Backup & Storage, Big Data analytics Website hosting, Mobile & Social Apps Gaming © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Global Infrastructure AWS Regions AWS Availability Zones AWS Data Centers AWS Edge Locations / Points of Presence https://infrastructure.aws/ © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Regions AWS has Regions all around the world Names can be us-east-1, eu-west-3… A region is a cluster of data centers Most AWS services are region-scoped https://aws.amazon.com/about-aws/global-infrastructure/ © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com How to choose an AWS Region? If you need to launch a new application, where should you do it? Compliance with data governance and legal requirements: data never leaves a region without your explicit permission ? ? Proximity to customers: reduced latency Available services within a Region: new services ? ? and new features aren’t available in every Region Pricing: pricing varies region to region and is transparent in the service pricing page © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Availability Zones Each region has many availability zones AWS Region (usually 3, min is 3, max is 6). Example: Sydney: ap-southeast-2 ap-southeast-2a ap-southeast-2b ap-southeast-2a ap-southeast-2c Each availability zone (AZ) is one or more discrete data centers with redundant power, networking, and connectivity They’re separate from each other, so that ap-southeast-2b ap-southeast-2c they’re isolated from disasters They’re connected with high bandwidth, ultra-low latency networking © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Points of Presence (Edge Locations) Amazon has 400+ Points of Presence (400+ Edge Locations & 10+ Regional Caches) in 90+ cities across 40+ countries Content is delivered to end users with lower latency https://aws.amazon.com/cloudfront/features/ © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Tour of the AWS Console AWS has Global Services: Identity and Access Management (IAM) Route 53 (DNS service) CloudFront (Content Delivery Network) WAF (Web Application Firewall) Most AWS services are Region-scoped: Amazon EC2 (Infrastructure as a Service) Elastic Beanstalk (Platform as a Service) Lambda (Function as a Service) Rekognition (Software as a Service) Region Table: https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Identity and Access Management (AWS IAM) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com IAM: Users & Groups IAM = Identity and Access Management, Global service Root account created by default, shouldn’t be used or shared Users are people within your organization, and can be grouped Groups only contain users, not other groups Users don’t have to belong to a group, and user can belong to multiple groups Group: Developers Group: Operations Group Audit Team Alice Bob Charles David Edward Fred © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com IAM: Permissions { "Version": "2012-10-17", Users or Groups can be "Statement": [ { assigned JSON documents "Effect": "Allow", "Action": "ec2:Describe*", called policies }, "Resource": "*" These policies define the { "Effect": "Allow", permissions of the users "Action": "elasticloadbalancing:Describe*", "Resource": "*" In AWS you apply the least }, { privilege principle: don’t give "Effect": "Allow", "Action": [ more permissions than a user "cloudwatch:ListMetrics", "cloudwatch:GetMetricStatistics", needs ], "cloudwatch:Describe*" "Resource": "*" } ] } © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com IAM Policies inheritance Audit Team Developers Operations inline Alice Bob Charles David Edward Fred © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com IAM Policies Structure Consists of Version: policy language version, always include “2012-10- 17” Id: an identifier for the policy (optional) Statement: one or more individual statements (required) Statements consists of Sid: an identifier for the statement (optional) Effect: whether the statement allows or denies access (Allow, Deny) Principal: account/user/role to which this policy applied to Action: list of actions this policy allows or denies Resource: list of resources to which the actions applied to Condition: conditions for when this policy is in effect (optional) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com IAM – Password Policy Strong passwords = higher security for your account In AWS, you can setup a password policy: Set a minimum password length Require specific character types: including uppercase letters lowercase letters numbers non-alphanumeric characters Allow all IAM users to change their own passwords Require users to change their password after some time (password expiration) Prevent password re-use © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Multi Factor Authentication - MFA Users have access to your account and can possibly change configurations or delete resources in your AWS account You want to protect your Root Accounts and IAM users MFA = password you know + security device you own Password + => Successful login Alice Main benefit of MFA: if a password is stolen or hacked, the account is not compromised © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com MFA devices options in AWS Virtual MFA device Universal 2nd Factor (U2F) Security Key Google Authenticator Authy YubiKey by Yubico (3rd party) (phone only) (phone only) Support for multiple root and IAM users Support for multiple tokens on a single device. using a single security key © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com MFA devices options in AWS Hardware Key Fob MFA Device Hardware Key Fob MFA Device for AWS GovCloud (US) Provided by Gemalto (3rd party) Provided by SurePassID (3rd party) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com How can users access AWS ? To access AWS, you have three options: AWS Management Console (protected by password + MFA) AWS Command Line Interface (CLI): protected by access keys AWS Software Developer Kit (SDK) - for code: protected by access keys Access Keys are generated through the AWS Console Users manage their own access keys Access Keys are secret, just like a password. Don’t share them Access Key ID ~= username Secret Access Key ~= password © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Example (Fake) Access Keys Access key ID: AKIASK4E37PV4983d6C Secret Access Key: AZPN3zojWozWCndIjhB0Unh8239a1bzbzO5fqqkZq Remember : don’t share your access keys © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com What’s the AWS CLI? A tool that enables you to interact with AWS services using commands in your command-line shell Direct access to the public APIs of AWS services You can develop scripts to manage your resources It’s open-source https://github.com/aws/aws-cli Alternative to using AWS Management Console © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com What’s the AWS SDK? AWS Software Development Kit (AWS SDK) Language-specific APIs (set of libraries) Enables you to access and manage AWS services programmatically AWS SDK Embedded within your application Supports SDKs (JavaScript, Python, PHP,.NET, Ruby, Java, Go, Node.js, C++) Mobile SDKs (Android, iOS, …) Your Application IoT Device SDKs (Embedded C, Arduino, …) Example: AWS CLI is built on AWS SDK for Python © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com IAM Roles for Services IAM Role Some AWS service will need to perform actions on your behalf To do so, we will assign EC2 Instance permissions to AWS services (virtual server) with IAM Roles Common roles: EC2 Instance Roles Access AWS Lambda Function Roles Roles for CloudFormation © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com IAM Security Tools IAM Credentials Repor t (account-level) a report that lists all your account's users and the status of their various credentials IAM Access Advisor (user-level) Access advisor shows the service permissions granted to a user and when those services were last accessed. You can use this information to revise your policies. © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com IAM Guidelines & Best Practices Don’t use the root account except for AWS account setup One physical user = One AWS user Assign users to groups and assign permissions to groups Create a strong password policy Use and enforce the use of Multi Factor Authentication (MFA) Create and use Roles for giving permissions to AWS services Use Access Keys for Programmatic Access (CLI / SDK) Audit permissions of your account using IAM Credentials Report & IAM Access Advisor Never share IAM users & Access Keys © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com IAM Section – Summary Users: mapped to a physical user, has a password for AWS Console Groups: contains users only Policies: JSON document that outlines permissions for users or groups Roles: for EC2 instances or AWS services Security: MFA + Password Policy AWS CLI: manage your AWS services using the command-line AWS SDK: manage your AWS services using a programming language Access Keys: access AWS using the CLI or SDK Audit: IAM Credential Reports & IAM Access Advisor © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon EC2 – Basics © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon EC2 EC2 is one of the most popular of AWS’ offering EC2 = Elastic Compute Cloud = Infrastructure as a Service It mainly consists in the capability of : Renting virtual machines (EC2) Storing data on virtual drives (EBS) Distributing load across machines (ELB) Scaling the services using an auto-scaling group (ASG) Knowing EC2 is fundamental to understand how the Cloud works © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 sizing & configuration options Operating System (OS): Linux, Windows or Mac OS How much compute power & cores (CPU) How much random-access memory (RAM) How much storage space: Network-attached (EBS & EFS) hardware (EC2 Instance Store) Network card: speed of the card, Public IP address Firewall rules: security group Bootstrap script (configure at first launch): EC2 User Data © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 User Data It is possible to bootstrap our instances using an EC2 User data script. bootstrapping means launching commands when a machine starts That script is only run once at the instance first start EC2 user data is used to automate boot tasks such as: Installing updates Installing software Downloading common files from the internet Anything you can think of The EC2 User Data Script runs with the root user © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Hands-On: Launching an EC2 Instance running Linux We’ll be launching our first virtual server using the AWS Console We’ll get a first high-level approach to the various parameters We’ll see that our web server is launched using EC2 user data We’ll learn how to start / stop / terminate our instance. © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Instance Types - Overview You can use different types of EC2 instances that are optimised for different use cases (https://aws.amazon.com/ec2/instance-types/) AWS has the following naming convention: m5.2xlarge m: instance class 5: generation (AWS improves them over time) 2xlarge: size within the instance class © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Instance Types – General Purpose Great for a diversity of workloads such as web servers or code repositories Balance between: Compute Memory Networking In the course, we will be using the t2.micro which is a General Purpose EC2 instance * this list will evolve over time, please check the AWS website for the latest information © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Instance Types – Compute Optimized Great for compute-intensive tasks that require high performance processors: Batch processing workloads Media transcoding High performance web servers High performance computing (HPC) Scientific modeling & machine learning Dedicated gaming servers * this list will evolve over time, please check the AWS website for the latest information © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Instance Types – Memory Optimized Fast performance for workloads that process large data sets in memory Use cases: High performance, relational/non-relational databases Distributed web scale cache stores In-memory databases optimized for BI (business intelligence) Applications performing real-time processing of big unstructured data * this list will evolve over time, please check the AWS website for the latest information © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Instance Types – Storage Optimized Great for storage-intensive tasks that require high, sequential read and write access to large data sets on local storage Use cases: High frequency online transaction processing (OLTP) systems Relational & NoSQL databases Cache for in-memory databases (for example, Redis) Data warehousing applications Distributed file systems * this list will evolve over time, please check the AWS website for the latest information © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Instance Types: example Storage Network EBS Bandwidth Instance vCPU Mem (GiB) Performance (Mbps) t2.micro 1 1 EBS-Only Low to Moderate t2.xlarge 4 16 EBS-Only Moderate c5d.4xlarge 16 32 1 x 400 NVMe SSD Up to 10 Gbps 4,750 r5.16xlarge 64 512 EBS Only 20 Gbps 13,600 m5.8xlarge 32 128 EBS Only 10 Gbps 6,800 t2.micro is part of the AWS free tier (up to 750 hours per month) Great website: https://instances.vantage.sh © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Introduction to Security Groups Security Groups are the fundamental of network security in AWS They control how traffic is allowed into or out of our EC2 Instances. Inbound traffic Security Group WWW Outbound traffic EC2 Instance Security groups only contain rules Security groups rules can reference by IP or by security group © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Security Groups Deeper Dive Security groups are acting as a “firewall” on EC2 instances They regulate: Access to Ports Authorised IP ranges – IPv4 and IPv6 Control of inbound network (from other to the instance) Control of outbound network (from the instance to other) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Security Groups Diagram Your Computer - IP XX.XX.XX.XX Security Group 1 Port 22 (authorised port 22) Inbound Filter IP / Port with Rules Port 22 Other computer (not authorised port 22) EC2 Instance IP XX.XX.XX.XX Security Group 1 WWW Outbound Any Port Any IP – Any Port Filter IP / Port with Rules © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Security Groups Good to know Can be attached to multiple instances Locked down to a region / VPC combination Does live “outside” the EC2 – if traffic is blocked the EC2 instance won’t see it It’s good to maintain one separate security group for SSH access If your application is not accessible (time out), then it’s a security group issue If your application gives a “connection refused“ error, then it’s an application error or it’s not launched All inbound traffic is blocked by default All outbound traffic is authorised by default © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Referencing other security groups Diagram Security Port 123 Group 2 EC2 Instance (attached) IP XX.XX.XX.XX Security Group 1 EC2 Instance Security Inbound EC2 Instance IP XX.XX.XX.XX Port 123 Group 1 Authorising Security Group 1 IP XX.XX.XX.XX (attached) Authorising Security Group 2 Security Port 123 Group 3 EC2 Instance IP XX.XX.XX.XX (attached) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Classic Ports to know 22 = SSH (Secure Shell) - log into a Linux instance 21 = FTP (File Transfer Protocol) – upload files into a file share 22 = SFTP (Secure File Transfer Protocol) – upload files using SSH 80 = HTTP – access unsecured websites 443 = HTTPS – access secured websites 3389 = RDP (Remote Desktop Protocol) – log into a Windows instance © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com SSH Summary Table EC2 Instance SSH Putty Connect Mac Linux Windows < 10 Windows >= 10 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Which Lectures to watch Mac / Linux: SSH on Mac/Linux lecture Windows: Putty Lecture If Windows 10: SSH on Windows 10 lecture All: EC2 Instance Connect lecture © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com SSH troubleshooting Students have the most problems with SSH If things don’t work… 1. Re-watch the lecture. You may have missed something 2. Read the troubleshooting guide 3. Try EC2 Instance Connect If one method works (SSH, Putty or EC2 Instance Connect) you’re good If no method works, that’s okay, the course won’t use SSH much © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com How to SSH into your EC2 Instance Linux / Mac OS X We’ll learn how to SSH into your EC2 instance using Linux / Mac SSH is one of the most important function. It allows you to control a remote machine, all using the command line. SSH – Port 22 WWW EC2 Instance Linux Public IP We will see how we can configure OpenSSH ~/.ssh/config to facilitate the SSH into our EC2 instances © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com How to SSH into your EC2 Instance Windows We’ll learn how to SSH into your EC2 instance using Windows SSH is one of the most important function. It allows you to control a remote machine, all using the command line. SSH – Port 22 WWW EC2 Instance Linux Public IP We will configure all the required parameters necessary for doing SSH on Windows using the free tool Putty. © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Instance Connect Connect to your EC2 instance within your browser No need to use your key file that was downloaded The “magic” is that a temporary key is uploaded onto EC2 by AWS Works only out-of-the-box with Amazon Linux 2 Need to make sure the port 22 is still opened! © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Instances Purchasing Options On-Demand Instances – short workload, predictable pricing, pay by second Reserved (1 & 3 years) Reserved Instances – long workloads Conver tible Reserved Instances – long workloads with flexible instances Savings Plans (1 & 3 years) –commitment to an amount of usage, long workload Spot Instances – short workloads, cheap, can lose instances (less reliable) Dedicated Hosts – book an entire physical server, control instance placement Dedicated Instances – no other customers will share your hardware Capacity Reservations – reserve capacity in a specific AZ for any duration © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 On Demand Pay for what you use: Linux or Windows - billing per second, after the first minute All other operating systems - billing per hour Has the highest cost but no upfront payment No long-term commitment Recommended for shor t-term and un-interrupted workloads, where you can't predict how the application will behave © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Reserved Instances Up to 72% discount compared to On-demand You reserve a specific instance attributes (Instance Type, Region, Tenancy, OS) Reservation Period – 1 year (+discount) or 3 years (+++discount) Payment Options – No Upfront (+), Par tial Upfront (++), All Upfront (+++) Reserved Instance’s Scope – Regional or Zonal (reserve capacity in an AZ) Recommended for steady-state usage applications (think database) You can buy and sell in the Reserved Instance Marketplace Conver tible Reserved Instance Can change the EC2 instance type, instance family, OS, scope and tenancy Up to 66% discount Note: the % discounts are different from the video as AWS change them over time – the exact numbers are not needed for the exam. This is just for illustrative purposes J © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Savings Plans Get a discount based on long-term usage (up to 72% - same as RIs) Commit to a certain type of usage ($10/hour for 1 or 3 years) Usage beyond EC2 Savings Plans is billed at the On-Demand price Locked to a specific instance family & AWS region (e.g., M5 in us-east-1) Flexible across: Instance Size (e.g., m5.xlarge, m5.2xlarge) OS (e.g., Linux, Windows) Tenancy (Host, Dedicated, Default) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Spot Instances Can get a discount of up to 90% compared to On-demand Instances that you can “lose” at any point of time if your max price is less than the current spot price The MOST cost-efficient instances in AWS Useful for workloads that are resilient to failure Batch jobs Data analysis Image processing Any distributed workloads Workloads with a flexible start and end time Not suitable for critical jobs or databases © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Dedicated Hosts A physical server with EC2 instance capacity fully dedicated to your use Allows you address compliance requirements and use your existing server- bound software licenses (per-socket, per-core, pe—VM software licenses) Purchasing Options: On-demand – pay per second for active Dedicated Host Reserved - 1 or 3 years (No Upfront, Partial Upfront, All Upfront) The most expensive option Useful for software that have complicated licensing model (BYOL – Bring Your Own License) Or for companies that have strong regulatory or compliance needs © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Dedicated Instances Instances run on hardware that’s dedicated to you May share hardware with other instances in same account No control over instance placement (can move hardware after Stop / Start) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Capacity Reservations Reserve On-Demand instances capacity in a specific AZ for any duration You always have access to EC2 capacity when you need it No time commitment (create/cancel anytime), no billing discounts Combine with Regional Reserved Instances and Savings Plans to benefit from billing discounts You’re charged at On-Demand rate whether you run instances or not Suitable for short-term, uninterrupted workloads that needs to be in a specific AZ © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Which purchasing option is right for me? On demand: coming and staying in resort whenever we like, we pay the full price Reserved: like planning ahead and if we plan to stay for a long time, we may get a good discount. Savings Plans: pay a certain amount per hour for certain period and stay in any room type (e.g., King, Suite, Sea View, …) Spot instances: the hotel allows people to bid for the empty rooms and the highest bidder keeps the rooms. You can get kicked out at any time Dedicated Hosts: We book an entire building of the resort Capacity Reservations: you book a room for a period with full price even you don’t stay in it © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Price Comparison Example – m4.large – us-east-1 Price Type Price (per hour) On-Demand $0.10 Spot Instance (Spot Price) $0.038 - $0.039 (up to 61% off) Reserved Instance (1 year) $0.062 (No Upfront) - $0.058 (All Upfront) Reserved Instance (3 years) $0.043 (No Upfront) - $0.037 (All Upfront) EC2 Savings Plan (1 year) $0.062 (No Upfront) - $0.058 (All Upfront) Reserved Convertible Instance (1 year) $0.071 (No Upfront) - $0.066 (All Upfront) Dedicated Host On-Demand Price Dedicated Host Reservation Up to 70% off Capacity Reservations On-Demand Price © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Spot Instance Requests Can get a discount of up to 90% compared to On-demand Define max spot price and get the instance while current spot price < max The hourly spot price varies based on offer and capacity If the current spot price > your max price you can choose to stop or terminate your instance with a 2 minutes grace period. Other strategy: Spot Block “block” spot instance during a specified time frame (1 to 6 hours) without interruptions In rare situations, the instance may be reclaimed Used for batch jobs, data analysis, or workloads that are resilient to failures. Not great for critical jobs or databases © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Spot Instances Pricing User-defined max price https://console.aws.amazon.com/ec2sp/v1/spot/home?region=us-east-1# © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com How to terminate Spot Instances? You can only cancel Spot Instance requests that are open, active, or disabled. Cancelling a Spot Request does not terminate instances You must first cancel a Spot Request, and then terminate the associated Spot Instances https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Spot Fleets Spot Fleets = set of Spot Instances + (optional) On-Demand Instances The Spot Fleet will try to meet the target capacity with price constraints Define possible launch pools: instance type (m5.large), OS, Availability Zone Can have multiple launch pools, so that the fleet can choose Spot Fleet stops launching instances when reaching capacity or max cost Strategies to allocate Spot Instances: lowestPrice: from the pool with the lowest price (cost optimization, short workload) diversified: distributed across all pools (great for availability, long workloads) capacityOptimized: pool with the optimal capacity for the number of instances priceCapacityOptimized (recommended): pools with highest capacity available, then select the pool with the lowest price (best choice for most workloads) Spot Fleets allow us to automatically request Spot Instances with the lowest price © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon EC2 – Associate © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Private vs Public IP (IPv4) Networking has two sorts of IPs. IPv4 and IPv6: IPv4: 1.160.10.240 IPv6: 3ffe:1900:4545:3:200:f8ff:fe21:67cf In this course, we will only be using IPv4. IPv4 is still the most common format used online. IPv6 is newer and solves problems for the Internet of Things (IoT). IPv4 allows for 3.7 billion different addresses in the public space IPv4: [0-255].[0-255].[0-255].[0-255]. © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Private vs Public IP (IPv4) Example Web Server (public): Server (public): 79.216.59.75 211.139.37.43 WWW Internet Gateway (public): Internet Gateway (public): 149.140.72.10 253.144.139.205 Company A Company B Private Network Private Network 192.168.0.1/22 192.168.0.1/22 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Private vs Public IP (IPv4) Fundamental Differences Public IP: Public IP means the machine can be identified on the internet (WWW) Must be unique across the whole web (not two machines can have the same public IP). Can be geo-located easily Private IP: Private IP means the machine can only be identified on a private network only The IP must be unique across the private network BUT two different private networks (two companies) can have the same IPs. Machines connect to WWW using a NAT + internet gateway (a proxy) Only a specified range of IPs can be used as private IP © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Elastic IPs When you stop and then start an EC2 instance, it can change its public IP. If you need to have a fixed public IP for your instance, you need an Elastic IP An Elastic IP is a public IPv4 IP you own as long as you don’t delete it You can attach it to one instance at a time © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Elastic IP With an Elastic IP address, you can mask the failure of an instance or software by rapidly remapping the address to another instance in your account. You can only have 5 Elastic IP in your account (you can ask AWS to increase that). Overall, try to avoid using Elastic IP: They often reflect poor architectural decisions Instead, use a random public IP and register a DNS name to it Or, as we’ll see later, use a Load Balancer and don’t use a public IP © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Private vs Public IP (IPv4) In AWS EC2 – Hands On By default, your EC2 machine comes with: A private IP for the internal AWS Network A public IP, for the WWW. When we are doing SSH into our EC2 machines: We can’t use a private IP, because we are not in the same network We can only use the public IP. If your machine is stopped and then started, the public IP can change © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Placement Groups Sometimes you want control over the EC2 Instance placement strategy That strategy can be defined using placement groups When you create a placement group, you specify one of the following strategies for the group: Cluster—clusters instances into a low-latency group in a single Availability Zone Spread—spreads instances across underlying hardware (max 7 instances per group per AZ) Partition—spreads instances across many different partitions (which rely on different sets of racks) within an AZ. Scales to 100s of EC2 instances per group (Hadoop, Cassandra, Kafka) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Placement Groups Cluster EC2 EC2 EC2 Placement group Cluster Same AZ Low latency 10 Gbps network EC2 EC2 EC2 Pros: Great network (10 Gbps bandwidth between instances with Enhanced Networking enabled - recommended) Cons: If the AZ fails, all instances fails at the same time Use case: Big Data job that needs to complete fast Application that needs extremely low latency and high network throughput © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Placement Groups Spread Us-east-1a Us-east-1b Us-east-1c Pros: Can span across Availability Zones (AZ) Reduced risk is simultaneous EC2 EC2 EC2 failure EC2 Instances are on different physical hardware Hardware 1 Hardware 3 Hardware 5 Cons: Limited to 7 instances per AZ per placement group Use case: EC2 EC2 EC2 Application that needs to maximize high availability Critical Applications where Hardware 2 Hardware 4 Hardware 6 each instance must be isolated from failure from each other © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Placements Groups Partition us-east-1a us-east-1b Up to 7 partitions per AZ Can span across multiple AZs in the same region EC2 EC2 EC2 Up to 100s of EC2 instances The instances in a partition do not EC2 EC2 EC2 share racks with the instances in the other partitions A partition failure can affect many EC2 EC2 EC2 EC2 but won’t affect other partitions EC2 instances get access to the EC2 EC2 EC2 partition information as metadata Use cases: HDFS, HBase, Cassandra, Partition 1 Partition 2 Partition 3 Kafka © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Elastic Network Interfaces (ENI) Logical component in a VPC that represents a vir tual network card Availability Zone The ENI can have the following attributes: Eth0 – primary ENI Primary private IPv4, one or more secondary IPv4 EC2 192.168.0.31 One Elastic IP (IPv4) per private IPv4 Eth1 – secondary ENI One Public IPv4 192.168.0.42 One or more security groups Can be moved A MAC address You can create ENI independently and attach Eth0 – primary ENI them on the fly (move them) on EC2 instances EC2 for failover Bound to a specific availability zone (AZ) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Hibernate We know we can stop, terminate instances Stop – the data on disk (EBS) is kept intact in the next start Terminate – any EBS volumes (root) also set-up to be destroyed is lost On start, the following happens: First start: the OS boots & the EC2 User Data script is run Following starts: the OS boots up Then your application starts, caches get warmed up, and that can take time! © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Hibernate EC2 Instance Running RAM Introducing EC2 Hibernate: Root EBS Volume (Encrypted) The in-memory (RAM) state is preserved Hibernate The instance boot is much faster! (the OS is not stopped / restarted) Stopping Under the hood: the RAM state is written RAM to a file in the root EBS volume RAM The root EBS volume must be encrypted Hibernation Shutdown Use cases: Stopped Long-running processing RAM Start Saving the RAM state Services that take time to initialize Running RAM © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Hibernate – Good to know Suppor ted Instance Families – C3, C4, C5, I3, M3, M4, R3, R4, T2, T3, … Instance RAM Size – must be less than 150 GB. Instance Size – not supported for bare metal instances. AMI – Amazon Linux 2, Linux AMI, Ubuntu, RHEL, CentOS & Windows… Root Volume – must be EBS, encrypted, not instance store, and large Available for On-Demand, Reserved and Spot Instances An instance can NOT be hibernated more than 60 days © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon EC2 – Instance Storage © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com What’s an EBS Volume? An EBS (Elastic Block Store) Volume is a network drive you can attach to your instances while they run It allows your instances to persist data, even after their termination They can only be mounted to one instance at a time (at the CCP level) They are bound to a specific availability zone Analogy: Think of them as a “network USB stick” Free tier: 30 GB of free EBS storage of type General Purpose (SSD) or Magnetic per month © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS Volume It’s a network drive (i.e. not a physical drive) It uses the network to communicate the instance, which means there might be a bit of latency It can be detached from an EC2 instance and attached to another one quickly It’s locked to an Availability Zone (AZ) An EBS Volume in us-east-1a cannot be attached to us-east-1b To move a volume across, you first need to snapshot it Have a provisioned capacity (size in GBs, and IOPS) You get billed for all the provisioned capacity You can increase the capacity of the drive over time © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS Volume - Example US-EAST-1A US-EAST-1B EBS EBS EBS EBS EBS (10 GB) (100 GB) (50 GB) (50 GB) (10 GB) unattached © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS – Delete on Termination attribute Controls the EBS behaviour when an EC2 instance terminates By default, the root EBS volume is deleted (attribute enabled) By default, any other attached EBS volume is not deleted (attribute disabled) This can be controlled by the AWS console / AWS CLI Use case: preserve root volume when instance is terminated © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS Snapshots Make a backup (snapshot) of your EBS volume at a point in time Not necessary to detach volume to do snapshot, but recommended Can copy snapshots across AZ or Region US-EAST-1A US-EAST-1B EBS Snapshot EBS snapshot restore EBS (50 GB) (50 GB) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS Snapshots Features EBS Snapshot EBS Snapshot Archive EBS Snapshot Archive Move a Snapshot to an ”archive tier” that is archive 75% cheaper Takes within 24 to 72 hours for restoring the archive Recycle Bin for EBS Snapshots EBS Snapshot Recycle Bin Setup rules to retain deleted snapshots so you can recover them after an accidental deletion delete Specify retention (from 1 day to 1 year) Fast Snapshot Restore (FSR) Force full initialization of snapshot to have no latency on the first use ($$$) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AMI Overview AMI = Amazon Machine Image AMI are a customization of an EC2 instance You add your own software, configuration, operating system, monitoring… Faster boot / configuration time because all your software is pre-packaged AMI are built for a specific region (and can be copied across regions) You can launch EC2 instances from: A Public AMI: AWS provided Your own AMI: you make and maintain them yourself An AWS Marketplace AMI: an AMI someone else made (and potentially sells) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AMI Process (from an EC2 instance) Start an EC2 instance and customize it Stop the instance (for data integrity) Build an AMI – this will also create EBS snapshots Launch instances from other AMIs Custom AMI US-EAST-1A US-EAST-1B Launch Create AMI from AMI © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EC2 Instance Store EBS volumes are network drives with good but “limited” performance If you need a high-performance hardware disk, use EC2 Instance Store Better I/O performance EC2 Instance Store lose their storage if they’re stopped (ephemeral) Good for buffer / cache / scratch data / temporary content Risk of data loss if hardware fails Backups and Replication are your responsibility © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Very high IOPS Local EC2 Instance Store © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS Volume Types EBS Volumes come in 6 types gp2 / gp3 (SSD): General purpose SSD volume that balances price and performance for a wide variety of workloads io1 / io2 Block Express (SSD): Highest-performance SSD volume for mission-critical low-latency or high-throughput workloads st1 (HDD): Low cost HDD volume designed for frequently accessed, throughput- intensive workloads sc1 (HDD): Lowest cost HDD volume designed for less frequently accessed workloads EBS Volumes are characterized in Size | Throughput | IOPS (I/O Ops Per Sec) When in doubt always consult the AWS documentation – it’s good! Only gp2/gp3 and io1/io2 Block Express can be used as boot volumes © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS Volume Types Use cases General Purpose SSD Cost effective storage, low-latency System boot volumes, Virtual desktops, Development and test environments 1 GiB - 16 TiB gp3: Baseline of 3,000 IOPS and throughput of 125 MiB/s Can increase IOPS up to 16,000 and throughput up to 1000 MiB/s independently gp2: Small gp2 volumes can burst IOPS to 3,000 Size of the volume and IOPS are linked, max IOPS is 16,000 3 IOPS per GB, means at 5,334 GB we are at the max IOPS © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS Volume Types Use cases Provisioned IOPS (PIOPS) SSD Critical business applications with sustained IOPS performance Or applications that need more than 16,000 IOPS Great for databases workloads (sensitive to storage perf and consistency) io1 (4 GiB - 16 TiB): Max PIOPS: 64,000 for Nitro EC2 instances & 32,000 for other Can increase PIOPS independently from storage size io2 Block Express (4 GiB – 64 TiB): Sub-millisecond latency Max PIOPS: 256,000 with an IOPS:GiB ratio of 1,000:1 Supports EBS Multi-attach © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS Volume Types Use cases Hard Disk Drives (HDD) Cannot be a boot volume 125 GiB to 16 TiB Throughput Optimized HDD (st1) Big Data, Data Warehouses, Log Processing Max throughput 500 MiB/s – max IOPS 500 Cold HDD (sc1): For data that is infrequently accessed Scenarios where lowest cost is important Max throughput 250 MiB/s – max IOPS 250 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS – Volume Types Summary https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html#solid-state-drives © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS Multi-Attach – io1/io2 family Attach the same EBS volume to multiple EC2 instances in the same AZ Availability Zone 1 Each instance has full read & write permissions to the high-performance volume Use case: Achieve higher application availability in clustered Linux applications (ex: Teradata) Applications must manage concurrent write operations Up to 16 EC2 Instances at a time Must use a file system that’s cluster-aware (not io2 volume with Multi-Attach XFS, EXT4, etc…) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS Encryption When you create an encrypted EBS volume, you get the following: Data at rest is encrypted inside the volume All the data in flight moving between the instance and the volume is encrypted All snapshots are encrypted All volumes created from the snapshot Encryption and decryption are handled transparently (you have nothing to do) Encryption has a minimal impact on latency EBS Encryption leverages keys from KMS (AES-256) Copying an unencrypted snapshot allows encryption Snapshots of encrypted volumes are encrypted © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Encryption: encrypt an unencrypted EBS volume Create an EBS snapshot of the volume Encrypt the EBS snapshot ( using copy ) Create new ebs volume from the snapshot ( the volume will also be encrypted ) Now you can attach the encrypted volume to the original instance © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon EFS – Elastic File System Managed NFS (network file system) that can be mounted on many EC2 EFS works with EC2 instances in multi-AZ Highly available, scalable, expensive (3x gp2), pay per use us-east-1a us-east-1b us-east-1c EC2 Instances EC2 Instances EC2 Instances Security Group EFS FileSystem © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon EFS – Elastic File System Use cases: content management, web serving, data sharing, Wordpress Uses NFSv4.1 protocol Uses security group to control access to EFS Compatible with Linux based AMI (not Windows) Encryption at rest using KMS POSIX file system (~Linux) that has a standard file API File system scales automatically, pay-per-use, no capacity planning! © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EFS – Performance & Storage Classes EFS Scale 1000s of concurrent NFS clients, 10 GB+ /s throughput Grow to Petabyte-scale network file system, automatically Performance Mode (set at EFS creation time) General Purpose (default) – latency-sensitive use cases (web server, CMS, etc…) Max I/O – higher latency, throughput, highly parallel (big data, media processing) Throughput Mode Bursting – 1 TB = 50MiB/s + burst of up to 100MiB/s Provisioned – set your throughput regardless of storage size, ex: 1 GiB/s for 1 TB storage Elastic – automatically scales throughput up or down based on your workloads Up to 3GiB/s for reads and 1GiB/s for writes Used for unpredictable workloads © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EFS – Storage Classes Storage Tiers (lifecycle management feature – move file after N days) Standard: for frequently accessed files Infrequent access (EFS-IA): cost to retrieve files, lower price to store. Archive: rarely accessed data (few times each year), 50% cheaper no access for 60 days Implement lifecycle policies to move files between storage tiers EFS Standard Availability and durability move Lifecycle Policy Standard: Multi-AZ, great for prod One Zone: One AZ, great for dev, backup enabled by default, compatible with IA (EFS One Zone-IA) Over 90% in cost savings EFS IA Amazon EFS File System © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS vs EFS – Elastic Block Storage EBS volumes… Availability Zone 1 Availability Zone 2 one instance (except multi-attach io1/io2) are locked at the Availability Zone (AZ) level gp2: IO increases if the disk size increases gp3 & io1: can increase IO independently To migrate an EBS volume across AZ Take a snapshot Restore the snapshot to another AZ EBS EBS EBS backups use IO and you shouldn’t run them while your application is handling a lot of traffic snapshot restore Root EBS Volumes of instances get terminated by default if the EC2 instance gets terminated. (you can disable that) EBS Snapshot © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com EBS vs EFS – Elastic File System Mounting 100s of instances across AZ Availability Zone 1 Availability Zone 2 EFS share website files (WordPress) Linux Linux Only for Linux Instances (POSIX) EFS has a higher price point than EBS EFS EFS Can leverage Storage Tiers for cost savings Mount Mount Target Target Remember: EFS vs EBS vs Instance Store EFS © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com High Availability & Scalability © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Scalability & High Availability Scalability means that an application / system can handle greater loads by adapting. There are two kinds of scalability: Vertical Scalability Horizontal Scalability (= elasticity) Scalability is linked but different to High Availability Let’s deep dive into the distinction, using a call center as an example © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Vertical Scalability Vertically scalability means increasing the size of the instance For example, your application runs on a t2.micro Scaling that application vertically means running it on a t2.large Vertical scalability is very common for non distributed systems, such as a database. RDS, ElastiCache are services that can scale vertically. There’s usually a limit to how much you can vertically scale (hardware limit) junior operator senior operator © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Horizontal Scalability operator operator operator Horizontal Scalability means increasing the number of instances / systems for your application Horizontal scaling implies distributed systems. This is very common for web applications / modern applications It’s easy to horizontally scale thanks the cloud offerings such as Amazon EC2 operator operator operator © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com High Availability first building in New York High Availability usually goes hand in hand with horizontal scaling High availability means running your application / system in at least 2 data centers (== Availability Zones) The goal of high availability is to survive a data center loss second building in San Francisco The high availability can be passive (for RDS Multi AZ for example) The high availability can be active (for horizontal scaling) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com High Availability & Scalability For EC2 Vertical Scaling: Increase instance size (= scale up / down) From: t2.nano - 0.5G of RAM, 1 vCPU To: u-12tb1.metal – 12.3 TB of RAM, 448 vCPUs Horizontal Scaling: Increase number of instances (= scale out / in) Auto Scaling Group Load Balancer High Availability: Run instances for the same application across multi AZ Auto Scaling Group multi AZ Load Balancer multi AZ © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com What is load balancing? Load Balances are servers that forward traffic to multiple servers (e.g., EC2 instances) downstream Elastic Load Balancer EC2 Instance EC2 Instance EC2 Instance © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Why use a load balancer? Spread load across multiple downstream instances Expose a single point of access (DNS) to your application Seamlessly handle failures of downstream instances Do regular health checks to your instances Provide SSL termination (HTTPS) for your websites Enforce stickiness with cookies High availability across zones Separate public traffic from private traffic © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Why use an Elastic Load Balancer? An Elastic Load Balancer is a managed load balancer AWS guarantees that it will be working AWS takes care of upgrades, maintenance, high availability AWS provides only a few configuration knobs It costs less to setup your own load balancer but it will be a lot more effort on your end It is integrated with many AWS offerings / services EC2, EC2 Auto Scaling Groups, Amazon ECS AWS Certificate Manager (ACM), CloudWatch Route 53, AWS WAF, AWS Global Accelerator © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Health Checks Health Checks are crucial for Load Balancers They enable the load balancer to know if instances it forwards traffic to are available to reply to requests The health check is done on a port and a route (/health is common) If the response is not 200 (OK), then the instance is unhealthy Protocol: HTTP Port: 4567 Health Checks Endpoint: /health Elastic Load Balancer EC2 Instance © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Types of load balancer on AWS AWS has 4 kinds of managed Load Balancers Classic Load Balancer (v1 - old generation) – 2009 – CLB HTTP, HTTPS, TCP, SSL (secure TCP) Application Load Balancer (v2 - new generation) – 2016 – ALB HTTP, HTTPS, WebSocket Network Load Balancer (v2 - new generation) – 2017 – NLB TCP, TLS (secure TCP), UDP Gateway Load Balancer – 2020 – GWLB Operates at layer 3 (Network layer) – IP Protocol Overall, it is recommended to use the newer generation load balancers as they provide more features Some load balancers can be setup as internal (private) or external (public) ELBs © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Load Balancer Security Groups LOAD BALANCER HTTPS / HTTP HTTP Restricted From anywhere to Load balancer Users EC2 Load Balancer Security Group: Application Security Group: Allow traffic only from Load Balancer © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Classic Load Balancers (v1) Supports TCP (Layer 4), HTTP & HTTPS (Layer 7) listener internal Health checks are TCP or HTTP based Fixed hostname Client CLB EC2 XXX.region.elb.amazonaws.com © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Application Load Balancer (v2) Application load balancers is Layer 7 (HTTP) Load balancing to multiple HTTP applications across machines (target groups) Load balancing to multiple applications on the same machine (ex: containers) Support for HTTP/2 and WebSocket Support redirects (from HTTP to HTTPS for example) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Application Load Balancer (v2) Routing tables to different target groups: Routing based on path in URL (example.com/users & example.com/posts) Routing based on hostname in URL (one.example.com & other.example.com) Routing based on Query String, Headers (example.com/users?id=123&order=false) ALB are a great fit for micro services & container-based application (example: Docker & Amazon ECS) Has a port mapping feature to redirect to a dynamic port in ECS In comparison, we’d need multiple Classic Load Balancer per application © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Application Load Balancer (v2) HTTP Based Traffic Target Group Health Check application for Users Route /user HTTP WWW External Application Load Balancer (v2) Target Group Health Check application for Search Route /search HTTP WWW © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Application Load Balancer (v2) Target Groups EC2 instances (can be managed by an Auto Scaling Group) – HTTP ECS tasks (managed by ECS itself) – HTTP Lambda functions – HTTP request is translated into a JSON event IP Addresses – must be private IPs ALB can route to multiple target groups Health checks are at the target group level © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Application Load Balancer (v2) Query Strings/Parameters Routing