AWS Certified Cloud Practitioner Slides PDF
Stéphane Maarek
Summary
These slides present Stéphane Maarek's course for the AWS Certified Cloud Practitioner exam (CLF-C02). They cover a wide range of AWS services, including IAM, EC2, S3, and more, and provide both theoretical and hands-on material to prepare individuals for exam success.
Full Transcript
NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com

AWS Certified Cloud Practitioner – By Stéphane Maarek
COURSE: https://links.datacumulus.com/aws-certified-practitioner-coupon
EXTRA PRACTICE EXAMS: https://links.datacumulus.com/aws-cert-cloud-practitioner-pt-coupon

Disclaimer: These slides are copyrighted and strictly for personal use only. This document is reserved for people enrolled into the AWS Certified Cloud Practitioner course. Please do not share this document; it is intended for personal use and exam preparation only. Best of luck for the exam and happy learning!

Table of Contents: What is Cloud Computing?, AWS Identity & Access Management, Amazon EC2, Amazon EC2 Instance Storage, Elastic Load Balancing & Auto Scaling Group, Amazon S3, Databases & Analytics, Other Compute Services, Deploying & Managing Infrastructure at Scale, Global Infrastructure, Cloud Integration, Cloud Monitoring, Amazon VPC, Security & Compliance, Machine Learning, Account Management, Billing, & Support, Advanced Identity, Other AWS Services, AWS Architecting & EcoSystem, Exam Preparation, Congratulations

AWS Certified Cloud Practitioner Course – CLF-C02

Welcome! We're starting in 5 minutes. We're going to prepare for the Cloud Practitioner exam – CLF-C02. It's a challenging certification, so this course will be long and interesting. Basic IT knowledge is helpful, but I will explain everything. We will cover over 40 AWS services (out of the 200+ in AWS). AWS / IT beginners welcome! (but take your time, it's not a race). Learn by doing – key learning technique! This course mixes both theory & hands-on.

Sample question (Certified Cloud Practitioner): Which AWS service would simplify the migration of a database to AWS? A) AWS Storage Gateway

Auto Scaling Groups – Scaling Strategies: When a CloudWatch alarm is triggered (example CPU > 70%), then add 2 units; when a CloudWatch alarm is triggered (example CPU < 30%), then remove 1. Target Tracking Scaling – example: I want the average ASG CPU to stay at around 40%. Scheduled Scaling – anticipate a scaling based on known usage patterns. Example: increase the min.
capacity to 10 at 5 pm on Fridays © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Auto Scaling Groups – Scaling Strategies Predictive Scaling Uses Machine Learning to predict future traffic ahead of time Automatically provisions the right number of EC2 instances in advance Useful when your load has predictable time- based patterns © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com ELB & ASG – Summary High Availability vs Scalability (vertical and horizontal) vs Elasticity vs Agility in the Cloud Elastic Load Balancers (ELB) Distribute traffic across backend EC2 instances, can be Multi-AZ Supports health checks 4 types: Classic (old), Application (HTTP – L7), Network (TCP – L4), Gateway (L3) Auto Scaling Groups (ASG) Implement Elasticity for your application, across multiple AZ Scale EC2 instances based on the demand on your system, replace unhealthy Integrated with the ELB © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 Section © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Section introduction Amazon S3 is one of the main building blocks of AWS It’s advertised as ”infinitely scaling” storage Many websites use Amazon S3 as a backbone Many AWS services use Amazon S3 as an integration as well We’ll have a step-by-step approach to S3 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 Use cases Backup and storage Disaster Recovery Archive Nasdaq stores 7 years of data into S3 Glacier Hybrid Cloud storage Application hosting Media hosting Data lakes & big data analytics Sysco runs analytics on Software delivery its data and gain business insights Static website © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 - Buckets Amazon S3 allows people to store objects (files) in “buckets” (directories) Buckets must have a globally unique name (across all regions all accounts) Buckets are defined at the region level S3 looks like a global service but buckets are created in a region Naming convention No uppercase, No underscore 3-63 characters long Not an IP Must start with lowercase letter or number Must NOT start with the prefix xn-- S3 Bucket Must NOT end with the suffix -s3alias © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 - Objects Objects (files) have a Key The key is the FULL path: s3://my-bucket/my_file.txt s3://my-bucket/my_folder1/another_folder/my_file.txt Object The key is composed of prefix + object name s3://my-bucket/my_folder1/another_folder/my_file.txt There’s no concept of “directories” within buckets (although the UI will trick you to think otherwise) Just keys with very long names that contain slashes (“/”) S3 Bucket with Objects © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 – Objects (cont.) Object values are the content of the body: Max. 
Object Size is 5TB (5000GB) If uploading more than 5GB, must use “multi-part upload” Metadata (list of text key / value pairs – system or user metadata) Tags (Unicode key / value pair – up to 10) – useful for security / lifecycle Version ID (if versioning is enabled) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 – Security User-Based IAM Policies – which API calls should be allowed for a specific user from IAM Resource-Based Bucket Policies – bucket wide rules from the S3 console - allows cross account Object Access Control List (ACL) – finer grain (can be disabled) Bucket Access Control List (ACL) – less common (can be disabled) Note: an IAM principal can access an S3 object if The user IAM permissions ALLOW it OR the resource policy ALLOWS it AND there’s no explicit DENY Encryption: encrypt objects in Amazon S3 using encryption keys © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com S3 Bucket Policies JSON based policies Resources: buckets and objects Effect: Allow / Deny Actions: Set of API to Allow or Deny Principal: The account or user to apply the policy to Use S3 bucket for policy to: Grant public access to the bucket Force objects to be encrypted at upload Grant access to another account (Cross Account) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Example: Public Access - Use Bucket Policy S3 Bucket Policy Allows Public Access Anonymous www website visitor S3 Bucket © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Example: User Access to S3 – IAM permissions IAM Policy IAM User S3 Bucket © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Example: EC2 instance access - Use IAM Roles IAM permissions EC2 Instance Role EC2 Instance S3 Bucket © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Advanced: Cross-Account Access – Use Bucket Policy S3 Bucket Policy Allows Cross-Account IAM User Other AWS account S3 Bucket © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Bucket settings for Block Public Access These settings were created to prevent company data leaks If you know your bucket should never be public, leave these on Can be set at the account level © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 – Static Website Hosting User S3 can host static websites and have them accessible on the Internet http://demo-bucket.s3-website-us-west-2.amazonaws.com http://demo-bucket.s3-website.us-west-2.amazonaws.com The website URL will be (depending on the region) http://bucket-name.s3-website-aws-region.amazonaws.com OR us-west-2 http://bucket-name.s3-website.aws-region.amazonaws.com S3 Bucket If you get a 403 Forbidden error, make sure the bucket (demo-bucket) policy allows public reads! © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 - Versioning User You can version your files in Amazon S3 It is enabled at the bucket level upload Same key overwrite will change the “version”: 1, 2, 3…. 
It is best practice to version your buckets Protect against unintended deletes (ability to restore a version) S3 Bucket (my-bucket) Easy roll back to previous version Version 1 Version 2 Notes: Version 3 Any file that is not versioned prior to enabling versioning will have version “null” s3://my-bucket/my-file.docx Suspending versioning does not delete the previous versions © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 – Replication (CRR & SRR) Must enable Versioning in source and destination buckets Cross-Region Replication (CRR) Same-Region Replication (SRR) S3 Bucket (eu-west-1) Buckets can be in different AWS accounts Copying is asynchronous Must give proper IAM permissions to S3 asynchronous replication Use cases: CRR – compliance, lower latency access, replication across accounts S3 Bucket (us-east-2) SRR – log aggregation, live replication between production and test accounts © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com S3 Storage Classes Amazon S3 Standard - General Purpose Amazon S3 Standard-Infrequent Access (IA) Amazon S3 One Zone-Infrequent Access Amazon S3 Glacier Instant Retrieval Amazon S3 Glacier Flexible Retrieval Amazon S3 Glacier Deep Archive Amazon S3 Intelligent Tiering Can move between classes manually or using S3 Lifecycle configurations © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com S3 Durability and Availability Durability: High durability (99.999999999%, 11 9’s) of objects across multiple AZ If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years Same for all storage classes Availability: Measures how readily available a service is Varies depending on storage class Example: S3 standard has 99.99% availability = not available 53 minutes a year © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com S3 Standard – General Purpose 99.99% Availability Used for frequently accessed data Low latency and high throughput Sustain 2 concurrent facility failures Use Cases: Big Data analytics, mobile & gaming applications, content distribution… © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com S3 Storage Classes – Infrequent Access For data that is less frequently accessed, but requires rapid access when needed Lower cost than S3 Standard Amazon S3 Standard-Infrequent Access (S3 Standard-IA) 99.9% Availability Use cases: Disaster Recovery, backups Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA) High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed 99.5% Availability Use Cases: Storing secondary backup copies of on-premise data, or data you can recreate © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 Glacier Storage Classes Low-cost object storage meant for archiving / backup Pricing: price for storage + object retrieval cost Amazon S3 Glacier Instant Retrieval Millisecond retrieval, great for data accessed once a quarter Minimum storage duration of 90 days Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier): Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) – free Minimum storage duration of 90 days Amazon S3 Glacier Deep Archive – for long term storage: Standard (12 hours), Bulk (48 hours) Minimum storage duration of 180 days © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com S3 Intelligent-Tiering Small monthly 
monitoring and auto-tiering fee Moves objects automatically between Access Tiers based on usage There are no retrieval charges in S3 Intelligent-Tiering Frequent Access tier (automatic): default tier Infrequent Access tier (automatic): objects not accessed for 30 days Archive Instant Access tier (automatic): objects not accessed for 90 days Archive Access tier (optional): configurable from 90 days to 700+ days Deep Archive Access tier (optional): config. from 180 days to 700+ days © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com S3 Storage Classes Comparison Intelligent- Glacier Instant Glacier Flexible Glacier Deep Standard Standard-IA One Zone-IA Tiering Retrieval Retrieval Archive Durability 99.999999999% == (11 9’s) Availability 99.99% 99.9% 99.9% 99.5% 99.9% 99.99% 99.99% Availability SLA 99.9% 99% 99% 99% 99% 99.9% 99.9% Availability >= 3 >= 3 >= 3 1 >= 3 >= 3 >= 3 Zones Min. Storage None None 30 Days 30 Days 90 Days 90 Days 180 Days Duration Charge Min. Billable None None 128 KB 128 KB 128 KB 40 KB 40 KB Object Size Retrieval Fee None None Per GB retrieved Per GB retrieved Per GB retrieved Per GB retrieved Per GB retrieved https://aws.amazon.com/s3/storage-classes/ © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com S3 Storage Classes – Price Comparison Example: us-east-1 Glacier Instant Glacier Flexible Glacier Deep Standard Intelligent-Tiering Standard-IA One Zone-IA Retrieval Retrieval Archive Storage Cost $0.023 $0.0025 - $0.023 $0.0125 $0.01 $0.004 $0.0036 $0.00099 (per GB per month) GET: $0.0004 GET: $0.0004 POST: $0.03 POST: $0.05 Retrieval Cost GET: $0.0004 GET: $0.0004 GET: $0.001 GET: $0.001 GET: $0.01 (per 1000 request) POST: $0.005 POST: $0.005 POST: $0.01 POST: $0.01 POST: $0.02 Expedited: $10 Standard: $0.10 Standard: $0.05 Bulk: $0.025 Bulk: free Expedited (1 – 5 mins) Standard (12 hours) Retrieval Time Instantaneous Standard (3 – 5 hours) Bulk (48 hours) Bulk (5 – 12 hours) Monitoring Cost $0.0025 (pet 1000 objects) https://aws.amazon.com/s3/pricing/ © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com S3 Encryption Server-Side Encryption Client-Side Encryption (Default) User User Encrypts the file upload upload Before uploading it Server encrypts the file after receiving it Bucket Bucket Amazon S3 Amazon S3 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com IAM Access Analyzer for S3 Ensures that only intended people have access to your S3 buckets Example: publicly accessible bucket, bucket shared with other AWS account… Evaluates S3 Bucket Policies, S3 ACLs, S3 Access Point Policies Powered by IAM Access Analyzer © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Shared Responsibility Model for S3 Infrastructure (global security, S3 Versioning durability, availability, sustain S3 Bucket Policies concurrent loss of data in S3 Replication Setup two facilities) Logging and Monitoring Configuration and S3 Storage Classes vulnerability analysis Data encryption at rest and in Compliance validation transit © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Snow Family Highly-secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS Data migration: Snowcone Snowball Edge Snowmobile Edge computing: Snowcone Snowball Edge © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Data Migrations with AWS Snow Family Challenges: Time to Transfer Limited 
connectivity 100 Mbps 1Gbps 10Gbps Limited bandwidth 10 TB 12 days 30 hours 3 hours High network cost 100 TB 124 days 12 days 30 hours Shared bandwidth (can’t 1 PB 3 years 124 days 12 days maximize the line) Connection stability AWS Snow Family: offline devices to perform data migrations If it takes more than a week to transfer over the network, use Snowball devices! © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Diagrams Direct upload to S3: www: 10Gbit/s client Amazon S3 bucket With Snow Family: ship AWS AWS import/ Amazon S3 client Snowball Snowball export bucket © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Snowball Edge (for data transfers) Physical data transport solution: move TBs or PBs of data in or out of AWS Alternative to moving data over the network (and paying network fees) Pay per data transfer job Provide block storage and Amazon S3-compatible object storage Snowball Edge Storage Optimized 80 TB of HDD or 210TB NVMe capacity for block volume and S3 compatible object storage Snowball Edge Compute Optimized 42 TB of HDD or 28TB NVMe capacity for block volume and S3 compatible object storage Use cases: large data cloud migrations, DC decommission, disaster recovery © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Snowcone & Snowcone SSD Small, por table computing, anywhere, rugged & secure, withstands harsh environments Light (4.5 pounds, 2.1 kg) Device used for edge computing, storage, and data transfer Snowcone – 8 TB of HDD Storage Snowcone SSD – 14 TB of SSD Storage Use Snowcone where Snowball does not fit (space- constrained environment) Must provide your own battery / cables Can be sent back to AWS offline, or connect it to internet and use AWS DataSync to send data © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Snowmobile Transfer exabytes of data (1 EB = 1,000 PB = 1,000,000 TBs) Each Snowmobile has 100 PB of capacity (use multiple in parallel) High security: temperature controlled, GPS, 24/7 video surveillance Better than Snowball if you transfer more than 10 PB © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Snow Family for Data Migrations Snowcone Snowball Edge Snowmobile Snowcone & Snowball Edge Snowmobile Snowcone SSD Storage Optimized 8 TB HDD 80 TB - 210 TB < 100 PB Storage Capacity 14 TB SSD Up to 24 TB, online and Up to petabytes, Up to exabytes, offline Migration Size offline offline DataSync agent Pre-installed © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Snow Family – Usage Process 1. Request Snowball devices from the AWS console for delivery 2. Install the snowball client / AWS OpsHub on your servers 3. Connect the snowball to your servers and copy files using the client 4. Ship back the device when you’re done (goes to the right AWS facility) 5. Data will be loaded into an S3 bucket 6. Snowball is completely wiped © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com What is Edge Computing? Process data while it’s being created on an edge location A truck on the road, a ship on the sea, a mining station underground... 
These locations may have Limited / no internet access Limited / no easy access to computing power We setup a Snowball Edge / Snowcone device to do edge computing Use cases of Edge Computing: Preprocess data Machine learning at the edge Transcoding media streams Eventually (if need be) we can ship back the device to AWS (for transferring data for example) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Snow Family – Edge Computing Snowcone & Snowcone SSD (smaller) 2 CPUs, 4 GB of memory, wired or wireless access USB-C power using a cord or the optional battery Snowball Edge – Compute Optimized 104 vCPUs, 416 GiB of RAM Optional GPU (useful for video processing or machine learning) 28 TB NVMe or 42TB HDD usable storage Storage Clustering available (up to 16 nodes) Snowball Edge – Storage Optimized Up to 40 vCPUs, 80 GiB of RAM, 80 TB storage Up to 104 vCPUs, 416 GiB of RAM, 210 TB NVMe storage All: Can run EC2 Instances & AWS Lambda functions (using AWS IoT Greengrass) Long-term deployment options: 1 and 3 years discounted pricing © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS OpsHub Historically, to use Snow Family devices, you needed a CLI (Command Line Interface tool) Today, you can use AWS OpsHub (a software you install on your computer / laptop) to manage your Snow Family Device Unlocking and configuring single or clustered devices Transferring files Launching and managing instances running on Snow Family Devices Monitor device metrics (storage capacity, active instances on your device) Launch compatible AWS services on your devices (ex: Amazon EC2 instances, AWS DataSync, Network File System (NFS)) https://aws.amazon.com/blogs/aws/aws-snowball-edge-update/ © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Snowball Edge Pricing You pay for device usage and data transfer out of AWS Data transfer IN to Amazon S3 is $0.00 per GB On-Demand Includes a one-time service fee per job, which includes: 10 days of usage for Snowball Edge Storage Optimized 80TB 15 days of usage for Snowball Edge Storage Optimized 210TB Shipping days are NOT counted towards the included 10 or 15 days Pay per day for any additional days Committed Upfront Pay in advance for monthly, 1-year, and 3-years of usage (Edge Computing) Up to 62% discounted pricing © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Hybrid Cloud for Storage AWS is pushing for ”hybrid cloud” Part of your infrastructure is on-premises Part of your infrastructure is on the cloud This can be due to Long cloud migrations Security requirements Compliance requirements IT strategy S3 is a proprietary storage technology (unlike EFS / NFS), so how do you expose the S3 data on-premise? AWS Storage Gateway! 
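Whichever path the data takes to get there (direct upload, a Snow Family import job, or Storage Gateway), it ends up as objects in S3 buckets that applications reach through the same API. As a minimal sketch of the bucket / key / versioning / bucket-policy ideas from this section, here is an example using the boto3 SDK for Python; the bucket name and region are hypothetical, and the bucket is assumed to already exist with Block Public Access turned off:

import json
import boto3

s3 = boto3.client("s3", region_name="us-west-2")   # hypothetical region
bucket = "demo-bucket-clf-c02"                      # hypothetical, globally unique bucket name

# Enable versioning at the bucket level (protects against unintended deletes)
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Upload an object; the key is the full path (prefix + object name)
s3.put_object(
    Bucket=bucket,
    Key="my_folder1/another_folder/my_file.txt",
    Body=b"hello from the course",
)

# Bucket policy granting anonymous read access (the pattern used for static website hosting)
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject"],
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

The public-read policy mirrors the static website hosting note above: a 403 Forbidden usually means this policy (or Block Public Access) is still denying anonymous reads.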
© Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Storage Cloud Native Options BLOCK FILE OBJECT Amazon EBS EC2 Instance Amazon EFS Amazon S3 Glacier Store © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Storage Gateway Bridge between on-premise data and cloud data in S3 Tapes Files Volumes Hybrid storage service to allow on- premises to seamlessly use the AWS Cloud Use cases: disaster recovery, backup & restore, tiered storage AWS Storage Gateway Types of Storage Gateway: File Gateway Volume Gateway Tape Gateway No need to know the types at the exam Amazon EBS S3 Glacier © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon S3 – Summary Buckets vs Objects: global unique name, tied to a region S3 security: IAM policy, S3 Bucket Policy (public access), S3 Encryption S3 Websites: host a static website on Amazon S3 S3 Versioning: multiple versions for files, prevent accidental deletes S3 Replication: same-region or cross-region, must enable versioning S3 Storage Classes: Standard, IA, 1Z-IA, Intelligent, Glacier (Instant, Flexible, Deep) Snow Family: import data onto S3 through a physical device, edge computing OpsHub: desktop application to manage Snow Family devices Storage Gateway: hybrid solution to extend on-premises storage to S3 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Databases Section © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Databases Intro Storing data on disk (EFS, EBS, EC2 Instance Store, S3) can have its limits Sometimes, you want to store data in a database… You can structure the data You build indexes to efficiently query / search through the data You define relationships between your datasets Databases are optimized for a purpose and come with different features, shapes and constraints © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Relational Databases Looks just like Excel spreadsheets, with links between them! Can use the SQL language to perform queries / lookups Students Subjects Student ID Dept ID Name Email Student ID Subject 1 M01 Joe Miller [email protected] 1 Physics 2 B01 Sarah T [email protected] 1 Chemistry 1 Math Departments 2 History Dept ID SPOC Email Phone 2 Geography M01 Kelly Jones [email protected] +1234567890 2 Economics B01 Satish Kumar [email protected] +1234567891 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com NoSQL Databases NoSQL = non-SQL = non relational databases NoSQL databases are purpose built for specific data models and have flexible schemas for building modern applications. 
Benefits: Flexibility: easy to evolve data model Scalability: designed to scale-out by using distributed clusters High-performance: optimized for a specific data model Highly functional: types optimized for the data model Examples: Key-value, document, graph, in-memory, search databases © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com NoSQL data example: JSON JSON = JavaScript Object Notation { JSON is a common form of data "name": "John", "age": 30, that fits into a NoSQL model "cars": [ "Ford", Data can be nested "BMW", "Fiat" Fields can change over time ], Support for new types: arrays, etc… "address": { "type": "house", "number": 23, "street": "Dream Road" } } © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Databases & Shared Responsibility on AWS AWS offers use to manage different databases Benefits include: Quick Provisioning, High Availability, Vertical and Horizontal Scaling Automated Backup & Restore, Operations, Upgrades Operating System Patching is handled by AWS Monitoring, alerting Note: many databases technologies could be run on EC2, but you must handle yourself the resiliency, backup, patching, high availability, fault tolerance, scaling… © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon RDS Overview RDS stands for Relational Database Service It’s a managed DB service for DB use SQL as a query language. It allows you to create databases in the cloud that are managed by AWS Postgres MySQL MariaDB Oracle Microsoft SQL Server IBM DB2 Aurora (AWS Proprietary database) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Advantage over using RDS versus deploying DB on EC2 RDS is a managed service: Automated provisioning, OS patching Continuous backups and restore to specific timestamp (Point in Time Restore)! 
Monitoring dashboards Read replicas for improved read performance Multi AZ setup for DR (Disaster Recovery) Maintenance windows for upgrades Scaling capability (vertical and horizontal) Storage backed by EBS BUT you can’t SSH into your instances © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com RDS Solution Architecture Read/write Elastic Load Balancer SQL (relational) Database EC2 Instances Possibly in an ASG © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon Aurora Aurora is a proprietary technology from AWS (not open sourced) PostgreSQL and MySQL are both supported as Aurora DB Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS Aurora storage automatically grows in increments of 10GB, up to 128 TB Aurora costs more than RDS (20% more) – but is more efficient Not in the free tier © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon Aurora Serverless Client Automated database instantiation and auto-scaling based on actual usage PostgreSQL and MySQL are both supported as Aurora Serverless DB No capacity planning needed Proxy Fleet Least management overhead (managed by Aurora) Pay per second, can be more cost- effective Use cases: good for infrequent, intermittent or unpredictable workloads… Shared storage Volume © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com RDS Deployments: Read Replicas, Multi-AZ Read Replicas: Multi-AZ: Scale the read workload of your DB Failover in case of AZ outage (high availability) Can create up to 15 Read Replicas Data is only read/written to the main database Data is only written to the main DB Can only have 1 other AZ as failover replication Replication cross AZ replication Read Replica Main Read Replica Main Failover DB read writes read read writes read Application(s) Application(s) Failover in case of issues with Main DB © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com RDS Deployments: Multi-Region Multi-Region (Read Replicas) Disaster recovery in case of region issue Local performance for global reads Replication cost us-east-2 eu-west-1 ap-southeast-2 replication replication Read Replica writes Main writes Read Replica read writes read read Application(s) Application(s) Application(s) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon ElastiCache Overview The same way RDS is to get managed Relational Databases… ElastiCache is to get managed Redis or Memcached Caches are in-memory databases with high performance, low latency Helps reduce load off databases for read intensive workloads AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com ElastiCache Solution Architecture - Cache ElastiCache In-memory database Read / write from cache Fast EC2 Instances Possibly in an ASG Elastic Load Balancer Read / write From DB Slower SQL (relational) Database © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com DynamoDB Fully Managed Highly available with replication across 3 AZ NoSQL database - not a relational database Scales to massive workloads, distributed “serverless” database Millions of requests per seconds, trillions of row, 100s of TB of storage Fast and consistent in performance Single-digit millisecond latency – low latency retrieval 
Integrated with IAM for security, authorization and administration Low cost and auto scaling capabilities Standard & Infrequent Access (IA) Table Class © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com DynamoDB – type of data DynamoDB is a key/value database https://aws.amazon.com/nosql/key-value/ © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com DynamoDB Accelerator - DAX Fully Managed in-memory cache for DynamoDB applications 10x performance improvement – single- digit millisecond latency to microseconds latency – when accessing your DynamoDB DAX tables DynamoDB Accelerator Secure, highly scalable & highly available Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases Amazon table table table DynamoDB © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com DynamoDB – Global Tables Make a DynamoDB table accessible with low latency in multiple-regions Active-Active replication (read/write to any AWS Region) Users read/write N. Virginia (us-east-1) Paris (eu-west-3) DynamoDB DynamoDB 2-way replication Global Table Global Table © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Redshift Overview Redshift is based on PostgreSQL, but it’s not used for OLTP It’s OLAP – online analytical processing (analytics and data warehousing) Load data once every hour, not every second 10x better performance than other data warehouses, scale to PBs of data Columnar storage of data (instead of row based) Massively Parallel Query Execution (MPP), highly available Pay as you go based on the instances provisioned Has a SQL interface for performing the queries BI tools such as AWS Quicksight or Tableau integrate with it © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Redshift Serverless Automatically provisions and scales data warehouse underlying capacity Run analytics workloads without managing data warehouse infrastructure Pay only for what you use (save costs) Use cases: Reporting, dashboarding applications, real-time analytics… Enable Amazon Redshift Connect using Amazon Redshift Amazon Redshift Serverless Pay only for compute and Serverless for your AWS Account Query Editor or any other tool run queries by automatically storage used during analysis provision and scale capacity based on workloads © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon EMR EMR stands for “Elastic MapReduce” EMR helps creating Hadoop clusters (Big Data) to analyze and process vast amount of data The clusters can be made of hundreds of EC2 instances Also supports Apache Spark, HBase, Presto, Flink… EMR takes care of all the provisioning and configuration Auto-scaling and integrated with Spot instances Use cases: data processing, machine learning, web indexing, big data… © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon Athena Serverless query service to analyze data stored in Amazon S3 Uses standard SQL language to query the files load data Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto) Pricing: $5.00 per TB of data scanned Use compressed or columnar data for cost-savings (less scan) S3 Bucket Query & Analyze Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc... 
Amazon Athena Exam Tip: analyze data in S3 using serverless SQL, use Athena Reporting & Dashboards Amazon QuickSight © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon QuickSight Serverless machine learning-powered business intelligence service to create interactive dashboards Fast, automatically scalable, embeddable, with per-session pricing Use cases: Business analytics Building visualizations Perform ad-hoc analysis Get business insights using data Integrated with RDS, Aurora, Athena, Redshift, S3… https://aws.amazon.com/quicksight/ © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com DocumentDB Aurora is an “AWS-implementation” of PostgreSQL / MySQL … DocumentDB is the same for MongoDB (which is a NoSQL database) MongoDB is used to store, query, and index JSON data Similar “deployment concepts” as Aurora Fully Managed, highly available with replication across 3 AZ DocumentDB storage automatically grows in increments of 10GB Automatically scales to workloads with millions of requests per seconds © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon Neptune Fully managed graph database A popular graph dataset would be a social network Users have friends Posts have comments Comments have likes from users Users share and like posts… Highly available across 3 AZ, with up to 15 read replicas Build and run applications working with highly connected datasets – optimized for these complex and hard queries Can store up to billions of relations and query the graph with milliseconds latency Highly available with replications across multiple AZs Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon Timestream Fully managed, fast, scalable, serverless time series database Automatically scales up/down to adjust capacity Store and analyze trillions of events per day 1000s times faster & 1/10th the cost of relational databases Built-in time series analytics functions (helps you identify patterns in your data in near real-time) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon QLDB QLDB stands for ”Quantum Ledger Database” A ledger is a book recording financial transactions Fully Managed, Serverless, High available, Replication across 3 AZ Used to review history of all the changes made to your application data over time Immutable system: no entry can be removed or modified, cryptographically verifiable 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules https://docs.aws.amazon.com/qldb/latest/developerguide/ledger-structure.html © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon Managed Blockchain Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority. 
Amazon Managed Blockchain is a managed service to: Join public blockchain networks Or create your own scalable private network Compatible with the frameworks Hyperledger Fabric & Ethereum © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Glue Managed extract, transform, and load (ETL) service Useful to prepare and transform data for analytics Fully serverless service Glue ETL S3 Bucket Extract Load Amazon RDS Transform Redshift Glue Data Catalog: catalog of datasets can be used by Athena, Redshift, EMR © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com DMS – Database Migration Service Quickly and securely migrate databases Source DB to AWS, resilient, self healing The source database remains available during the migration EC2 instance Running DMS Supports: Homogeneous migrations: ex Oracle to Oracle Target DB Heterogeneous migrations: ex Microsoft SQL Server to Aurora © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Databases & Analytics Summary in AWS Relational Databases - OLTP: RDS & Aurora (SQL) Differences between Multi-AZ, Read Replicas, Multi-Region In-memory Database: ElastiCache Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB) Warehouse - OLAP: Redshift (SQL) Hadoop Cluster: EMR Athena: query data on Amazon S3 (serverless & SQL) QuickSight: dashboards on your data (serverless) DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database) Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable) Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains Glue: Managed ETL (Extract Transform Load) and Data Catalog service Database Migration: DMS Neptune: graph database Timestream: time-series database © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Other Compute Section © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com What is Docker? Docker is a software development platform to deploy apps Apps are packaged in containers that can be run on any OS Apps run the same, regardless of where they’re run Any machine No compatibility issues Predictable behavior Less work Easier to maintain and deploy Works with any language, any OS, any technology Scale containers up and down very quickly (seconds) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Server (ex: EC2 Instance) Docker on an OS © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Where Docker images are stored? 
Docker images are stored in Docker Repositories. Public: Docker Hub https://hub.docker.com/ – find base images for many technologies or OS: Ubuntu, MySQL, NodeJS, Java… Private: Amazon ECR (Elastic Container Registry)

Docker versus Virtual Machines: Docker is "sort of" a virtualization technology, but not exactly. Resources are shared with the host => many containers on one server. (Diagram: each VM stacks Apps on its own Guest OS on a Hypervisor on the Host OS and infrastructure, whereas many containers run side by side on the Docker Daemon on a single Host OS, e.g. an EC2 Instance.)

ECS = Elastic Container Service. Launch Docker containers on AWS. You must provision & maintain the infrastructure (the EC2 instances). AWS takes care of starting / stopping containers. Has integrations with the Application Load Balancer. (Diagram: an ECS Service placing a new Docker container onto one of several EC2 instances.)

Fargate: Launch Docker containers on AWS. You do not provision the infrastructure (no EC2 instances to manage) – simpler! Serverless offering. AWS just runs containers for you based on the CPU / RAM you need.

ECR (Elastic Container Registry): Private Docker Registry on AWS. This is where you store your Docker images so they can be run by ECS or Fargate.

What's serverless? Serverless is a new paradigm in which the developers don't have to manage servers anymore… They just deploy code. They just deploy… functions! Initially, Serverless == FaaS (Function as a Service). Serverless was pioneered by AWS Lambda but now also includes anything that's managed: "databases, messaging, storage, etc." Serverless does not mean there are no servers… it means you just don't manage / provision / see them.

So far in this course: Amazon S3, DynamoDB, Fargate, Lambda

Why AWS Lambda: Amazon EC2 – virtual servers in the cloud, limited by RAM and CPU, continuously running; scaling means intervention to add / remove servers. AWS Lambda – virtual functions, no servers to manage! Limited by time (short executions), run on-demand, scaling is automated!

Benefits of AWS Lambda: Easy pricing – pay per request and compute time; free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time. Integrated with the whole AWS suite of services. Event-Driven: functions get invoked by AWS when needed. Integrated with many programming languages. Easy monitoring through AWS CloudWatch. Easy to get more resources per function (up to 10GB of RAM!). Increasing RAM will also improve CPU and network!
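To make the event-driven model concrete, here is a minimal sketch of a Python Lambda handler in the spirit of the S3-triggered thumbnail example on the following slides; the DynamoDB table name and its key are hypothetical, and the actual thumbnail logic is left out:

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("image-metadata")   # hypothetical DynamoDB table

def lambda_handler(event, context):
    # Lambda passes in the triggering S3 event; each record describes one uploaded object
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"]["size"]
        # ... create the thumbnail here, then store the object's metadata in DynamoDB ...
        table.put_item(Item={"image_key": key, "bucket": bucket, "size_bytes": size})
    return {"processed": len(event["Records"])}

Billing then follows the pricing slide below: you pay per invocation and for the milliseconds the handler runs, scaled by the RAM configured for the function.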
© Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Lambda language support Node.js (JavaScript) Python Java (Java 8 compatible) C# (.NET Core) Golang C# / Powershell Ruby Custom Runtime API (community supported, example Rust) Lambda Container Image The container image must implement the Lambda Runtime API ECS / Fargate is preferred for running arbitrary Docker images © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Example: Serverless Thumbnail creation u sh p New thumbnail in S3 trigger pu Image name sh New image in S3 AWS Lambda Function Image size Creates a Thumbnail Creation date etc… Metadata in DynamoDB © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Example: Serverless CRON Job Trigger Every 1 hour CloudWatch Events EventBridge AWS Lambda Function Perform a task © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Lambda Pricing: example You can find overall pricing information here: https://aws.amazon.com/lambda/pricing/ Pay per calls: First 1,000,000 requests are free $0.20 per 1 million requests thereafter ($0.0000002 per request) Pay per duration: (in increment of 1 ms) 400,000 GB-seconds of compute time per month for FREE == 400,000 seconds if function is 1GB RAM == 3,200,000 seconds if function is 128 MB RAM After that $1.00 for 600,000 GB-seconds It is usually very cheap to run AWS Lambda so it’s very popular © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon API Gateway Example: building a serverless API REST API PROXY REQUESTS CRUD Client API Gateway Lambda DynamoDB Fully managed service for developers to easily create, publish, maintain, monitor, and secure APIs Serverless and scalable Supports RESTful APIs and WebSocket APIs Support for security, user authentication, API throttling, API keys, monitoring... © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Batch Fully managed batch processing at any scale Efficiently run 100,000s of computing batch jobs on AWS A “batch” job is a job with a start and an end (opposed to continuous) Batch will dynamically launch EC2 instances or Spot Instances AWS Batch provisions the right amount of compute / memory You submit or schedule batch jobs and AWS Batch does the rest! Batch jobs are defined as Docker images and run on ECS Helpful for cost optimizations and focusing less on the infrastructure © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Batch – Simplified Example AWS Batch EC2 Instance ECS Insert Amazon S3 Trigger processed object Spot Instance Amazon S3 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Batch vs Lambda Lambda: Time limit Limited runtimes Limited temporary disk space Serverless Batch: No time limit Any runtime as long as it’s packaged as a Docker image Rely on EBS / instance store for disk space Relies on EC2 (can be managed by AWS) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Amazon Lightsail Virtual servers, storage, databases, and networking Low & predictable pricing Simpler alternative to using EC2, RDS, ELB, EBS, Route 53… Great for people with little cloud experience! 
Can setup notifications and monitoring of your Lightsail resources Use cases: Simple web applications (has templates for LAMP, Nginx, MEAN, Node.js…) Websites (templates for WordPress, Magento, Plesk, Joomla) Dev / Test environment Has high availability but no auto-scaling, limited AWS integrations © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Other Compute - Summary Docker : container technology to run applications ECS: run Docker containers on EC2 instances Fargate: Run Docker containers without provisioning the infrastructure Serverless offering (no EC2 instances) ECR: Private Docker Images Repository Batch: run batch jobs on AWS across managed EC2 instances Lightsail: predictable & low pricing for simple application & DB stacks © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Lambda Summary Lambda is Serverless, Function as a Service, seamless scaling, reactive Lambda Billing: By the time run x by the RAM provisioned By the number of invocations Language Suppor t: many programming languages except (arbitrary) Docker Invocation time: up to 15 minutes Use cases: Create Thumbnails for images uploaded onto S3 Run a Serverless cron job API Gateway: expose Lambda functions as HTTP API © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Deploying and Managing Infrastructure at Scale Section © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com What is CloudFormation CloudFormation is a declarative way of outlining your AWS Infrastructure, for any resources (most of them are supported). For example, within a CloudFormation template, you say: I want a security group I want two EC2 instances using this security group I want an S3 bucket I want a load balancer (ELB) in front of these machines Then CloudFormation creates those for you, in the right order, with the exact configuration that you specify © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Benefits of AWS CloudFormation (1/2) Infrastructure as code No resources are manually created, which is excellent for control Changes to the infrastructure are reviewed through code Cost Each resources within the stack is tagged with an identifier so you can easily see how much a stack costs you You can estimate the costs of your resources using the CloudFormation template Savings strategy: In Dev, you could automation deletion of templates at 5 PM and recreated at 8 AM, safely © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Benefits of AWS CloudFormation (2/2) Productivity Ability to destroy and re-create an infrastructure on the cloud on the fly Automated generation of Diagram for your templates! Declarative programming (no need to figure out ordering and orchestration) Don’t re-invent the wheel Leverage existing templates on the web! 
Leverage the documentation Suppor ts (almost) all AWS resources: Everything we’ll see in this course is supported You can use “custom resources” for resources that are not supported © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com CloudFormation + Application Composer Example: WordPress CloudFormation Stack + CloudFormation Application We can see all the resources Composer We can see the relations between the components © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Cloud Development Kit (CDK) Define your cloud infrastructure using a familiar language: JavaScript/TypeScript, Python, Java, and.NET The code is “compiled” into a CloudFormation template (JSON/YAML) You can therefore deploy infrastructure and application runtime code together Great for Lambda functions Great for Docker containers in ECS / EKS CDK Application CDK CLI CloudFormation CloudFormation Programming Template Languages © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com CDK Example © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Typical architecture: Web App 3-tier Auto Scaling group ElastiCache Availability zone 1 Multi AZ Store / retrieve Availability zone 2 session data + Cached data ELB Availability zone 3 Amazon RDS Read / write data © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Developer problems on AWS Managing infrastructure Deploying Code Configuring all the databases, load balancers, etc Scaling concerns Most web apps have the same architecture (ALB + ASG) All the developers want is for their code to run! Possibly, consistently across different applications and environments © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Elastic Beanstalk Overview Elastic Beanstalk is a developer centric view of deploying an application on AWS It uses all the component’s we’ve seen before: EC2, ASG, ELB, RDS, etc… But it’s all in one view that’s easy to make sense of! We still have full control over the configuration Beanstalk = Platform as a Service (PaaS) Beanstalk is free but you pay for the underlying instances © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com Elastic Beanstalk Managed service Instance configuration / OS is handled by Beanstalk Deployment strategy is configurable but performed by Elastic Beanstalk Capacity provisioning Load balancing & auto-scaling Application health-monitoring & responsiveness Just the application code is the responsibility of the developer Three architecture models: Single Instance deployment: good for dev LB + ASG: great for production or pre-production web applications ASG only: great for non-web apps in production (workers, etc..) 
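The "CDK Example" slide earlier in this section showed code that did not survive this transcript. Purely as an illustration of the idea (not the exact example from the slide), here is a minimal AWS CDK app in Python – CDK v2 assumed – that synthesizes a CloudFormation template containing a versioned S3 bucket; the stack and construct names are made up:

from aws_cdk import App, Stack, aws_s3 as s3
from constructs import Construct

class DemoStack(Stack):
    # A Stack maps one-to-one to a CloudFormation stack
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # One construct becomes one or more CloudFormation resources
        s3.Bucket(self, "CourseBucket", versioned=True)

app = App()
DemoStack(app, "DemoStack")
app.synth()  # "cdk synth" / "cdk deploy" turn this into a CloudFormation template

Running cdk deploy hands the generated template to CloudFormation, which then creates the resources in the right order, as described in the CloudFormation slides above.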
Elastic Beanstalk – Support for many platforms: Go, Java SE, Java with Tomcat, .NET on Windows Server with IIS, Node.js, PHP, Python, Ruby, Packer Builder, Single Container Docker, Multi-Container Docker, Preconfigured Docker

Elastic Beanstalk – Health Monitoring: Health agent pushes metrics to CloudWatch. Checks for app health, publishes health events.

AWS CodeDeploy: We want to deploy our application automatically. Works with EC2 Instances. Works with On-Premises Servers. Hybrid service. Servers / Instances must be provisioned and configured ahead of time with the CodeDeploy Agent. (Diagram: EC2 instances and on-premises servers being upgraded from v1 to v2.)

AWS CodeCommit: Before pushing the application code to servers, it needs to be stored somewhere. Developers usually store code in a repository, using the Git technology. A famous public offering is GitHub; AWS' competing product is CodeCommit. CodeCommit: source-control service that hosts Git-based repositories. Makes it easy to collaborate with others on code. The code changes are automatically versioned. Benefits: fully managed, scalable & highly available, private, secured, integrated with AWS.

AWS CodeBuild: Code building service in the cloud. Compiles source code, runs tests, and produces packages that are ready to be deployed (by CodeDeploy for example). Retrieve code (CodeCommit) => build code (CodeBuild) => ready-to-deploy artifact. Benefits: fully managed, serverless, continuously scalable & highly available, secure, pay-as-you-go pricing – only pay for the build time.

AWS CodePipeline: Orchestrate the different steps to have the code automatically pushed to production: Code => Build => Test => Provision => Deploy. Basis for CICD (Continuous Integration & Continuous Delivery). Benefits: fully managed, compatible with CodeCommit, CodeBuild, CodeDeploy, Elastic Beanstalk, CloudFormation, GitHub, 3rd-party services & custom plugins… Fast delivery & rapid updates. CodePipeline is the orchestration layer across CodeCommit, CodeBuild, CodeDeploy, and Elastic Beanstalk.

AWS CodeArtifact: Software packages depend on each other to be built (also called code dependencies), and new ones are created. Storing and retrieving these dependencies is called artifact management. Traditionally you need to set up your own artifact management system. CodeArtifact is a secure, scalable, and cost-effective artifact management service for software development. Works with common dependency management tools such as Maven, Gradle, npm, yarn, twine, pip, and NuGet. Developers and CodeBuild can then retrieve dependencies straight from CodeArtifact.

AWS CodeStar: Unified UI to easily manage software development activities in one place. "Quick way" to get started to correctly set up CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Elastic Beanstalk, EC2, etc… Can edit the code "in-the-cloud" using AWS Cloud9.

AWS Cloud9: AWS Cloud9 is a cloud IDE (Integrated Development Environment) for writing, running and debugging code. "Classic" IDEs (like IntelliJ, Visual Studio Code…) are downloaded on a computer before being used. A cloud IDE can be used within a web browser, meaning you can work on your projects from your office, home, or anywhere with internet, with no setup necessary. AWS Cloud9 also allows for code collaboration in real-time (pair programming).

AWS Systems Manager (SSM): Helps you manage your EC2 and on-premises systems at scale. Another hybrid AWS service. Get operational insights about the state of your infrastructure. Suite of 10+ products. Most important features are: patching automation for enhanced compliance, run commands across an entire fleet of servers, store parameter configuration with the SSM Parameter Store. Works for Linux, Windows, macOS, and Raspberry Pi OS (Raspbian).

How Systems Manager works: We need to install the SSM agent onto the systems we control. It is installed by default on Amazon Linux AMIs & some Ubuntu AMIs. If an instance can't be controlled with SSM, it's probably an issue with the SSM agent! Thanks to the SSM agent, we can run commands, patch & configure our servers (EC2 instances and on-premises VMs).

Systems Manager – SSM Session Manager: Allows you to start a secure shell on your EC2 and on-premises servers (via the SSM Agent). No SSH access, bastion hosts, or SSH keys needed. No port 22 needed (better security). Supports Linux, macOS, and Windows. (Diagram: Session Manager uses IAM permissions to execute commands on an EC2 instance running the SSM Agent.) Send session log data to S3 or CloudWatch Logs.
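As a small illustration of the "run commands across an entire fleet of servers" feature, here is a hedged sketch using boto3 and SSM Run Command; the region and instance ID are hypothetical, and the target instance must already be running the SSM Agent with an appropriate IAM role:

import time
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")   # hypothetical region
instance_id = "i-0123456789abcdef0"                   # hypothetical managed instance

# Run a shell command on a managed instance without SSH or port 22
response = ssm.send_command(
    InstanceIds=[instance_id],
    DocumentName="AWS-RunShellScript",                # AWS-managed SSM document
    Parameters={"commands": ["uptime", "df -h"]},
)
command_id = response["Command"]["CommandId"]

# Give the agent a moment to report back (a simple wait; production code would poll)
time.sleep(2)
result = ssm.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
print(result["Status"], result.get("StandardOutputContent", ""))

Because the command goes through the SSM Agent, no SSH key, bastion host, or open port 22 is involved, which is the same security benefit Session Manager provides for interactive shells.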