
Database section and other compute section .pdf


Full Transcript


NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com

Databases Section

Databases Intro
- Storing data on disk (EFS, EBS, EC2 Instance Store, S3) can have its limits
- Sometimes, you want to store data in a database…
- You can structure the data
- You build indexes to efficiently query / search through the data
- You define relationships between your datasets
- Databases are optimized for a purpose and come with different features, shapes and constraints

Relational Databases
- Looks just like Excel spreadsheets, with links between them!
- Can use the SQL language to perform queries / lookups

Students
Student ID | Dept ID | Name       | Email
1          | M01     | Joe Miller | [email protected]
2          | B01     | Sarah T    | [email protected]

Subjects
Student ID | Subject
1          | Physics
1          | Chemistry
1          | Math
2          | History
2          | Geography
2          | Economics

Departments
Dept ID | SPOC         | Email             | Phone
M01     | Kelly Jones  | [email protected] | +1234567890
B01     | Satish Kumar | [email protected] | +1234567891

NoSQL Databases
- NoSQL = non-SQL = non relational databases
- NoSQL databases are purpose built for specific data models and have flexible schemas for building modern applications.
- Benefits:
  - Flexibility: easy to evolve data model
  - Scalability: designed to scale-out by using distributed clusters
  - High-performance: optimized for a specific data model
  - Highly functional: types optimized for the data model
- Examples: Key-value, document, graph, in-memory, search databases

NoSQL data example: JSON
- JSON = JavaScript Object Notation
- JSON is a common form of data that fits into a NoSQL model
- Data can be nested
- Fields can change over time
- Support for new types: arrays, etc…

    {
      "name": "John",
      "age": 30,
      "cars": ["Ford", "BMW", "Fiat"],
      "address": {
        "type": "house",
        "number": 23,
        "street": "Dream Road"
      }
    }

Databases & Shared Responsibility on AWS
- AWS offers to manage different databases for us
- Benefits include:
  - Quick Provisioning, High Availability, Vertical and Horizontal Scaling
  - Automated Backup & Restore, Operations, Upgrades
  - Operating System Patching is handled by AWS
  - Monitoring, alerting
- Note: many database technologies could be run on EC2, but you must handle the resiliency, backup, patching, high availability, fault tolerance and scaling yourself

AWS RDS Overview
- RDS stands for Relational Database Service
- It’s a managed DB service for databases that use SQL as a query language
- It allows you to create databases in the cloud that are managed by AWS:
  - Postgres
  - MySQL
  - MariaDB
  - Oracle
  - Microsoft SQL Server
  - Aurora (AWS Proprietary database)

Advantages of using RDS versus deploying a DB on EC2
- RDS is a managed service:
  - Automated provisioning, OS patching
  - Continuous backups and restore to specific timestamp (Point in Time Restore)!
  - Monitoring dashboards
  - Read replicas for improved read performance
  - Multi AZ setup for DR (Disaster Recovery)
  - Maintenance windows for upgrades
  - Scaling capability (vertical and horizontal)
  - Storage backed by EBS (gp2 or io1)
- BUT you can’t SSH into your instances

RDS Solution Architecture
Diagram: Elastic Load Balancer -> EC2 Instances (possibly in an ASG) -> read/write -> SQL (relational) Database

Amazon Aurora
- Aurora is a proprietary technology from AWS (not open sourced)
- PostgreSQL and MySQL are both supported as Aurora DB
- Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS
- Aurora storage automatically grows in increments of 10GB, up to 64 TB.
- Aurora costs more than RDS (20% more) – but is more efficient
- Not in the free tier

RDS Deployments: Read Replicas, Multi-AZ
Read Replicas:
- Scale the read workload of your DB
- Can create up to 5 Read Replicas
- Data is only written to the main DB
Diagram: Main replicates to the Read Replicas; Application(s) send writes to Main and reads to Main and the Read Replicas
Multi-AZ:
- Failover in case of AZ outage (high availability)
- Data is only read/written to the main database
- Can only have 1 other AZ as failover
Diagram: Main replicates cross AZ to the Failover DB; Application(s) read from and write to Main; failover in case of issues with Main DB

RDS Deployments: Multi-Region
Multi-Region (Read Replicas):
- Disaster recovery in case of region issue
- Local performance for global reads
- Replication cost
Diagram: Main in eu-west-1 replicates to Read Replicas in us-east-2 and ap-southeast-2; Application(s) in each region read locally and send writes to Main
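The read replica idea above (writes go only to the main DB, reads are spread over the replicas) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ReadWriteRouter` class and the endpoint names are hypothetical, and treating every statement that starts with SELECT as a read is a simplification, not how RDS itself routes traffic.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to the main DB and spread reads across read replicas."""

    def __init__(self, main_endpoint, replica_endpoints):
        self.main_endpoint = main_endpoint
        # With no replicas, reads simply fall back to the main DB.
        self._readers = cycle(replica_endpoints or [main_endpoint])

    def endpoint_for(self, sql):
        """Return the endpoint a SQL statement should be sent to."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.main_endpoint


# Hypothetical endpoint names, for illustration only.
router = ReadWriteRouter(
    "main.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)
targets = [
    router.endpoint_for("SELECT * FROM students"),
    router.endpoint_for("SELECT * FROM subjects"),
    router.endpoint_for("INSERT INTO students VALUES (3, 'M01', 'Ann')"),
    router.endpoint_for("SELECT * FROM departments"),
]
```

Reads rotate over the replicas while the INSERT goes to the main endpoint, which is the split the Read Replicas slide describes.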

Amazon ElastiCache Overview
- The same way RDS is to get managed Relational Databases…
- ElastiCache is to get managed Redis or Memcached
- Caches are in-memory databases with high performance, low latency
- Helps reduce load off databases for read intensive workloads
- AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups

ElastiCache Solution Architecture - Cache
Diagram: Elastic Load Balancer -> EC2 Instances (possibly in an ASG); the instances read / write from the cache (ElastiCache, in-memory database, fast) and read / write from the DB (SQL (relational) Database, slower)

DynamoDB
- Fully Managed Highly available with replication across 3 AZ
- NoSQL database - not a relational database
- Scales to massive workloads, distributed “serverless” database
- Millions of requests per second, trillions of rows, 100s of TB of storage
- Fast and consistent in performance
- Single-digit millisecond latency – low latency retrieval
- Integrated with IAM for security, authorization and administration
- Low cost and auto scaling capabilities
- Standard & Infrequent Access (IA) Table Class

DynamoDB – type of data
- DynamoDB is a key/value database
- https://aws.amazon.com/nosql/key-value/

DynamoDB Accelerator - DAX
- Fully Managed in-memory cache for DynamoDB
- 10x performance improvement – single-digit millisecond latency to microseconds latency – when accessing your DynamoDB tables
- Secure, highly scalable & highly available
- Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases
Diagram: applications -> DAX (DynamoDB Accelerator) -> Amazon DynamoDB tables
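The caching architecture described above (read from the cache first, fall back to the slower database) is often called the cache-aside pattern. The sketch below illustrates it with a plain in-process dictionary standing in for the cache; the `CacheAside` class, the TTL value and `fake_db_lookup` are assumptions made for the example, not ElastiCache or DAX APIs.

```python
import time


class CacheAside:
    """Try the in-memory cache first; on a miss, load from the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load_from_db = load_from_db   # slow path, e.g. a SQL query
        self._ttl = ttl_seconds
        self._cache = {}                    # key -> (expires_at, value)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]                 # cache hit: no database call
        value = self._load_from_db(key)     # cache miss: read from the DB
        self._cache[key] = (time.monotonic() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key, typically right after the underlying row is written."""
        self._cache.pop(key, None)


db_calls = []


def fake_db_lookup(student_id):
    """Stand-in for a real database query; records each call."""
    db_calls.append(student_id)
    return {"student_id": student_id, "name": "Joe Miller"}


students = CacheAside(fake_db_lookup)
first = students.get(1)    # miss: goes to the "database"
second = students.get(1)   # hit: served from memory
```

Only the first lookup reaches the database, which is how a cache takes read load off the database for read intensive workloads.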

DynamoDB – Global Tables
- Make a DynamoDB table accessible with low latency in multiple-regions
- Active-Active replication (read/write to any AWS Region)
Diagram: users read/write to a DynamoDB Global Table in N. Virginia (us-east-1) and in Paris (eu-west-3), with 2-way replication between the two

Redshift Overview
- Redshift is based on PostgreSQL, but it’s not used for OLTP
- It’s OLAP – online analytical processing (analytics and data warehousing)
- Load data once every hour, not every second
- 10x better performance than other data warehouses, scale to PBs of data
- Columnar storage of data (instead of row based)
- Massively Parallel Query Execution (MPP), highly available
- Pay as you go based on the instances provisioned
- Has a SQL interface for performing the queries
- BI tools such as Amazon QuickSight or Tableau integrate with it

Amazon EMR
- EMR stands for “Elastic MapReduce”
- EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
- The clusters can be made of hundreds of EC2 instances
- Also supports Apache Spark, HBase, Presto, Flink…
- EMR takes care of all the provisioning and configuration
- Auto-scaling and integrated with Spot instances
- Use cases: data processing, machine learning, web indexing, big data…

Amazon Athena
- Serverless query service to analyze data stored in Amazon S3
- Uses standard SQL language to query the files (built on Presto)
- Supports CSV, JSON, ORC, Avro, and Parquet
- Pricing: $5.00 per TB of data scanned
- Use compressed or columnar data for cost-savings (less scan)
- Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc...
- Exam Tip: analyze data in S3 using serverless SQL, use Athena
Diagram: S3 Bucket -> load data -> Amazon Athena (Query & Analyze) -> Amazon QuickSight (Reporting & Dashboards)

Amazon QuickSight
- Serverless machine learning-powered business intelligence service to create interactive dashboards
- Fast, automatically scalable, embeddable, with per-session pricing
- Use cases:
  - Business analytics
  - Building visualizations
  - Perform ad-hoc analysis
  - Get business insights using data
- Integrated with RDS, Aurora, Athena, Redshift, S3…
- https://aws.amazon.com/quicksight/

DocumentDB
- Aurora is an “AWS-implementation” of PostgreSQL / MySQL …
- DocumentDB is the same for MongoDB (which is a NoSQL database)
- MongoDB is used to store, query, and index JSON data
- Similar “deployment concepts” as Aurora
- Fully Managed, highly available with replication across 3 AZ
- DocumentDB storage automatically grows in increments of 10GB, up to 64 TB.
- Automatically scales to workloads with millions of requests per second

Amazon Neptune
- Fully managed graph database
- A popular graph dataset would be a social network
  - Users have friends
  - Posts have comments
  - Comments have likes from users
  - Users share and like posts…
- Highly available across 3 AZ, with up to 15 read replicas
- Build and run applications working with highly connected datasets – optimized for these complex and hard queries
- Can store up to billions of relations and query the graph with milliseconds latency
- Highly available with replications across multiple AZs
- Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking

Amazon QLDB
- QLDB stands for “Quantum Ledger Database”
- A ledger is a book recording financial transactions
- Fully Managed, Serverless, Highly available, Replication across 3 AZ
- Used to review history of all the changes made to your application data over time
- Immutable system: no entry can be removed or modified, cryptographically verifiable
- 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
- Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
- https://docs.aws.amazon.com/qldb/latest/developerguide/ledger-structure.html

Amazon Managed Blockchain
- Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
- Amazon Managed Blockchain is a managed service to:
  - Join public blockchain networks
  - Or create your own scalable private network
- Compatible with the frameworks Hyperledger Fabric & Ethereum

AWS Glue
- Managed extract, transform, and load (ETL) service
- Useful to prepare and transform data for analytics
- Fully serverless service
Diagram: S3 Bucket and Amazon RDS -> Extract -> Glue ETL (Transform) -> Load -> Redshift
- Glue Data Catalog: catalog of datasets
  - can be used by Athena, Redshift, EMR

DMS – Database Migration Service
- Quickly and securely migrate databases to AWS, resilient, self healing
- The source database remains available during the migration
- Supports:
  - Homogeneous migrations: ex Oracle to Oracle
  - Heterogeneous migrations: ex Microsoft SQL Server to Aurora
Diagram: Source DB -> EC2 instance running DMS -> Target DB

Databases & Analytics Summary in AWS
- Relational Databases - OLTP: RDS & Aurora (SQL)
- Differences between Multi-AZ, Read Replicas, Multi-Region
- In-memory Database: ElastiCache
- Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
- Warehouse - OLAP: Redshift (SQL)
- Hadoop Cluster: EMR
- Athena: query data on Amazon S3 (serverless & SQL)
- QuickSight: dashboards on your data (serverless)
- DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database)
- Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
- Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
- Glue: Managed ETL (Extract Transform Load) and Data Catalog service
- Database Migration: DMS
- Neptune: graph database

Other Compute Section

What is Docker?
- Docker is a software development platform to deploy apps
- Apps are packaged in containers that can be run on any OS
- Apps run the same, regardless of where they’re run
  - Any machine
  - No compatibility issues
  - Predictable behavior
  - Less work
  - Easier to maintain and deploy
  - Works with any language, any OS, any technology
- Scale containers up and down very quickly (seconds)

Docker on an OS
Diagram: several containers running on one server (ex: EC2 Instance)

Where are Docker images stored?
- Docker images are stored in Docker Repositories
- Public: Docker Hub https://hub.docker.com/
  - Find base images for many technologies or OS:
  - Ubuntu
  - MySQL
  - NodeJS, Java…
- Private: Amazon ECR (Elastic Container Registry)

Docker versus Virtual Machines
- Docker is “sort of” a virtualization technology, but not exactly
- Resources are shared with the host => many containers on one server
Diagram (virtual machines): Infrastructure -> Host OS -> Hypervisor -> Guest OS (VM) x3 -> Apps
Diagram (Docker): Infrastructure -> Host OS (EC2 Instance) -> Docker Daemon -> Containers

ECS
- ECS = Elastic Container Service
- Launch Docker containers on AWS
- You must provision & maintain the infrastructure (the EC2 instances)
- AWS takes care of starting / stopping containers
- Has integrations with the Application Load Balancer
Diagram: ECS Service placing a new Docker container on one of several EC2 instances

Fargate
Diagram: a new Docker container launched on Fargate
- Launch Docker containers on AWS
- You do not provision the infrastructure (no EC2 instances to manage) – simpler!
- Serverless offering
- AWS just runs containers for you based on the CPU / RAM you need

ECR
- Elastic Container Registry
- Private Docker Registry on AWS
- This is where you store your Docker images so they can be run by ECS or Fargate
Diagram: ECR holds Image 1 and Image 2; Fargate runs containers from them

What’s serverless?
- Serverless is a new paradigm in which the developers don’t have to manage servers anymore…
- They just deploy code
- They just deploy… functions!
- Initially... Serverless == FaaS (Function as a Service)
- Serverless was pioneered by AWS Lambda but now also includes anything that’s managed: “databases, messaging, storage, etc.”
- Serverless does not mean there are no servers… it means you just don’t manage / provision / see them

So far in this course…
- Amazon S3
- DynamoDB
- Fargate
- Lambda

Why AWS Lambda
Amazon EC2:
- Virtual Servers in the Cloud
- Limited by RAM and CPU
- Continuously running
- Scaling means intervention to add / remove servers
AWS Lambda:
- Virtual functions – no servers to manage!
- Limited by time - short executions
- Run on-demand
- Scaling is automated!

Benefits of AWS Lambda
- Easy Pricing:
  - Pay per request and compute time
  - Free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time
- Integrated with the whole AWS suite of services
- Event-Driven: functions get invoked by AWS when needed
- Integrated with many programming languages
- Easy monitoring through AWS CloudWatch
- Easy to get more resources per function (up to 10GB of RAM!)
- Increasing RAM will also improve CPU and network!
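To make "Event-Driven: functions get invoked by AWS when needed" concrete, here is a minimal sketch of a Python Lambda handler. The `(event, context)` signature is the usual Python handler convention; the event below is a hand-written, trimmed imitation of an S3 notification, and the bucket and key names are hypothetical. Real events carry many more fields.

```python
def lambda_handler(event, context):
    """Entry point called once per event; returns a small summary."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        uploaded.append(f"s3://{bucket}/{key}")
    return {"processed": len(uploaded), "objects": uploaded}


# Local invocation with a trimmed sample event (assumed shape, see above).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "photos/cat.jpg"},
            }
        }
    ]
}
result = lambda_handler(sample_event, context=None)
```

The function holds no server state and runs only when an event arrives, which is why billing can be per request and per compute time.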

AWS Lambda language support
- Node.js (JavaScript)
- Python
- Java (Java 8 compatible)
- C# (.NET Core)
- Golang
- C# / Powershell
- Ruby
- Custom Runtime API (community supported, example Rust)
- Lambda Container Image
  - The container image must implement the Lambda Runtime API
  - ECS / Fargate is preferred for running arbitrary Docker images

Example: Serverless Thumbnail creation
Diagram: New image in S3 -> trigger -> AWS Lambda Function (creates a thumbnail) -> push -> New thumbnail in S3; the function also pushes metadata to DynamoDB (image name, image size, creation date, etc…)

Example: Serverless CRON Job
Diagram: CloudWatch Events / EventBridge -> trigger every 1 hour -> AWS Lambda Function (perform a task)

AWS Lambda Pricing: example
- You can find overall pricing information here: https://aws.amazon.com/lambda/pricing/
- Pay per calls:
  - First 1,000,000 requests are free
  - $0.20 per 1 million requests thereafter ($0.0000002 per request)
- Pay per duration: (in increment of 1 ms)
  - 400,000 GB-seconds of compute time per month for FREE
  - == 400,000 seconds if function is 1GB RAM
  - == 3,200,000 seconds if function is 128 MB RAM
  - After that $1.00 for 60,000 GB-seconds ($0.0000166667 per GB-second)
- It is usually very cheap to run AWS Lambda so it’s very popular

Amazon API Gateway
- Example: building a serverless API
Diagram: Client -> REST API -> API Gateway -> PROXY REQUESTS -> Lambda -> CRUD -> DynamoDB
- Fully managed service for developers to easily create, publish, maintain, monitor, and secure APIs
- Serverless and scalable
- Supports RESTful APIs and WebSocket APIs
- Support for security, user authentication, API throttling, API keys, monitoring...
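The free-tier arithmetic from the Lambda pricing slide above can be checked with a short calculation. This is a rough sketch: the function names are made up for the example, and the per-GB-second rate is passed in as a parameter because it is an assumption here. Look up the current value on the pricing page linked in the slide.

```python
FREE_REQUESTS = 1_000_000          # free requests per month (from the slide)
FREE_GB_SECONDS = 400_000          # free compute per month (from the slide)
PRICE_PER_MILLION_REQUESTS = 0.20  # USD, after the free requests


def free_tier_seconds(memory_mb):
    """Seconds of execution covered by the free tier at a given memory size."""
    return FREE_GB_SECONDS / (memory_mb / 1024)


def monthly_cost(requests, avg_duration_ms, memory_mb, price_per_gb_second):
    """Estimated monthly bill in USD once the free tier is used up."""
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = max(0, requests - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return request_cost + billable_gb_seconds * price_per_gb_second


# Assumed rate, for illustration only; check the pricing page before relying on it.
ASSUMED_PRICE_PER_GB_SECOND = 0.0000166667

one_gb_seconds = free_tier_seconds(1024)   # 400,000 seconds at 1 GB
small_seconds = free_tier_seconds(128)     # 3,200,000 seconds at 128 MB
bill = monthly_cost(
    requests=3_000_000,
    avg_duration_ms=200,
    memory_mb=512,
    price_per_gb_second=ASSUMED_PRICE_PER_GB_SECOND,
)
```

In this example the 3,000,000 requests use 300,000 GB-seconds, which stays inside the free compute allowance, so only the 2,000,000 requests beyond the free tier are billed.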

AWS Batch
- Fully managed batch processing at any scale
- Efficiently run 100,000s of computing batch jobs on AWS
- A “batch” job is a job with a start and an end (opposed to continuous)
- Batch will dynamically launch EC2 instances or Spot Instances
- AWS Batch provisions the right amount of compute / memory
- You submit or schedule batch jobs and AWS Batch does the rest!
- Batch jobs are defined as Docker images and run on ECS
- Helpful for cost optimizations and focusing less on the infrastructure

AWS Batch – Simplified Example
Diagram: object in Amazon S3 -> trigger -> AWS Batch (ECS on an EC2 Instance or Spot Instance) -> insert processed object -> Amazon S3

Batch vs Lambda
Lambda:
- Time limit
- Limited runtimes
- Limited temporary disk space
- Serverless
Batch:
- No time limit
- Any runtime as long as it’s packaged as a Docker image
- Rely on EBS / instance store for disk space
- Relies on EC2 (can be managed by AWS)

Amazon Lightsail
- Virtual servers, storage, databases, and networking
- Low & predictable pricing
- Simpler alternative to using EC2, RDS, ELB, EBS, Route 53…
- Great for people with little cloud experience!
- Can set up notifications and monitoring of your Lightsail resources
- Use cases:
  - Simple web applications (has templates for LAMP, Nginx, MEAN, Node.js…)
  - Websites (templates for WordPress, Magento, Plesk, Joomla)
  - Dev / Test environment
- Has high availability but no auto-scaling, limited AWS integrations

Other Compute - Summary
- Docker: container technology to run applications
- ECS: run Docker containers on EC2 instances
- Fargate:
  - Run Docker containers without provisioning the infrastructure
  - Serverless offering (no EC2 instances)
- ECR: Private Docker Images Repository
- Batch: run batch jobs on AWS across managed EC2 instances
- Lightsail: predictable & low pricing for simple application & DB stacks

Lambda Summary
- Lambda is Serverless, Function as a Service, seamless scaling, reactive
- Lambda Billing:
  - By the time run x by the RAM provisioned
  - By the number of invocations
- Language Support: many programming languages except (arbitrary) Docker
- Invocation time: up to 15 minutes
- Use cases:
  - Create Thumbnails for images uploaded onto S3
  - Run a Serverless cron job
- API Gateway: expose Lambda functions as HTTP API

Deploying and Managing Infrastructure at Scale Section

What is CloudFormation
- CloudFormation is a declarative way of outlining your AWS Infrastructure, for any resources (most of them are supported).
- For example, within a CloudFormation template, you say:
  - I want a security group
  - I want two EC2 instances using this security group
  - I want an S3 bucket
  - I want a load balancer (ELB) in front of these machines
- Then CloudFormation creates those for you, in the right order, with the exact configuration that you specify

Benefits of AWS CloudFormation (1/2)
- Infrastructure as code
  - No resources are manually created, which is excellent for control
  - Changes to the infrastructure are reviewed through code
- Cost
  - Each resource within the stack is tagged with an identifier so you can easily see how much a stack costs you
  - You can estimate the costs of your resources using the CloudFormation template
  - Savings strategy: In Dev, you could automate deletion of templates at 5 PM and recreate them at 8 AM, safely

Benefits of AWS CloudFormation (2/2)
- Productivity
  - Ability to destroy and re-create an infrastructure on the cloud on the fly
  - Automated generation of Diagram for your templates!
  - Declarative programming (no need to figure out ordering and orchestration)
- Don’t re-invent the wheel
  - Leverage existing templates on the web!
  - Leverage the documentation
- Supports (almost) all AWS resources:
  - Everything we’ll see in this course is supported
  - You can use “custom resources” for resources that are not supported

CloudFormation Stack Designer
- Example: WordPress CloudFormation Stack
- We can see all the resources
- We can see the relations between the components

AWS Cloud Development Kit (CDK)
- Define your cloud infrastructure using a familiar language:
  - JavaScript/TypeScript, Python, Java, and .NET
- The code is “compiled” into a CloudFormation template (JSON/YAML)
- You can therefore deploy infrastructure and application runtime code together
  - Great for Lambda functions
  - Great for Docker containers in ECS / EKS
Diagram: CDK Application (programming languages) -> CDK CLI -> CloudFormation Template -> CloudFormation

CDK Example

Typical architecture: Web App 3-tier
Diagram: ELB -> Auto Scaling group across Availability zones 1, 2 and 3 (Multi AZ) -> ElastiCache (store / retrieve session data + cached data) and Amazon RDS (read / write data)

Developer problems on AWS
- Managing infrastructure
- Deploying Code
- Configuring all the databases, load balancers, etc
- Scaling concerns
- Most web apps have the same architecture (ALB + ASG)
- All the developers want is for their code to run!
- Possibly, consistently across different applications and environments

AWS Elastic Beanstalk Overview
- Elastic Beanstalk is a developer centric view of deploying an application on AWS
- It uses all the components we’ve seen before: EC2, ASG, ELB, RDS, etc…
- But it’s all in one view that’s easy to make sense of!
- We still have full control over the configuration
- Beanstalk = Platform as a Service (PaaS)
- Beanstalk is free but you pay for the underlying instances

Elastic Beanstalk
- Managed service
  - Instance configuration / OS is handled by Beanstalk
  - Deployment strategy is configurable but performed by Elastic Beanstalk
  - Capacity provisioning
  - Load balancing & auto-scaling
  - Application health-monitoring & responsiveness
- Just the application code is the responsibility of the developer
- Three architecture models:
  - Single Instance deployment: good for dev
  - LB + ASG: great for production or pre-production web applications
  - ASG only: great for non-web apps in production (workers, etc..)

Elastic Beanstalk
- Support for many platforms:
  - Go
  - Java SE
  - Java with Tomcat
  - .NET on Windows Server with IIS
  - Node.js
  - PHP
  - Python
  - Ruby
  - Packer Builder
  - Single Container Docker
  - Multi-Container Docker
  - Preconfigured Docker
- If not supported, you can write your custom platform (advanced)

Elastic Beanstalk – Health Monitoring
- Health agent pushes metrics to CloudWatch
- Checks for app health, publishes health events

AWS CodeDeploy
- We want to deploy our application automatically
- Works with EC2 Instances
- Works with On-Premises Servers
- Hybrid service
- Servers / Instances must be provisioned and configured ahead of time with the CodeDeploy Agent
Diagram: EC2 Instances being upgraded (v1 -> v2); On-premises Servers being upgraded (v1 -> v2)

AWS CodeCommit
- Before pushing the application code to servers, it needs to be stored somewhere
- Developers usually store code in a repository, using the Git technology
- A famous public offering is GitHub, AWS’ competing product is CodeCommit
- CodeCommit:
  - Source-control service that hosts Git-based repositories
  - Makes it easy to collaborate with others on code
  - The code changes are automatically versioned
- Benefits:
  - Fully managed
  - Scalable & highly available
  - Private, Secured, Integrated with AWS

AWS CodeBuild
- Code building service in the cloud (name is obvious)
- Compiles source code, runs tests, and produces packages that are ready to be deployed (by CodeDeploy for example)
Diagram: CodeCommit -> retrieve code -> CodeBuild -> build code -> ready-to-deploy artifact
- Benefits:
  - Fully managed, serverless
  - Continuously scalable & highly available
  - Secure
  - Pay-as-you-go pricing – only pay for the build time

AWS CodePipeline
- Orchestrate the different steps to have the code automatically pushed to production
  - Code => Build => Test => Provision => Deploy
  - Basis for CICD (Continuous Integration & Continuous Delivery)
- Benefits:
  - Fully managed, compatible with CodeCommit, CodeBuild, CodeDeploy, Elastic Beanstalk, CloudFormation, GitHub, 3rd-party services & custom plugins…
  - Fast delivery & rapid updates
Diagram: CodePipeline (orchestration layer) over CodeCommit -> CodeBuild -> CodeDeploy -> Elastic Beanstalk

AWS CodeArtifact
- Software packages depend on each other to be built (also called code dependencies), and new ones are created
- Storing and retrieving these dependencies is called artifact management
- Traditionally you need to set up your own artifact management system
- CodeArtifact is a secure, scalable, and cost-effective artifact management for software development
- Works with common dependency management tools such as Maven, Gradle, npm, yarn, twine, pip, and NuGet
- Developers and CodeBuild can then retrieve dependencies straight from CodeArtifact
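The CodePipeline flow above (Code => Build => Test => Provision => Deploy) boils down to running stages in order and stopping when one fails. The sketch below is a toy illustration of that orchestration idea only: `run_pipeline` and the stand-in stage actions are hypothetical and do not call CodePipeline or any other AWS service.

```python
def run_pipeline(stages):
    """Run (name, action) stages in order; stop at the first failing stage."""
    completed = []
    for name, action in stages:
        if not action():
            return {"status": "failed", "failed_stage": name, "completed": completed}
        completed.append(name)
    return {"status": "succeeded", "failed_stage": None, "completed": completed}


# Stand-in actions that return True on success. Real stages would call a
# source repository, a build service, a test runner, a deployment tool, etc.
stages = [
    ("Code", lambda: True),
    ("Build", lambda: True),
    ("Test", lambda: False),      # simulate a failing test suite
    ("Provision", lambda: True),
    ("Deploy", lambda: True),
]
outcome = run_pipeline(stages)
```

Because the Test stage fails, nothing after it runs, so a broken build never reaches the Provision or Deploy steps.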
AWS CodeStar Unified UI to easily manage software development activities in one place “Quick way” to get started to correctly set-up CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Elastic Beanstalk, EC2, etc… Can edit the code ”in-the-cloud” using AWS Cloud9 © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Cloud9 AWS Cloud9 is a cloud IDE (Integrated Development Environment) for writing, running and debugging code “Classic” IDE (like IntelliJ, Visual Studio Code…) are downloaded on a computer before being used A cloud IDE can be used within a web browser, meaning you can work on your projects from your office, home, or anywhere with internet with no setup necessary AWS Cloud9 also allows for code collaboration in real-time (pair programming) © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com AWS Systems Manager (SSM) Helps you manage your EC2 and On-Premises systems at scale Another Hybrid AWS service Get operational insights about the state of your infrastructure Suite of 10+ products Most important features are: Patching automation for enhanced compliance Run commands across an entire fleet of servers Store parameter configuration with the SSM Parameter Store Works for both Windows and Linux OS © Stephane Maarek NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com How Systems Manager works We need to install the SSM agent onto the systems we SSM control Installed by default on Amazon Linux AMI & some Ubuntu AMI If an instance can’t be controlled with SSM, it’s probably an issue with the SSM agent! 
- Thanks to the SSM agent, we can run commands, patch & configure our servers
Diagram: SSM -> SSM Agent on each EC2 Instance and On Premise VM

Systems Manager – SSM Session Manager
- Allows you to start a secure shell on your EC2 and on-premises servers
- No SSH access, bastion hosts, or SSH keys needed
- No port 22 needed (better security)
- Supports Linux, macOS, and Windows
- Send session log data to S3 or CloudWatch Logs
Diagram: User (IAM Permissions) -> Session Manager -> execute commands -> EC2 Instance (SSM Agent)

AWS OpsWorks
- Chef & Puppet help you perform server configuration automatically, or repetitive actions
- They work great with EC2 & On-Premises VM
- AWS OpsWorks = Managed Chef & Puppet
- It’s an alternative to AWS SSM
- Only provision standard AWS resources:
  - EC2 Instances, Databases, Load Balancers, EBS volumes…
- In the exam: Chef or Puppet needed => AWS OpsWorks

OpsWorks Architecture
Diagram: an OpsWorks Stack (with a Cookbook Repository and an App Repository) made of OpsWorks Layers: Elastic Load Balancer Layer (ALB), Application Server Layer (App Server Instances (EC2), Applications), Database Layer (Database Server (RDS))

Deployment - Summary
- CloudFormation: (AWS only)
  - Infrastructure as Code, works with almost all of AWS resources
  - Repeat across Regions & Accounts
- Beanstalk: (AWS only)
  - Platform as a Service (PaaS), limited to certain programming languages or Docker
  - Deploy code consistently with a known architecture: ex, ALB + EC2 + RDS
- CodeDeploy (hybrid): deploy & upgrade any application onto servers
- Systems Manager (hybrid): patch, configure and run commands at scale
- OpsWorks (hybrid): managed Chef and Puppet in AWS

Developer Services - Summary
- CodeCommit: Store code in private git repository (version controlled)
- CodeBuild: Build & test code in AWS
- CodeDeploy: Deploy code onto servers
- CodePipeline: Orchestration of pipeline (from code to build to deploy)
- CodeArtifact: Store software packages / dependencies on AWS
- CodeStar: Unified view for allowing developers to do CICD and code
- Cloud9: Cloud IDE (Integrated Development Environment) with collab
- AWS CDK: Define your cloud infrastructure using a programming language
