AWS Cloud Practitioner Slides PDF
Document Details
Uploaded by BestKnownKoto335
HKBK College of Engineering
Stephane Maarek
Summary
This document is a set of slides giving an overview of AWS database and analytics services: relational databases (RDS and Aurora), the in-memory cache ElastiCache, the NoSQL database DynamoDB (with its DAX cache), the Redshift data warehouse, EMR for Hadoop clusters, Athena for serverless SQL queries on S3, and QuickSight dashboards. It also covers DocumentDB, Neptune (graph database), QLDB (ledger database), Amazon Managed Blockchain, Glue, and DMS, and closes with container services: Docker, ECS, Fargate, and ECR.
Full Transcript
# AWS Certified Cloud Practitioner Slides v2.0

## RDS Solution Architecture

(Diagram: an Elastic Load Balancer in front of EC2 Instances, possibly in an ASG, which read/write to Amazon RDS, a SQL (relational) database.)

## Amazon Aurora

* Aurora is a proprietary technology from AWS (not open sourced).
* PostgreSQL and MySQL are both supported as Aurora DB.
* Aurora is "AWS cloud optimized" and claims a 5x performance improvement over MySQL on RDS and over 3x the performance of PostgreSQL on RDS.
* Aurora storage automatically grows in increments of 10 GB, up to 64 TB.
* Aurora costs more than RDS (20% more), but is more efficient.
* Not in the free tier.

## RDS Deployments: Read Replicas, Multi-AZ

### Read Replicas

* Scale the read workload of your DB.
* Can create up to 15 Read Replicas.
* Data is only written to the main DB.

(Diagram: writes go to the Main RDS DB; RDS replication copies the data to the Read Replicas; reads can be served by the Main DB or by the Read Replicas.)

### Multi-AZ

* Failover in case of AZ outage (high availability).
* Data is only read from and written to the main database.
* Can only have 1 other AZ as failover.

(Diagram: the Main RDS DB handles reads and writes; RDS replication cross AZ keeps a Failover DB in another AZ, used for failover in case of issues with the Main DB.)

## RDS Deployments: Multi-Region

* Multi-Region (Read Replicas)
  * Disaster recovery in case of region issue
  * Local performance for global reads
  * Replication cost

(Diagram: application(s) in several regions read from a local RDS Read Replica and send writes to the Main RDS DB.)

## Amazon ElastiCache Overview

The same way RDS is used to get managed relational databases, ElastiCache is used to get managed Redis or Memcached.
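Applications commonly put a Redis or Memcached cache in front of a relational database using the cache-aside (lazy loading) pattern. The application reads from the cache first, falls back to the database on a miss, and then stores the result in the cache. The following is a minimal sketch of that read path. It assumes plain Python dicts as hypothetical stand-ins for ElastiCache and RDS, and it uses no AWS or Redis client library.

```python
from typing import Callable, Dict, List, Optional


class CacheAside:
    """Lazy loading: read from the cache first, fall back to the database on a miss."""

    def __init__(self, cache: Dict[str, str], load_from_db: Callable[[str], Optional[str]]) -> None:
        self.cache = cache                # hypothetical stand-in for ElastiCache (Redis/Memcached)
        self.load_from_db = load_from_db  # hypothetical stand-in for a query against RDS
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            self.hits += 1
            return self.cache[key]        # fast path: served from memory
        self.misses += 1
        value = self.load_from_db(key)    # slow path: go to the database
        if value is not None:
            self.cache[key] = value       # populate the cache for the next read
        return value

    def invalidate(self, key: str) -> None:
        # After writing to the database, drop the stale cached copy.
        self.cache.pop(key, None)


# Hypothetical database contents, plus a log of which keys actually reached the database.
database: Dict[str, str] = {"user:1": "Alice", "user:2": "Bob"}
db_reads: List[str] = []


def query_db(key: str) -> Optional[str]:
    db_reads.append(key)
    return database.get(key)


store = CacheAside({}, query_db)
store.get("user:1")  # miss: read from the database, then cached
store.get("user:1")  # hit: served from the cache
print(store.hits, store.misses, db_reads)  # 1 1 ['user:1']
```

Only the first read reaches the database. This is how a cache reduces load for read-intensive workloads. The trade-off is that cached data can go stale, which is why the sketch also includes an `invalidate` step to call after a write.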
* Caches are in-memory databases with high performance and low latency.
* Helps reduce the load on databases for read-intensive workloads.
* AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery, and backups.

## ElastiCache Solution Architecture - Cache

(Diagram: an Elastic Load Balancer in front of EC2 Instances, possibly in an ASG. The instances read/write from ElastiCache, an in-memory database (fast), and read/write from Amazon RDS, a SQL (relational) database (slower).)

## DynamoDB

* Fully managed.
* Highly available with replication across 3 AZs.
* NoSQL database: not a relational database.
* Scales to massive workloads; distributed "serverless" database.
* Millions of requests per second, trillions of rows, 100s of TB of storage.
* Fast and consistent in performance.
* Single-digit millisecond latency: low-latency retrieval.
* Integrated with IAM for security, authorization, and administration.
* Low cost and auto-scaling capabilities.

## DynamoDB - Type of Data

* DynamoDB is a key/value database.

### Products

| Partition key | Sort key (Type) | Attributes |
|---|---|---|
| Product ID | Book ID | Odyssey, Homer, 1871 |
| Product ID | Album ID | 6 Partitas, Bach |
| Product ID | Album ID:Track ID | Partita No. 1 |
| Product ID | Movie ID | The Kid, Drama, Comedy, Chaplin |

## DynamoDB Accelerator - DAX

* Fully managed in-memory cache for DynamoDB.
* 10x performance improvement (from single-digit millisecond latency to microsecond latency) when accessing your DynamoDB tables.
* Secure, highly scalable & highly available.
* *Difference with ElastiCache at the CCP level*: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases.

## Redshift Overview

Redshift is based on PostgreSQL, but it is not used for OLTP. It is used for OLAP, online analytical processing (analytics and data warehousing). Data is loaded in batches, for example once every hour, not every second. Redshift offers 10x better performance than other data warehouses and scales to PBs of data.
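Analytical (OLAP) queries usually aggregate a few columns over many rows. This is why data warehouses such as Redshift store data by column rather than by row. The following conceptual sketch is in plain Python and uses a hypothetical four-row sales table. It counts how many values each layout has to read to compute one total. It does not model Redshift's actual storage engine, compression, or parallel execution.

```python
from typing import Dict, List, Tuple

# Hypothetical "sales" table with three columns: (order_id, region, amount).
rows: List[Tuple[int, str, float]] = [
    (1, "eu", 120.0),
    (2, "us", 80.0),
    (3, "eu", 45.5),
    (4, "ap", 200.0),
]


def scan_row_store(table: List[Tuple[int, str, float]]) -> Tuple[float, int]:
    """Row layout: conceptually, every field of every record is read to sum one column."""
    total, values_read = 0.0, 0
    for record in table:
        values_read += len(record)  # the whole record comes off storage together
        total += record[2]
    return total, values_read


def to_column_store(table: List[Tuple[int, str, float]]) -> Dict[str, list]:
    """Column layout: each column's values are stored together."""
    order_ids, regions, amounts = zip(*table)
    return {"order_id": list(order_ids), "region": list(regions), "amount": list(amounts)}


def scan_column_store(columns: Dict[str, list]) -> Tuple[float, int]:
    """Only the 'amount' column is read."""
    amounts = columns["amount"]
    return float(sum(amounts)), len(amounts)


columns = to_column_store(rows)
print(scan_row_store(rows))        # (445.5, 12)
print(scan_column_store(columns))  # (445.5, 4)
```

Both layouts give the same answer. The column layout reads one value per row instead of three. On wide warehouse tables with many columns, that gap is what makes columnar storage attractive for analytics.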
* Columnar storage of data (instead of row based).
* Massively Parallel Query Execution (MPP), highly available.
* Pay as you go based on the instances provisioned.
* Has a SQL interface for performing the queries.
* BI tools such as Amazon QuickSight or Tableau integrate with it.

## Amazon EMR

* EMR stands for "Elastic MapReduce".
* EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data.
* The clusters can be made of hundreds of EC2 instances.
* Also supports Apache Spark, HBase, Presto, Flink...
* EMR takes care of all the provisioning and configuration.
* Auto-scaling and integrated with Spot instances.
* Use cases: data processing, machine learning, web indexing, big data...

## Athena Overview

* Fully serverless database with SQL capabilities.
* Used to query data in S3.
* Pay per query.
* Output results back to S3.
* Secured through IAM.
* Use cases: one-time SQL queries, serverless queries on S3, log analytics.

## Amazon QuickSight

* Serverless machine learning-powered business intelligence service to create interactive dashboards.
* Fast, automatically scalable, embeddable, with per-session pricing.
* Use cases:
  * Business analytics
  * Building visualizations
  * Performing ad-hoc analysis
  * Getting business insights using data
* Integrated with RDS, Aurora, Athena, Redshift, S3...

## DocumentDB

* Aurora is an "AWS implementation" of PostgreSQL / MySQL...
* DocumentDB is the same for MongoDB (which is a NoSQL database).
* MongoDB is used to store, query and index JSON data.
* Similar "deployment concepts" as Aurora.
* Fully managed, highly available with replication across 3 AZs.
* DocumentDB storage automatically grows in increments of 10 GB, up to 64 TB.
* Automatically scales to workloads with millions of requests per second.

## Amazon Neptune

Fully managed graph database. A popular *graph dataset* would be a social network:

* Users have friends
* Posts have comments
* Comments have likes from users
* Users share and like posts...
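A graph database is built for traversing relationships like these. The following is a conceptual sketch in plain Python, not Neptune's query interface. It stores a hypothetical social graph as an adjacency map and runs two typical queries on it. The first is friend-of-friend suggestions, a simple form of recommendation. The second is a breadth-first search for the number of hops between two users.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical social graph: each user maps to the set of their friends.
friends: Dict[str, Set[str]] = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friend_suggestions(graph: Dict[str, Set[str]], user: str) -> Set[str]:
    """Friends of friends who are not already friends with the user."""
    direct = graph.get(user, set())
    candidates: Set[str] = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    return candidates - direct - {user}


def degrees_of_separation(graph: Dict[str, Set[str]], start: str, goal: str) -> int:
    """Breadth-first search: number of hops between two users, or -1 if not connected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, set()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return -1


print(sorted(friend_suggestions(friends, "alice")))  # ['dave', 'erin']
print(degrees_of_separation(friends, "erin", "bob"))  # 3
```

In a relational database, each extra hop usually means another join. A graph database is optimized to follow such relationships directly, even across billions of them.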
* Highly available across 3 AZs, with up to 15 read replicas.
* Build and run applications working with highly connected datasets: optimized for these complex and hard queries.
* Can store up to billions of relations and query the graph with milliseconds latency.
* Highly available with replications across multiple AZs.
* Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking.

## Amazon QLDB

* QLDB stands for "Quantum Ledger Database".
* A ledger is a book recording financial transactions.
* Fully managed, serverless, highly available, replication across 3 AZs.
* Used to review the history of all the changes made to your application data over time.
* Immutable system: no entry can be removed or modified, cryptographically verifiable.
* 2-3x better performance than common ledger blockchain frameworks; manipulate data using SQL.
* *Difference with Amazon Managed Blockchain:* no decentralization component, in accordance with financial regulation rules.

## Amazon Managed Blockchain

* Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
* Amazon Managed Blockchain is a managed service to:
  * Join public blockchain networks
  * Or create your own scalable private network.
* Compatible with the frameworks Hyperledger Fabric & Ethereum.

## AWS Glue

* Managed extract, transform, and load (ETL) service.
* Useful to prepare and transform data for analytics.
* Fully serverless service.
* Glue Data Catalog: catalog of datasets, can be used by Athena, Redshift, EMR.

## DMS Database Migration Service

* Quickly and securely migrate databases to AWS; resilient, self-healing.
* The source database remains available during the migration.
* Supports:
  * Homogeneous migrations: e.g., Oracle to Oracle
  * Heterogeneous migrations: e.g., Microsoft SQL Server to Aurora

## Databases & Analytics Summary in AWS

* Relational Databases - OLTP: RDS & Aurora (SQL)
  * Differences between Multi-AZ, Read Replicas, Multi-Region
* In-memory Database: ElastiCache
* Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
* Warehouse - OLAP: Redshift (SQL)
* Hadoop Cluster: EMR
* Athena: query data on Amazon S3 (serverless & SQL)
* QuickSight: dashboards on your data (serverless)
* DocumentDB: "Aurora for MongoDB" (JSON - NoSQL database)
* Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
* Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
* Glue: Managed ETL (Extract Transform Load) and Data Catalog service
* Database Migration: DMS
* Neptune: graph database

## Other Compute Section

## What is Docker?

* Docker is a software development platform to deploy apps.
* Apps are packaged in containers that can be run on any OS.
* Apps run the same, regardless of where they're run:
  * Any machine
  * No compatibility issues
  * Predictable behavior
  * Less work
  * Easier to maintain and deploy
  * Works with any language, any OS, any technology
* Scale containers up and down very quickly (seconds).

## Docker on an OS

## Where Are Docker Images Stored?

* Docker images are stored in Docker Repositories.
* Public: Docker Hub: [https://hub.docker.com/](https://hub.docker.com/)
  * Find base images for many technologies or OS:
    * Ubuntu
    * MySQL
    * NodeJS, Java...
* Private: Amazon ECR (Elastic Container Registry)

## Docker versus Virtual Machines

* Docker is "sort of" a virtualization technology, but not exactly.
* Resources are shared with the host => many containers on one server.

## ECS

* ECS = Elastic Container Service
* Launch Docker containers on AWS.
* You must provision & maintain the infrastructure (the EC2 instances).
* AWS takes care of starting/stopping containers.
* Has integrations with the Application Load Balancer.

## Fargate

* Launch Docker containers on AWS.
* You do not provision the infrastructure (no EC2 instances to manage): simpler!
* Serverless offering.
* AWS just runs containers for you based on the CPU/RAM you need.

## ECR

* Elastic Container Registry.
* Private Docker Registry on AWS.
* This is where you store your Docker images so they can be run by ECS or Fargate.
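ECS and Fargate refer to the images they run by a registry URI. It can therefore help to see how a private ECR image reference is put together. The following small sketch assumes the usual private-registry layout `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`, with simplified naming rules. The account ID, region, repository, and tag below are hypothetical placeholders, and no AWS SDK is involved.

```python
import re
from typing import NamedTuple


class EcrImage(NamedTuple):
    account_id: str
    region: str
    repository: str
    tag: str


# Assumed (simplified) layout of a private ECR image URI:
# <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
_ECR_URI = re.compile(
    r"^(?P<account_id>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[a-z0-9._/-]+):(?P<tag>[A-Za-z0-9_.-]+)$"
)


def build_image_uri(image: EcrImage) -> str:
    """Compose the URI that a container definition would reference."""
    if not re.fullmatch(r"\d{12}", image.account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{image.account_id}.dkr.ecr.{image.region}.amazonaws.com/{image.repository}:{image.tag}"


def parse_image_uri(uri: str) -> EcrImage:
    """Split an ECR image URI back into its parts (simplified validation)."""
    match = _ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri}")
    return EcrImage(**match.groupdict())


# Hypothetical account, region, repository, and tag.
uri = build_image_uri(EcrImage("123456789012", "us-east-1", "my-app", "v1"))
print(uri)  # 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1
```

A public Docker Hub reference such as `ubuntu:22.04` does not match this pattern. Registry-specific URIs of this kind are what distinguish a private ECR image from a public one.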