Lesson 3 - Storage and Database in Cloud
Singapore Management University
Summary
This document provides an overview of storage and database systems, including different types of storage (block, file, object), database types (relational, non-relational), and related concepts like memory caching and data retention. It also touches on cloud storage solutions and characteristics.
Full Transcript
STORAGE AND DATABASE IN CLOUD

OUTLINE
Type of Storage
  Block Storage
  Shared File Systems
  Object Store
Database
  Relational Database
  Non-relational database
  Memory/Cache
Data retention and lifecycle management
Data transformation process

TYPE OF STORAGE
Block Storage
  Raw storage
  Data organized as an array of unrelated blocks
  Host file system places data on disk
  Ex: Hard disks, Storage Area Network (SAN) storage arrays
File Storage
  Unrelated data blocks managed by a file (serving) system
  Native file system places data on disk
  Ex: Network Attached Storage (NAS) appliances, Windows File Servers, EFS
Object Storage
  Stores virtual containers that encapsulate the data, data attributes, metadata and object IDs
  API access to data
  Metadata driven, policy-based, etc.
  Ex: S3, Cloud Storage

STORAGE - CHARACTERISTICS
Durability: Measure of expected data loss
Availability: Measure of expected downtime
Security: Security measures for at-rest and in-transit data
Cost: Amount per storage unit, e.g. $ / GB
Scalability: Upward flexibility, storage size, number of users
Performance: Performance metrics such as bandwidth
Integration: Ability to interact via API or with other services
Standard, IA, Glacier
Two copies on one site: designed for 99.99% durability
Copies on two sites: designed for 99.999% durability
Copies in three AZ: designed for 99.999999999% durability

BLOCK STORAGE
Instance storage: Temporary block-level storage attached to host hardware that is ideal for storage of information that frequently changes or is replicated across multiple instances
Amazon EBS: Easy to use, high performance block storage service designed for use with Amazon Elastic Compute Cloud (EC2) for both throughput and transaction intensive workloads
Snapshots: Incremental, point-in-time copies of your EBS data that can be used to restore new volumes, expand the size of a volume, or move volumes across Availability Zones

FILE SYSTEM
Network Attached Storage
Google Filestore
AWS Elastic File System
It is accessible from Compute Engine and Kubernetes Engine
It provides low latency and high IOPS, so it can be used for databases and other performance-sensitive services

OBJECT STORAGE
Global service
Web accessible object store (through API or HTTPS)
Highly durable (99.999999999% design)
Limitlessly scalable
Multiple tiers to match your workload
Data lifecycle rules
Static website hosting
Security, compliance, and audit capabilities
Standard Storage Pricing (us-east-1): $0.023 per GB

CHARACTERISTICS
It is an object storage system
It is designed for persisting unstructured data, like files, images, videos.
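The object storage model described above (data, metadata and an object ID bundled together and reached through an API rather than a file system) can be illustrated with a small in-memory sketch. Everything here is hypothetical and for illustration only: `ToyBucket`, `StoredObject`, the key names and the way the object ID is derived are assumptions, not the API of S3, Cloud Storage or any real SDK.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    """One object: the payload plus its metadata and an object ID."""
    key: str
    data: bytes
    metadata: dict
    object_id: str


class ToyBucket:
    """Hypothetical in-memory bucket; not a real cloud storage client."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects = {}  # flat mapping: key -> StoredObject

    def put(self, key: str, data: bytes, **metadata: str) -> StoredObject:
        # The payload is written as a whole; putting to an existing key
        # replaces the object instead of editing part of it.
        # Assumption: the ID is derived from a content hash for simplicity.
        object_id = hashlib.sha256(data).hexdigest()[:16]
        obj = StoredObject(key, data, dict(metadata), object_id)
        self._objects[key] = obj
        return obj

    def get(self, key: str) -> StoredObject:
        # Access is by key through the API; the full object is returned.
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list:
        # Keys are flat strings; a shared prefix only looks like a folder.
        return sorted(k for k in self._objects if k.startswith(prefix))


if __name__ == "__main__":
    bucket = ToyBucket("demo-bucket")
    bucket.put("reports/2024/q1.csv", b"a,b\n1,2\n", content_type="text/csv")
    bucket.put("images/logo.png", b"\x89PNG", content_type="image/png")
    print(bucket.list_keys("reports/"))
    print(bucket.get("images/logo.png").metadata)
```

The sketch keeps a single flat dictionary on purpose: it mirrors the idea that an object store addresses whole objects by key, in contrast to block storage, where the host file system decides where data lands on disk.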
The files stored in Cloud Storage are treated as atomic
No seeking and reading specific blocks in the file
It uses a global namespace for bucket names
A bucket is named when it is created and CANNOT be renamed
Individual objects within buckets can have their own access controls as well
The naming convention makes it look similar to a hierarchical structure
No limit on minimum object size
Unlimited space worldwide
Supports re-transmission from the break point

STORAGE CLASSES

OBJECT VERSIONING
It is a prerequisite for the services below:
1. Object lock (only applies to a specific version of the object)
2. S3 replication: Cross Region Replication (CRR) / Same Region Replication (SRR)
Cloud Storage retains a noncurrent object version each time you replace or delete a live object version (using a metageneration number)
Noncurrent versions retain the name of the object, but are uniquely identified by their generation number
Noncurrent versions only appear in requests that explicitly call for object versions to be included
A noncurrent version retains its ACLs and does not necessarily have the same permissions as the live version
Object lifecycle management can be done with or without versioning

TYPICAL S3 USE CASE
Mass object storage
Infrequent data storage (archive warehouse)
Static reporting service (public facing)
Static website service (S3 + CloudFront)
Cheap replacement of a database for infrequent usage (together with Athena)
Data lake
Data analytics intermediate conversion pool
Centralized logging

TYPICAL S3 ARCHITECTURE
Simple version
Authentication
Offload TLS
Modify HTTP header for internal resources
Improved version
Case 1: CloudFront + S3 for static website hosting
Case 2: centralized reporting

DATA RETENTION AND LIFECYCLE MANAGEMENT
Data retention
  How long to keep the data in different tiers of storage, database, and cache
  The retention policies are crafted based on business requirements
  Utility Bills: 1 year
  Purchase Order: 3 years
  Bank Statement: 5 years
  Safety Record: 7 years
  Patents: permanent
Data lifecycle
  Creation, active use, infrequent access but online, archived, deleted
  Where to store the data in different lifecycle periods
  Transitions: Infrequent access, Became historical data, Bulk access only, Backup only
  Storage tiers: cache, SQL DB, Data Warehouse, Storage Bucket, Coldline Bucket, Archive

LAB: S3
https://catalog.workshops.aws/general-immersionday/en-US/basic-modules/60-s3/s3

DATABASE

CONCEPT OF DATABASE
A database is a collection of information that is organized so that it can be easily accessed, managed and updated.
A database usually provides a query language to access the data it stores
Thus, a database actually consists of two parts: a storage pool and a management system
...and thus, it has two tiers of charges: storage and query
Type of information | Type of database
Structured information with relations | Relational database (SQL/MySQL/PostgreSQL)
Key-value pair information | Key-value database
Document information | Document database
Wide column information | Wide column database

ACID
A commonly used term to describe the characteristics of a transaction.
It is not a requirement of a database. Not all databases support ACID.
Strong consistency
Eventual consistency
Instantaneous persistent disk write from memory

DB@EC2 VS DB SERVICES

RELATIONAL DATABASE
Relational databases are highly structured data stores
They are designed to store data with minimal risk of anomalies
They offer a comprehensive query language to manipulate the stored data
They support ACID transactions
Cloud managed SQL (SaaS)
  Use pre-configured VMs
  Auto-backup
  REST APIs
  Highly scalable (CPU & Memory)
  Connect with CSP Compute Services and external systems

GLOBAL TRANSACTIONAL DATABASE
It is a fully managed RDBMS database
It supports horizontal scalability across regions
It supports strong consistency and 99.999% availability (