Cloud Database & Analytics Services PDF

Document Details

Uploaded by ProlificIslamicArt

SLIIT

Tags

cloud database cloud computing database services data analysis

Summary

This document presents an overview of cloud database and analytics services. It covers various database models, workloads, and caching techniques. It also explores data warehouses and data lakes, highlighting the differences and use cases for each. Finally, it discusses real-time data processing.

Full Transcript

Cloud Database & Analytics Services
IT4090 – Cloud Computing

Database Models

Relational/SQL
- Highly structured table organization
- Rigidly defined formats
- Dependencies among tables
- Enforce ACID (Atomicity, Consistency, Isolation, Durability)
- Reduce anomalies and enforce integrity
- Use SQL to access data
- Examples – MS SQL, MySQL, Oracle, PostgreSQL, Amazon RDS, Amazon Aurora

Non-relational/NoSQL
- Document oriented
- Large and complex queries
- Supports rapidly changing designs
- Examples – MongoDB, Cassandra, Cosmos DB, Redis, CouchDB

Database Workloads

Online Transaction Processing (OLTP)
- Focus is on operational data
- Transaction processing
- Small, simple queries
- Response in milliseconds
- Highly normalized

Online Analytical Processing (OLAP)
- Focus is on historical data
- Data analysis and reporting
- Large, complex, often ad hoc queries
- Data warehouses
- Response times from seconds to hours
- Typically denormalized

Database Models & Workloads
- Relational / OLTP: relational databases (Oracle, PostgreSQL, MS SQL, MySQL)
- Relational / OLAP: relational analytics (Oracle, PostgreSQL, MS SQL, MySQL)
- Non-relational / OLTP: NoSQL databases – key-value, columnar, documents (MongoDB, Cassandra, Riak)
- Non-relational / OLAP: big-data analytics (Hadoop, HDInsight)

Non-relational / NoSQL Databases

Key-value
- Indexed keys and values
- Use case – session data, shopping carts

Document store
- Stores data in documents (XML, JSON, etc.)
- Schemaless
- Documents contain key-value pairs
- Use case – CMS, blogging platforms

Columnar
- Optimized to retrieve columns of data
- Use case – e-commerce, analytics

Graph
- Presents interconnected data as logical graphs
- Focus on relationships

Non-relational / NoSQL Databases Comparison

Type               Flexibility   Complexity   Performance   Scalability
Key-value store    High          None         High          High
Column store       Moderate      Low          High          High
Document store     High          Low          High          Variable (High)
Graph database     High          High         Variable      Variable

CAP Theorem

The CAP theorem states that a distributed data store cannot guarantee consistency, availability, and partition tolerance all at once: when a network partition occurs, it must trade consistency against availability.

Database Caching

Caching is a buffering technique that stores frequently requested data in temporary memory.
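The cache-aside (lazy-loading) pattern is one common way an application places a cache in front of a database to avoid repeating the same reads. The following is a minimal sketch, assuming a plain Python dict stands in for the cache server and a caller-supplied function stands in for the database query. The class `CacheAside`, its `load_from_db` parameter, and the TTL value are hypothetical names chosen for illustration, not any product's API.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class CacheAside:
    """Hypothetical cache-aside wrapper: read from the cache first, query the database only on a miss."""

    def __init__(
        self,
        load_from_db: Callable[[Hashable], Any],  # stand-in for a slow database query
        ttl_seconds: float = 60.0,                # how long a cached entry stays valid
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[Any, float]] = {}  # key -> (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1  # served from memory, no database work
            return entry[0]
        # Miss or expired entry: fetch from the database and repopulate the cache.
        self.misses += 1
        value = self._load_from_db(key)
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a cached entry, e.g. after the underlying row is updated."""
        self._store.pop(key, None)


if __name__ == "__main__":
    db_calls = []

    def fake_db_lookup(user_id: Hashable) -> str:
        db_calls.append(user_id)  # record each simulated database round trip
        return f"user-{user_id}"

    cache = CacheAside(fake_db_lookup, ttl_seconds=30)
    cache.get(1)
    cache.get(1)
    cache.get(2)
    print(cache.hits, cache.misses, db_calls)  # 1 2 [1, 2]
```

The second `get(1)` is answered from memory, so only two of the three reads reach the database. A real deployment would replace the dict with a Redis or Memcached client, both of which can expire keys after a TTL on the server side.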
Caching speeds up data access and reduces the workload on the database. Two popular caching systems are Redis and Memcached.

Redis vs Memcached

Redis
- Open source, in-memory, key-value data store
- Sub-millisecond response times
- Supports various data structures (strings, lists, sets, etc.)
- Persistent – can save data to disk, so the cache can survive reboots
- Supports read replicas, atomic operations, backup/restore, and high availability (HA)

Memcached
- Open source, in-memory object store
- Sub-millisecond response times
- Supports strings and objects
- Not persistent – the cache does not survive reboots
- Supports scaling out and multithreading

Data Warehouse
- A data warehouse is a type of data management system designed to enable and support business intelligence (BI) activities, especially analytics.
- Data warehouses are intended for querying and analysis only and often contain large amounts of historical data.
- Data in a data warehouse is typically derived from a wide variety of sources, such as application log files and transaction applications.
- It is a repository for structured, filtered data that has already been processed for a specific purpose.

Two approaches to loading a data warehouse:
- ETL – Extract, Transform, Load (Source -> Staging -> Destination): data is transformed in a staging area before it is loaded.
- ELT – Extract, Load, Transform (Source -> Destination): raw data is loaded first and transformed inside the destination.

Data Lake

A data lake is a storage repository that holds a large amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store the data.

Data Warehouse vs Data Lake
- Data warehouse: processed data; data currently in use; used by business professionals.
- Data lake: raw data; purpose of the data not yet determined; used by data scientists.

Real-Time Data Processing

Real-time data processing, also known as stream processing, is the fastest data processing technique: it processes data within a short period of time and provides the most current output. Because processing happens as the data arrives, it needs a continuous stream of input data in order to provide continuous output.
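The idea that continuous input yields continuous output can be sketched with a sliding-window average that emits a fresh result for every incoming event. This is an illustrative Python sketch, assuming numeric events and a count-based window. `sliding_window_average` and `window_size` are hypothetical names, and production stream processors add concerns such as time-based windows, event ordering, and fault tolerance that are not shown here.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_window_average(events: Iterable[float], window_size: int) -> Iterator[float]:
    """Hypothetical stream operator: after each event, yield the mean of the last `window_size` events."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    window: Deque[float] = deque()
    total = 0.0
    for value in events:  # events are consumed one at a time, as they arrive
        window.append(value)
        total += value
        if len(window) > window_size:
            total -= window.popleft()  # evict the oldest event from the running sum
        yield total / len(window)  # one output per input event


if __name__ == "__main__":
    readings = [10, 20, 30, 40]  # e.g. sensor values arriving over time
    for avg in sliding_window_average(readings, window_size=2):
        print(avg)  # 10.0, 15.0, 25.0, 35.0 on separate lines
```

Because the generator keeps a running sum instead of re-adding the whole window, each event costs constant time regardless of the window size, which helps the output keep pace with the input.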