Data Warehouse Explained PDF
Document Details
Uploaded by PainlessTriangle9252
Summary
This document provides an overview of data warehouses, including their benefits, uses, and various implementation approaches. It also covers data lakes and data marts, comparing and contrasting them with data warehouses, and explores the different types of data and implementation methods.
Full Transcript
What is a data warehouse?
A data warehouse is a system that aggregates data from one or more sources into a single consistent data store to support data analytics.

Data warehouse analytics
Data warehouse systems support:
* Data mining
* Artificial intelligence
* Machine learning
* Front-end reporting
* OLAP (online analytical processing)

Where are data warehouses hosted?
* The beginning (on-premises): Traditional data warehouses were hosted on-premises within enterprise data centers, initially on mainframes and then on Unix, Windows, and Linux systems.
* 2000s (appliances): Growth of larger data volumes; emergence of data warehouse appliances [text illegible] consisting of software.
* 2010-present (cloud): Adoption of cloud data warehouses, eliminating hardware purchases and providing a scalable, pay-as-you-go service.

Who uses data warehouses?
Practically every industry, for example:
* Transportation: optimizes routes, travel time, equipment needs, and staffing requirements
* Banking and fintech: evaluate risks, detect fraud, and cross-sell services

Benefits of a data warehouse
* Centralizes data from disparate sources
* Creates a single source of truth
* Leverages all the data while enhancing speed of access
* Facilitates smarter decisions using BI

* Better data quality
* Faster business insights
* Smarter decisions
* Competitive advantages and gains

In summary
* A data warehouse is a system that aggregates data from one or more sources into a single consistent data store to support data analytics
* Data warehouses support data mining, AI and machine learning, OLAP, and front-end reporting
* Data warehouses and BI help organizations improve data quality, speed up business insights, and improve decision-making, all of which can result in competitive gains

Data marts overview
* Define what a data mart is
* Give examples of data marts
* Compare data marts to transactional databases and enterprise data warehouses
* Describe data pipelines for
loading data marts

What is a data mart?
[Figure: Shipping, Sales, Manufacturing, Finance, Warranty, and Marketing data marts shown with the Enterprise Data Warehouse]

What are data marts used for?
Data marts are used to:
* Provide support for tactical decision-making
* Help end users focus only on relevant data
* Save time otherwise spent searching the data warehouse for answers

Data mart structure
Typical structure of a data mart:
* Relational database
* Star or snowflake schema
* Central fact table of business metrics
* Surrounded by associated dimension tables

Data repository comparisons
Data marts versus databases
Data marts:
* OLAP systems, read intensive
* Use transactional databases or warehouses as data sources
* Contain clean, validated analytical data
* Accumulate history for trend analysis
Databases:
* OLTP systems, write intensive
* Use operational applications as sources of data
* Contain raw, unprocessed transactional data
* May not always store history

Data marts versus data warehouses
Data marts:
* Small data warehouses with tactical scope
* Lean and fast
Data warehouses:
* Large repositories with broad, strategic scope
* Large and slow

Types of data marts
Dependent, independent, and hybrid data marts
[Figure: data sources, the enterprise data warehouse, and dependent, independent, and hybrid data marts]

Dependent data marts
Dependent data marts:
* Inherit security from the EDW
* Use cleaned and transformed data
* Have simpler data pipelines
[Figure: Data Sources, Enterprise Data Warehouse]

Independent data marts
Independent data marts:
* Require custom ETL data pipelines
* May require additional security measures
Figure labels: Data Sources, Transformation, Integration, Independent Data Mart, Custom ETL
Pipeline

Data mart purpose
The purpose of a data mart is to provide:
* Timely, relevant data
* Rapid query responses
* Cost efficiency

A data mart:
* Is an isolated part of the larger EDW, built to serve a business function, purpose, or user community
* Is designed to provide specific, timely, and rapid support for making tactical decisions
* Typically has a star or snowflake schema
* Accumulates clean and validated historical data
* Can be categorized in relation to the EDW: dependent, independent, or a hybrid of the two

What is a data lake?
[Figure: data source, data lake holding data in raw format, transformed data, data ready for each need]

Data lakes:
* Store large amounts of structured, semi-structured, and unstructured data in their native format
* Can be loaded without defining the structure or schema of the data
* Do not need their use cases to be known in advance
* Exist as a repository of raw data

Data lakes:
* Are a reference architecture that combines multiple technologies
* Can be deployed using:
  * Cloud object storage
  * Large-scale distributed systems
  * Relational database management systems
  * NoSQL data repositories

Data lake benefits
* Handles all types of data: unstructured, semi-structured, and structured
* Scalable storage capacity
* Saves time that would have been used to define structures, create schemas, and transform data
* Can quickly repurpose data for a wide range of use cases

Vendors for data lakes
Amazon, Microsoft, Cloudera, Oracle, Google, SAS, IBM, Snowflake, Informatica, Teradata, Zaloni

Data lakes versus data warehouses
Data
* Data lake: Data is loaded in its raw and unstructured form
* Data warehouse: Data has been processed prior to loading
Schema
* Data lake: No need to define schema prior to loading
* Data warehouse: Schema designed prior to loading

Data quality
* Data lake: Any data that might or might not be curated; data is agile and might not comply with governance guidelines
* Data warehouse: Data is curated and follows data governance practices

Users
* Data lake: Data scientists, data developers, and business analysts using curated data
* Data warehouse: Business analysts and data analysts

In summary
* A data lake is a storage repository of raw data
* The structure and schema of the data do not need to be defined before loading into the data lake
* Data lakes' benefits include the ability to store all types of data and to scale based on storage capacity

Staging area for machine learning development and advanced analytics
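
The schema comparison above (warehouse: schema designed prior to loading; lake: no schema needed prior to loading) can be illustrated with a small sketch. This is not taken from the slides: the sample order records, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database and a plain Python list merely stand in for a real warehouse and a real lake.

```python
import json
import sqlite3

# Hypothetical raw events, as they might arrive from a source system.
RAW_EVENTS = [
    '{"order_id": 1, "amount": "19.99", "region": "EU"}',
    '{"order_id": 2, "amount": "5.00"}',
    '{"order_id": 3, "amount": "12.50", "region": "US", "coupon": "SPRING"}',
]


def load_warehouse(raw_events):
    """Warehouse style: the schema is designed first, then data is processed to fit it."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, "
        "amount REAL NOT NULL, "
        "region TEXT NOT NULL)"
    )
    for line in raw_events:
        record = json.loads(line)
        # Processing before loading: cast types, fill gaps, drop unknown fields.
        conn.execute(
            "INSERT INTO orders (order_id, amount, region) VALUES (?, ?, ?)",
            (
                record["order_id"],
                float(record["amount"]),
                record.get("region", "UNKNOWN"),
            ),
        )
    conn.commit()
    return conn


def load_lake(raw_events):
    """Lake style: keep every record untouched, in its native format."""
    return list(raw_events)  # stand-in for object storage


def read_lake_amounts(lake):
    """Structure is applied only when the data is read for a specific use case."""
    return [float(json.loads(line)["amount"]) for line in lake]
```

In this sketch the warehouse load discards the `coupon` field because the table has no column for it, while the lake keeps the full raw record, so a later use case can still read it.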