Summary

This document gives an overview of data warehouse architecture. It notes that architectural details depend on use cases such as report generation and dashboarding, and it outlines the general enterprise data warehouse (EDW) architecture, from data sources through staging areas to the enterprise data warehouse repository. It then covers data cubes and their operations, materialized views, and fact and dimension tables.

Full Transcript


Data Warehouse Architecture Overview

Data warehouse architecture details depend on use cases:
* Report generation and dashboarding
* Exploratory data analysis
* Automation and machine learning
* Self-serve analytics

General EDW architecture
[Figure: General EDW architecture. Components: Data Sources; Extract, Transform, Load; Staging Area/Sandbox; Enterprise Data Warehouse Repository (Metadata, Raw data, Summary data); Data Marts; Analytics & BI Tools]

EDW reference architectures
Vendor-specific reference architectures provide:
* Adaptations of the general model
* Interoperability
* Tested tool integrations

Cubes, Rollups, and Materialized Views and Tables

What is a data cube?
* Example: Sales OLAP cube
* Coordinates = dimensions
* Cells = facts

Cube operations:
* Slicing
* Dicing
* Drilling up and down
* Pivoting
* Rolling up

Slicing data cubes
Slicing reduces the number of cube dimensions by 1. The example sales cube has three dimensions: Product (Hats, Gloves, T-shirts, Jeans, Scarves), State (Ohio, Iowa, Utah), and Year (2020, 2019, 2018). Taking the 2018 slice leaves a two-dimensional Product x State table of sales:

2018        Ohio   Iowa   Utah
Hats         113    723    143
Gloves        97    213     43
T-shirts     617    116    178
Jeans        256    853    220
Scarves       25     56    874
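
The first two cube operations listed above, slicing and dicing, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: it assumes the cube is stored as a plain dictionary keyed by (year, product, state), and the names `build_cube`, `slice_cube`, and `dice_cube` are made up for this sketch rather than taken from any OLAP tool. Only the 2018 cells from the slicing slide are loaded, because the other years' values are not legible in the figure.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical in-memory cube: keys are (year, product, state), values are units sold.
Cube = Dict[Tuple, int]

STATES = ("Ohio", "Iowa", "Utah")
# 2018 cells as read from the slicing slide, in (Ohio, Iowa, Utah) order.
SALES_2018 = {
    "Hats": (113, 723, 143),
    "Gloves": (97, 213, 43),
    "T-shirts": (617, 116, 178),
    "Jeans": (256, 853, 220),
    "Scarves": (25, 56, 874),
}


def build_cube() -> Cube:
    """Load the known 2018 cells into a (year, product, state) -> units mapping."""
    cube: Cube = {}
    for product, units_by_state in SALES_2018.items():
        for state, units in zip(STATES, units_by_state):
            cube[(2018, product, state)] = units
    return cube


def slice_cube(cube: Cube, axis: int, member) -> Cube:
    """Fix one dimension to a single member and drop it from the keys (n -> n-1 dims)."""
    return {
        tuple(part for i, part in enumerate(key) if i != axis): units
        for key, units in cube.items()
        if key[axis] == member
    }


def dice_cube(cube: Cube, axis: int, members: Iterable) -> Cube:
    """Keep only some members of one dimension; the number of dimensions is unchanged."""
    keep = set(members)
    return {key: units for key, units in cube.items() if key[axis] in keep}


cube = build_cube()
slice_2018 = slice_cube(cube, axis=0, member=2018)  # keys are now (product, state)
print(slice_2018[("Jeans", "Iowa")])  # 853

diced = dice_cube(cube, axis=1, members=["Gloves", "T-shirts", "Jeans"])
print(len(diced))  # 9 cells: 3 products x 3 states x 1 year
```

The design choice to keep the cube as a flat mapping makes the contrast visible: a slice removes a coordinate from every key, while a dice only filters keys and leaves their shape alone.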

Dicing data cubes
Dicing shrinks a dimension by keeping only a subset of its members. In the example, the Product dimension is reduced to Gloves, T-shirts, and Jeans, while all three states and years are kept.

Drilling up or down in data cubes
Drilling down moves into subcategories within a dimension. In the example, T-shirts are broken down by fit:

T-shirts      Ohio   Iowa   Utah
Classic        396     12     46
Slim fit        98     61     25
Regular fit    123     43    107

The subcategories add up to the T-shirts row of the 2018 slice (617, 116, 178). Drilling up reverses the operation and returns to the T-shirts total.

Pivoting data cubes
Pivoting rotates the cube to present the same data from a different angle. In the example, the Product and Year axes trade places: years (2020, 2019, 2018) become the rows, products (Hats through Scarves) move to the depth axis, and the states (Ohio, Iowa, Utah) remain the columns.

Rolling up in data cubes
* Roll up = summarize a dimension
* Aggregate using COUNT, MIN, MAX, SUM, or AVERAGE

In the example, the State dimension of the T-shirt drill-down is rolled up into a single US value by averaging Ohio, Iowa, and Utah (rounded to whole numbers):

T-shirts      US
Classic       151
Slim fit       61
Regular fit    91

Materialized views
* A "snapshot" containing the results of a query
* Used to replicate data in a staging database
* Used to precompute expensive queries for a data warehouse
* Automatically keep query results synced to the database
* Let you work safely without affecting the source database

Materialized views can be set up with different refresh options:
* Never - populated on creation only
* Upon request - manually or on a schedule
* Immediately - automatically, after every statement

Materialized view in Oracle
Create a materialized view in Oracle:

    -- Fast-refreshed materialized view: first automatic refresh at SYSDATE,
    -- then one day after each refresh (NEXT SYSDATE + 1).
    -- REFRESH FAST relies on a materialized view log on the source table.
    -- Replace <my_table_name> with the source table.
    CREATE MATERIALIZED VIEW MY_MAT_VIEW
      REFRESH FAST
      START WITH SYSDATE
      NEXT SYSDATE + 1
      AS SELECT * FROM <my_table_name>;

Facts and dimensions
* Data can be categorized as facts and dimensions
* Facts are usually measured quantities, such as temperature, number of sales, or millimeters of rainfall
* Facts can also be qualitative
* Dimensions are attributes relating to facts
* Dimensions provide context to facts
* "24°C" is not useful information by itself

Fact tables
* Consist of the facts of a business process, plus foreign keys to dimension tables
* Example: dollar amounts for sales transactions
* Can contain detail-level facts, or facts that have been aggregated
* Summary tables contain aggregated facts
* Example: a "Quarterly Sales" summary table, with "store_id" as a foreign key

Accumulating snapshot fact tables
Used to record events during a well-defined business process. Example columns for an order: Order ID, Order Amount, Order Date, Amount Paid, Date Paid, Build Start Date, Build End Date, Ship Date.

Dimensions
* Dimensions categorize facts
* Called categorical variables in statistics and machine learning
* Used to answer business questions
* Used for filtering, grouping, and labelling
* Examples: people, product, and place names, and date or time stamps
* A dimension table stores the dimensions of a fact
* Joined to the fact table via a foreign key

Dimension table examples
* Product tables: Make, model, color, size
* Employee tables: Name, title, department
* Temporal tables: Date/time at the granularity of recorded events
* Geography tables: Country, state, city, region

Example schema with fact & dimension tables
* Dimension table Vehicle: Vehicle_id (pk), Make, Model, Color
* Fact table Sales: Sale_id (pk), Sale_date, Sale_amount, Vehicle_id (fk), Salesperson_id (fk)
* Dimension table Salesperson: Salesperson_id (pk), First_name, Last_name, Birth_date

Each fact table typically has multiple dimension tables related to it.

In Summary
* Business data includes facts and dimensions
* Facts measure business processes, such as sales transactions
* Dimensions such as 'sold_by' and 'store_id' categorize facts
* Dimensions, also known as categorical variables, are used for filtering, grouping, and labelling
* Facts and dimensions are linked through foreign keys that reference primary keys
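
The example schema with one fact table (Sales) and two dimension tables (Vehicle, Salesperson) can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table and column names follow the slide, but the rows are made-up sample data and the query is only one illustrative way of grouping a fact by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Dimension tables and the fact table, named as on the slide.
conn.executescript("""
CREATE TABLE Vehicle (
    Vehicle_id INTEGER PRIMARY KEY,
    Make TEXT, Model TEXT, Color TEXT
);
CREATE TABLE Salesperson (
    Salesperson_id INTEGER PRIMARY KEY,
    First_name TEXT, Last_name TEXT, Birth_date TEXT
);
CREATE TABLE Sales (
    Sale_id INTEGER PRIMARY KEY,
    Sale_date TEXT,
    Sale_amount REAL,
    Vehicle_id INTEGER REFERENCES Vehicle(Vehicle_id),
    Salesperson_id INTEGER REFERENCES Salesperson(Salesperson_id)
);
""")

# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Vehicle VALUES (?, ?, ?, ?)", [
    (1, "Acme", "Roadster", "Red"),
    (2, "Acme", "Hauler", "Blue"),
    (3, "Zenith", "Compact", "White"),
])
conn.executemany("INSERT INTO Salesperson VALUES (?, ?, ?, ?)", [
    (10, "Ana", "Ruiz", "1985-04-12"),
    (11, "Ben", "Okafor", "1990-09-30"),
])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)", [
    (100, "2024-01-05", 30000.0, 1, 10),
    (101, "2024-01-06", 45000.0, 2, 10),
    (102, "2024-01-07", 18000.0, 3, 11),
    (103, "2024-02-01", 32000.0, 1, 11),
])

# Join the fact table to both dimension tables via foreign keys,
# then group the measured fact (Sale_amount) by dimension attributes.
rows = conn.execute("""
    SELECT v.Make, sp.Last_name, SUM(s.Sale_amount) AS total
    FROM Sales AS s
    JOIN Vehicle AS v ON v.Vehicle_id = s.Vehicle_id
    JOIN Salesperson AS sp ON sp.Salesperson_id = s.Salesperson_id
    GROUP BY v.Make, sp.Last_name
    ORDER BY v.Make, sp.Last_name
""").fetchall()

for row in rows:
    print(row)
# ('Acme', 'Okafor', 32000.0)
# ('Acme', 'Ruiz', 75000.0)
# ('Zenith', 'Okafor', 18000.0)
```

The fact table holds only the measured amount and foreign keys; every descriptive label in the result comes from a dimension table.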

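The drill-down and roll-up examples in the transcript use T-shirt sales broken down by fit. The short Python sketch below checks those figures: drilling up sums the subcategories back into the T-shirts total per state, and rolling up collapses the State dimension with an aggregate function. The helper names `drill_up` and `roll_up_states` are hypothetical; using `round(mean(...))` is an assumption that reproduces the slide's US values (151, 61, 91).

```python
from statistics import mean

STATES = ("Ohio", "Iowa", "Utah")
# T-shirts row of the 2018 slice, and the drill-down by fit, as read from the slides.
TSHIRT_TOTALS = {"Ohio": 617, "Iowa": 116, "Utah": 178}
TSHIRT_BY_FIT = {
    "Classic": {"Ohio": 396, "Iowa": 12, "Utah": 46},
    "Slim fit": {"Ohio": 98, "Iowa": 61, "Utah": 25},
    "Regular fit": {"Ohio": 123, "Iowa": 43, "Utah": 107},
}


def drill_up(by_sub):
    """Sum the subcategories back into the parent category, state by state."""
    return {state: sum(row[state] for row in by_sub.values()) for state in STATES}


def roll_up_states(by_sub, agg):
    """Collapse the State dimension into one value per subcategory using `agg`."""
    return {sub: agg([row[state] for state in STATES]) for sub, row in by_sub.items()}


# Sanity check: the drill-down is consistent with the T-shirts totals.
assert drill_up(TSHIRT_BY_FIT) == TSHIRT_TOTALS

us_avg = roll_up_states(TSHIRT_BY_FIT, lambda xs: round(mean(xs)))
print(us_avg)  # {'Classic': 151, 'Slim fit': 61, 'Regular fit': 91}

us_sum = roll_up_states(TSHIRT_BY_FIT, sum)
print(us_sum)  # {'Classic': 454, 'Slim fit': 184, 'Regular fit': 273}
```

Swapping `agg` for `min`, `max`, `len`, or `sum` gives the other aggregations listed on the roll-up slide.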
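
The "upon request" refresh option for materialized views can be illustrated with SQLite, which has no native materialized views. The sketch below emulates one with a plain table built by `CREATE TABLE ... AS SELECT` and a complete refresh (delete and re-insert) that runs only when called. This is an assumption-laden stand-in, not Oracle's mechanism; the table names, sample rows, and the helpers `create_mat_view` and `refresh_mat_view` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 100.0), ("West", 250.0), ("East", 50.0)])

# The "expensive" query whose results are kept as a snapshot.
MV_QUERY = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"


def create_mat_view(conn, name, query):
    # Populate the snapshot once, at creation time.
    # `name` is a trusted identifier here; never build SQL from user input this way.
    conn.execute(f"CREATE TABLE {name} AS {query}")


def refresh_mat_view(conn, name, query):
    # Complete refresh on request: discard the snapshot and recompute it atomically.
    with conn:
        conn.execute(f"DELETE FROM {name}")
        conn.execute(f"INSERT INTO {name} {query}")


def totals(conn):
    # Read from the snapshot table, not from the source table.
    rows = conn.execute(
        "SELECT region, total FROM sales_by_region_mv ORDER BY region").fetchall()
    return dict(rows)


create_mat_view(conn, "sales_by_region_mv", MV_QUERY)
conn.execute("INSERT INTO sales VALUES ('West', 10.0)")  # the source data changes

print(totals(conn))  # {'East': 150.0, 'West': 250.0}  (stale snapshot)
refresh_mat_view(conn, "sales_by_region_mv", MV_QUERY)
print(totals(conn))  # {'East': 150.0, 'West': 260.0}  (after refresh)
```

Queries against the snapshot stay cheap and never touch the source table, at the cost of staleness between refreshes. An "immediately" option could be approximated with triggers on the source table, which this sketch does not attempt.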