Data Warehousing PDF

Summary

This presentation provides an overview of data warehousing, its components, and the process, focusing on transactions, aggregates, and data architecture. Topics covered also include data extraction, transformation, loading, and ad-hoc queries. The information is designed for professional audiences.

Full Transcript

DATA WAREHOUSING DATA WAREHOUSING Transactions and Aggregates Data Architecture Data: Extraction, Transformation, Loading 1 DATA WAREHOUSING Data warehousing is the process of constructing and using a data warehouse....

DATA WAREHOUSING DATA WAREHOUSING Transactions and Aggregates Data Architecture Data: Extraction, Transformation, Loading 1 DATA WAREHOUSING Data warehousing is the process of constructing and using a data warehouse. 2 DATA WAREHOUSING A data warehouse is constructed by integrating data from multiple heterogeneous sources that support analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidations. 3 DATA WAREHOUSING Ad Hoc Query User-defined searches let users explore data freely, without needing pre-built dashboards or coding. This allows them to find specific information on demand. 4 DATA WAREHOUSING Ad Hoc Query Furthermore, ad-hoc queries can be used to compare datasets side by side to gain a better understanding of the data's characteristics and relationships. 5 DATA WAREHOUSING Ad Hoc Query Datasets a collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer. "all hospitals must provide a standard data set of each patient's details" 6 DATA WAREHOUSING Ad Hoc Query Ad-hoc queries and interactive data visualizations go hand-in-hand, empowering users to explore data quickly and deeply. 7 DATA WAREHOUSING Ad Hoc Query Data Visualization Tools 8 DATA WAREHOUSING Ad Hoc Query 5 examples of ad hoc queries 1. Identifying Trends 2. Comparing Data Sets 3. Generating Reports 4. Spotting Anomalies 9 5. Updating Reports DATA WAREHOUSE VS DATA WAREHOUSING Definition: refers to the entire process and set of activities Definition: involved in designing, building, populating, and a centralized repository for storing, organizing, maintaining a data warehouse. and managing data from various sources such Purpose: purpose: enable organizations to turn their raw data into provide a unified and structured view of data for valuable insights. This process involves extracting, analysis, decision-making, and reporting. transforming, loading, and maintaining data in a data warehouse. 10 DATA WAREHOUSE VS DATA WAREHOUSING Characteristics: Activities: Data warehouses typically store data in a Data warehousing activities include data modeling, structured and optimized format for analytical ETL development, data integration, data governance, processing. security, and performance tuning. They often use techniques like data modeling, ETL (Extract, Transform, Load) processes, and It also involves defining business requirements, data indexing to ensure data quality, consistency, architecture, and selecting appropriate technologies. and performance. 11 DATA WAREHOUSING Transactions and Aggregates Transactions Warehousing refers to a single, discrete operation that alters data in some way. Data warehouses typically receive data from various sources, and each individual data update is considered a transaction. Transactional data provides the most detailed information about individual events or actions within a business process. Transactions are typically at the lowest level of granularity and store detailed data, which is necessary for historical and audit purposes. To save space and make data analysis faster, data warehouses often summarize transactions instead of storing every single one. 12 Aggregates DATA WAREHOUSING Aggregates are summaries or calculations performed on transactional data to provide higher-level insights. Aggregates are used to answer business questions and provide a more concise view of data, which is easier to interpret and analyze. Aggregates are at a higher level of granularity compared to transactions. They provide insights into trends, patterns, and overall performance. Storage: Aggregates are stored in data warehouses to accelerate query performance. Precomputing aggregates reduces the need to perform complex calculations on the fly when users query the data. Roll-Up and Drill-Down: Aggregates can be created at various levels of granularity to support both: high-level reporting (roll-up) and 13 detailed analysis (drill-down). DATA WAREHOUSING Transactions and Aggregates Relationship between transactions and aggregates in data warehousing Data Ingestion Transactions represent the raw data ingested into the data warehouse. These transactions are often loaded into a staging area before being processed and stored in the data warehouse. 4 stages to follow in your data mining process 1. Data cleaning and preprocessing 2. Data modeling and evaluation 3. Data exploration and visualization 14 4. Deployment and maintenance DATA WAREHOUSING Transactions and Aggregates Relationship between transactions and aggregates in data warehousing Data Transformation: After data is loaded into the data warehouse, it undergoes transformation processes, including aggregation. Aggregates are created by applying aggregation functions to transactional data to produce summarized views. Query Performance: Aggregates significantly improve query performance. When users query the data warehouse, it's more efficient to retrieve precomputed aggregates than to process large volumes of raw transactional data. Business Intelligence: Aggregated data is the basis for generating reports, dashboards, and visualizations that help businesses make informed decisions. It provides a high-level overview of key performance 15 indicators and trends. A data architecture describes how data is DATA WAREHOUSING Popular data architecture frameworks A data architecture can draw from popular enterprise architecture frameworks, including: TOGAF, The Open Group Architecture Framework (TOGAF) This enterprise architecture methodology was developed in 1995 by The Open Group, of which IBM is a Platinum Member. There are four pillars to the architecture: Business architecture Data architecture Applications architecture 16 Technical architecture, DATA WAREHOUSING Popular data architecture frameworks DAMA-DMBOK 2 Zachman Framework for Enterprise Architecture Originally developed by John Zachman at IBM in 1987, this framework uses a matrix of six layers from contextual to detailed, such as why, how, and what. It provides a formal way to organize and analyze data but does not include methods for doing so. Zachman Framework for Enterprise Architecture Originally developed by John Zachman at IBM in 1987, this framework uses a matrix of six layers from contextual to detailed, such as why, how, and what. It provides a formal way to organize and analyze data but does not include methods for doing so. 17 DATA WAREHOUSING Popular data architecture frameworks 18 DATA WAREHOUSING Popular data architecture frameworks Types of data architectures and underlying components A data architecture demonstrates a high level perspective of how different data management systems work together. These are inclusive of a number of different data storage repositories, such as data lakes, data warehouses, data marts, databases, et cetera. Together, these can create data architectures, such as data fabrics and data meshes, which are increasingly growing in popularity. These architectures place more focus on data as products, creating more standardization around metadata and more democratization of data across organizations via APIs. 19 DATA WAREHOUSING Popular data architecture frameworks The following section delves deeper into each of these storage components and data architecture types: Types of data management systems Data warehouses Data marts A data mart is a focused version of a data warehouse that contains a smaller subset of data important to and needed by a single team or a select group of users within an organization, such as the HR department. Since they contain a smaller subset of data, data marts enable a department or business line to discover more-focused insights more 20 quickly than possible when working with the broader data warehouse data DATA WAREHOUSING Popular data architecture frameworks The following section delves deeper into each of these storage components and data architecture types: Data Lakes: a data lake houses raw data, A data lake can store both structured and unstructured data, which makes it unique from other data repositories. Data lakes were originally created as a response to the data warehouse’s failure to handle the growing volume, velocity, and variety of big data. While data lakes are slower than data warehouses, they are also cheaper as there is little to no data preparation before ingestion. 21 DATA WAREHOUSING Data Access: Extraction, Transformation, Loading 1. Extract: This initial phase involves retrieving data from various sources. These sources can range from relational databases and flat files to cloud systems, APIs, and even web pages. The extraction process must handle different data formats and structures, ensuring that all relevant information is captured. 2. Transform: Once extracted, the data undergoes a series of transformations to prepare it for loading into the target system. This phase involves: Data Cleansing: Removing inconsistencies, errors, duplicates, and irrelevant data to ensure data quality and integrity. Data Standardization: Converting data into a uniform format, including consistent data types, units of measurement, and coding schemes. Data Aggregation: Combining and summarizing data based on specific criteria, such as grouping sales data by region or product category. Data Enrichment: Adding additional information to the data, such as 22 customer demographics or product descriptions, to provide a more DATA WAREHOUSING Data Access: Extraction, Transformation, Loading 3. Load: The final phase involves transferring the transformed data into the target system, which could be a data warehouse, data mart, or other data storage system. This phase ensures that the data is stored in a structured and accessible manner for analysis and reporting. 23 Catanduanes State University/ College of Information Communication and Technology THANK YOU Mariella Jane R. Andador Jessa Karyll Aguilar Angelia Arcilla

Use Quizgecko on...
Browser
Browser