Lec1_IE165 Overview of data warehousing PDF
Document Details
Uploaded by Deleted User
University of the Philippines Los Baños
Tags
Summary
This document is about data warehousing. It discusses its overview, learning objectives, and some real-world scenarios. It explains challenges and provides a general procedure for building a data warehouse.
Full Transcript
Overview of Data Warehousing IE 165 Information Systems II 1st semester AY 2024-2025 Learning objectives At the end of this module, the students should be able to: Define a data warehouse Learn the differences of a data warehouse from a database and business intelligence Grasp the importanc...
Overview of Data Warehousing IE 165 Information Systems II 1st semester AY 2024-2025 Learning objectives At the end of this module, the students should be able to: Define a data warehouse Learn the differences of a data warehouse from a database and business intelligence Grasp the importance of data warehousing in business intelligence applications Suppose that your manager wants to know… What is the best distribution channel for a new product? Which customers contributed the highest revenue to the company? What products should be boosted to improve its sales? Which products must be phased out? What are the best-selling products and which type of customers purchase them? Suppose that your manager wants to know… What is the best distribution channel for a new product? Which customers contributed the highest revenue to the company? What products should be boosted to improve its sales? Which products must be phased out? What are the best-selling products and which type of customers purchase them? https://www.yamaha-motor.com.ph/revzone/revzone.html Suppose that your manager wants to know… What is the best distribution channel for a new product? Which customers contributed the highest revenue to the company? What products should be boosted to improve its sales? Which products must be phased out? What are the best-selling products and which type of customers purchase them? https://www.yamaha-motor.com.ph/revzone/revzone.html It is common that Scattered organization’s data are: Inaccessible Not understandable Not useable Data Warehouse (DW) “A data warehouse is a system that aggregates data from different sources into a single, central, consistent data store to support data analysis, data mining, artificial intelligence (AI), and machine learning.” - IBM cloud education, 2020 Image source: https://www.ibm.com/cloud/learn/data-warehouse Data Warehouse (DW) It contains large amount of data from different sources processed so that data are represented in uniform ways. It organizes information for faster and effective querying. It stores denormalized data. It contains integrated granular historical data. Evolution of the Data Warehouse Source: https://booksite.elsevier.com/samplechapters/9780123743190/Sample_Chapters/02~Chapter_1.pdf Characteristics of a Data Warehouse Subject-oriente Integrated Non-volatile Time-variant d Data are organized Data from various Data are not Dats is stored for a around major sources are updated nor long period entities of interests aggregated into a deleted, it is enabling analysis of of an organization. common semantics restricted to trends/ data and format. read-only, making it changes over time. more reliable and stable. Characteristics of a data warehouse Subject-oriented – data are organized around business subject rather than by application in an organization Operations Registrar Housing vs Student’s university Student Loan record Characteristics of a data warehouse Integrated – data are integrated from different sources such as operational database systems, metadata and external sources. Source: Sherman, 2015 Characteristics of a data warehouse Non-volatile – data in a data warehouse are not updated nor deleted, it is restricted to read-only Characteristics of a data warehouse Time-variant – a data warehouse contains multiple years of data usually at most 10 years; historical data is stored ID Student number Date College Degree 1 2012-5xxxx 08/20/2012 CHE BSN 2 2012-5xxxx 08/20/2013 CEAT BSIE 3 2012-5xxxx 08/20/2019 GS MSIE 4 2012-5xxxx 01/04/2020 GS MEIE Reasons to develop a Data Warehouse (DW) 3. To provide a 1. To extract the data out 2. To produce answers to decentralized of an online transaction business questions not decision-making processing (OLTP) answerable by the current environment without environments. query system. relying with the IT people. 4. To integrate systems 5. To build the enterprise and applications. data models. Reasons to develop a Data Warehouse (DW) 6. To improve the quality of 7. To create a consistent operational data without 8. To provide answers to and holistic view of the rebuilding existing business queries faster. business operations. enterprise systems. 9. Liberalize the IT 10. To provide good organization from performance for both providing information for decision support and reporting activities. OLTP applications. Challenges in developing a Data Warehouse A different Integrating Volumes of development data data approach QUICK THINK! Do you think too much data is bad for an enterprise? Why or why not? “Data, data everywhere, but not any of the right information I need to do my job?” – from Sherman, 2015 Source: Sherman, 2015 Database vs. Data Warehouse DIMENSION DATABASE DATA WAREHOUSE Data organization Organized around individual transactions Organized around subjects Data stored Detailed or raw data Summarized and clean data Workload Specifically tuned or designed to support Optimized to perform well for a wide variety of predefined operations possible query operations Data modifications End-users routinely issue individual data End users do not directly update the data modifications to the database warehouse Schema design Fully normalized to optimize update/ insert/ Denormalized to optimize query performance delete performance Typical operations Accesses only a handful of records Queries thousands or millions of rows Historical data Stores data from only a few weeks or months Stores many months or years of data Allows easy access for No Yes data mining? Additional Source: https://docs.oracle.com/cd/B10500_01/server.920/a96520/concept.htm#50413 The 5 C’s of Data CLEAN CONSISTENT CONFORMED CURRENT COMPREHENSIVE Different versions Data must be Data necessary in No missing items, of data should not expressed in a performing jobs all are valid appear within the Data needs to be common and should be entries; data organization; no up-to-date and in shareable complete, cleaning and overlapping appropriate dimensions for a regardless of its profiling are sources of data frequency. convenient source and level usually done. having business analysis of granularity inconsistent data. General procedure of building a Data Warehouse 1 2 3 4 Gather data from various Translate these data into Summarize the data and Design the data sources and databases. one common form. predict the type of warehouse by queries to be performed organizing and by the user. clustering the data to allow in-depth analysis. Data Warehouse Architecture Source: https://www.ibm.com/cloud/learn/data-warehouse Data Warehouse vs. Business Intelligence DIMENSION BUSINESS INTELLIGENCE DATA WAREHOUSE What it is It is a set of tools and methods to analyze data It is the process of storing and staging information and discover, extract and formulate actionable and optimizing it for the enterprise’s utilization and information that would be useful for business analysis, fundamentally, it enables BI processes. decisions. Type of system It is a Decision Support System (DSS). It is a data storage system. Serves at the Front-end Back-end Collects data from The data warehouse for analysis Various disparate sources and organises it for efficient BI analysis Comprises of Business reports, charts, graphs, etc. Data held in “fact tables” and “dimensions” with business meaning incorporated into them Handled by Executives and analysts relatively higher up in Data engineers and system administrators who the hierarchy report to/work for the executives and analysts Examples SAP, Sisense, Datapine, Looker, etc. BigQuery, Snowflake, Amazon, Redshift, Panoply, etc. Additional Source: https://www.geeksforgeeks.org/difference-between-business-intelligence-and-data-warehouse/ Data Warehouse vs. Business Intelligence https://blogs.sap.com/2017/11/27/kpi-dashboard-with-lumira-2.0-demo/ https://www.youtube.com/watch?v=CFw4peH2UwU&t=31s Data Warehouse vs. Data Mining DIMENXN DATA MINING DATA WAREHOUSE What it is It is the process of It is s the process of analyzing data extracting and storing patterns. data for analytical analysis. Data event Analyzed regularly. Stored periodically. Who carries Business users with Engineers out the the help of engineers process Data flow Extracted from large Pooled together from data sets. various sources. Essence Provides the Provides the enterprise enterprise with with a memory. intelligence. Additional Source: https://www.geeksforgeeks.org/difference-between-data-warehousing-and-d ata-mining/ BUSINESS INTELLIGENCE Key terms Reporting dashboard DATA INTEGRATION DATA WAREHOUSING Data discovery Extract, transform, load (ETL) Business intelligence Data visualization ELT (ETL variation) Data integration Analytics Enterprise application Data warehouse OLAP integration (EAI/ SOA) Ad-hoc query Data marts Data profiling Online analytical processing Scorecards Data quality (OLAP) cubes Predictive analytics Metadata management Operational data store (ODS) Data mining Master data management Staging Search Source: Sherman, 2015 Textual analytics Operational BI In-memory analytics In-database analytics BI applications Definition of Key Terms – Data Integration DATA INTEGRATION Definition Extract, transform, load (ETL) The process in which data is taken from the source system, configured, and stored in a data warehouse or database. ELT (ETL variation) A variation of ETL. Enterprise application integration Tools and methods for consolidating and integrating the applications that (EAI/ SOA) exist in an enterprise. Data profiling This involves examining source system data for anomalies in values, ranges, frequency, relationships, and other characteristics that could hobble future efforts to analyze it. Data quality Achieved when data embodies the “five Cs”. Metadata management In managing metadata, an enterprise needs to understand what the data means, how it was transformed from creation to consumption, and its associated data quality. Master data management The set of processes used to create and maintain a consistent view, also referred to as a master list, of key enterprise reference data. Source: Sherman, 2015 Definition of Key Terms – Data Warehousing DATA WAREHOUSING Definition Business intelligence The technology used to support the organization’s decision-making process. Data integration Combining data from different sources and bringing it together to ultimately pro- vide a unified view. Data warehouse The process of storing and staging information and optimizing it for the enterprise’s utilization and analysis. Data marts A subset of a data warehouse that’s usually oriented to a business group or process rather than enterprise-wide views. Online analytical processing This technique for analyzing business data uses dimensional models often (OLAP) cubes deployed as cubes, which are like multidimensional pivot tables in spreadsheets Operational data store (ODS) Unlike a data warehouse, an ODS may serve both analytical and operational functions. Source: Sherman, 2015 Definition of Key Terms – Business Intelligence BUSINESS INTELLIGENCE (BI) Definition Collecting data from various sources and presenting it to businesspeople in an Reporting understandable way so they can analyze it. This is the core BI style. This BI tool displays numeric and graphical information on a single display, BI Dashboard making it easy for a businessperson to get information from different sources and customize the appearance. Data discovery Often used to understand trends and patterns of data. Presenting data in a visual way, such as with graphs and charts, helps Data visualization businesspeople glean insights they might not otherwise discern from tabular data. The examination of information to uncover insights that give a businessperson Analytics the knowledge to make informed decisions. Ad-hoc query People use SQL to make ad hoc queries to a database when the need arises. Performance management tools that help managers track performance against BI Scorecards strategic goals. These may be considered a type of dashboard. Source: Sherman, 2015 Definition of Key Terms – Business Intelligence Cont’d BUSINESS INTELLIGENCE (BI) Definition An advanced form of analytics that uses business information to find patterns Predictive analytics and predict future outcomes and trends. This process analyzes large quantities of data to find patterns such as groups Data mining of records, unusual records, and dependencies. The use of data mining for analysis of unstructured textual data such as Textual analytics emails. Text mining tools help find, for example, instances of fraud in thousands of emails or mentions of a company’s name in social media. Queries and reporting are performed on operational systems themselves, as Operational BI BI opposed to the data warehouse. Leveraging advances in memory to provide faster and deeper analytics by In-memory analytics querying a system’s random-access memory (RAM) instead of on disks. Any BI project deliverable that the BI team develops for business people to BI applications use in BI their analysis. This can be a dashboard, scorecard, report, data visualization, ad hoc query, OLAP cube, predictive model, or data model. Source: Sherman, 2015 Next meeting Please prepare for a short quiz on Overview of Data warehousing. References: https://www.ibm.com/cloud/learn/data-warehouse Jalao, E. R. L. (n.d.). Intro to Data Warehousing PPT slides Kendall, K. and Kendall, J. 2010. Systems Analysis and Design. 8th Ed. Prentice Hall. New Jersey. S. Sudarshan, Krithi Ramamritham. http://www.cse.iitb.ac.in/dbms/Data/Talks/krithi-talk- impact.ppt https://booksite.elsevier.com/samplechapters/9780123743190/Sample_Chapters/02~Cha pter_1.pdf Sherman, R. (2015). Business Intelligence Guidebook, ISBN 9780124114616, https://doi.org/10.1016/B978-0-12-411461-6.00011-3. Song IY. (2009). Data Warehouse. In: LIU L., ÖZSU M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_882