Data Warehousing and OLAP (2011) PDF

Summary

This document is a lecture topic on data warehousing and online analytical processing (OLAP). The document outlines various aspects of data warehousing, including its key components, definitions, characteristics, and uses. It also introduces the concept of OLAP and details various types of OLAP tools.

Full Transcript

Chapter 3 Data Warehousing and Online Analytical Processing (OLAP) Copyright © 2011 Ramez Elmasri and Shamkant Navathe Outline ◼ Data Warehouses and Merging Information Resources ◼ What is a Data Warehouse? ◼ History Leading to...

Chapter 3 Data Warehousing and Online Analytical Processing (OLAP) Copyright © 2011 Ramez Elmasri and Shamkant Navathe Outline ◼ Data Warehouses and Merging Information Resources ◼ What is a Data Warehouse? ◼ History Leading to Data Warehousing. ◼ OLAP fundamentals Copyright © 2011 Ramez Elmasri and Shamkant Navathe Problem: Heterogeneous Information Sources “Heterogeneities are everywhere” Personal Databases World Scientific Databases Wide Web Digital Libraries Different interfaces Different data representations Duplicate and inconsistent information Copyright © 2011 Ramez Elmasri and Shamkant Navathe Unified Access to Data Integration System World Wide Personal Web Digital Libraries Scientific Databases Databases Collects and combines information Provides integrated view, uniform user interface Copyright © 2011 Ramez Elmasri and Shamkant Navathe The Traditional Research Approach Query-driven ( on-demand) Disadvantages of Query-Driven Approach ◼ Delay in query processing ◼ Slow or unavailable information sources ◼ Inefficient and potentially expensive for frequent queries Copyright © 2011 Ramez Elmasri and Shamkant Navathe The Warehousing Approach - Metadata Information Clients (Database that describes various integrated in aspects of data in advance Data the warehouse Warehouse Stored in WH ). for direct Integration System Metadata querying and analysis... Extractor/ Extractor/ Extractor/ Monitor Monitor Monitor... Source Source Source Copyright © 2011 Ramez Elmasri and Shamkant Navathe Advantages of Warehousing Approach ◼ High query performance ◼ But not necessarily most current information ◼ Doesn’t interfere with local processing at sources Copyright © 2011 Ramez Elmasri and Shamkant Navathe What is a Data Warehouse? “A Data Warehouse is a ◼ subject-oriented, ◼ integrated, ◼ time-variant, ◼ non-volatile collection of data used in support of management decision making processes.” Copyright © 2011 Ramez Elmasri and Shamkant Navathe DW Definition… ◼ Subject-Oriented: ◼ The data warehouse is organized around the key subjects of the enterprise. Major subjects include ◼ Customers ◼ Patients ◼ Students ◼ Etc. Copyright © 2011 Ramez Elmasri and Shamkant Navathe DW Definition… ◼ Integrated ◼ The data housed in the data warehouse are defined using consistent ◼ Formats ◼ Related Characteristics Copyright © 2011 Ramez Elmasri and Shamkant Navathe DW Definition… ◼ Time-variant ◼ The data in the warehouse contain a time dimension so that they may be used as a historical record of the business Copyright © 2011 Ramez Elmasri and Shamkant Navathe DW Definition… ◼ Non-volatile ◼ Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end-users Copyright © 2011 Ramez Elmasri and Shamkant Navathe What is a Data Warehouse? “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.” Copyright © 2011 Ramez Elmasri and Shamkant Navathe 1 4 History Leading to Data Warehousing ◼ Improvement in database technologies, especially relational DBMSs ◼ Advances in computer hardware, including mass storage and parallel architectures ◼ Emergence of end-user computing with powerful interfaces and tools ◼ Advances in middleware, enabling heterogeneous database connectivity ◼ Recognition of difference between operational and informational systems Copyright © 2011 Ramez Elmasri and Shamkant Navathe OLAP fundamentals Copyright © 2011 Ramez Elmasri and Shamkant Navathe 1 6 Online Analytical Processing (OLAP) (2/2) ◼ Describes a technology that is designed to optimize the storing and querying of large volumes of multi-dimensional data that is aggregated to various levels of detail to support the analysis of this data. Copyright © 2011 Ramez Elmasri and Shamkant Navathe 1 7 Online Analytical Processing (OLAP) (2/2) ◼ Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data. Copyright © 2011 Ramez Elmasri and Shamkant Navathe Goal of OLAP ▪ Goal of OLAP is: ▪ to support ad-hoc querying for the business analyst. ▪ Business analysts are familiar with spreadsheets ▪ Extend spreadsheet analysis model to work with data warehouse(Multidimensional view of data ) ▪ Multidimensional view of data is the foundation of OLAP Copyright © 2011 Ramez Elmasri and Shamkant Navathe OLTP vs. OLAP ▪ On-Line Transaction Processing (OLTP): – technology used to perform updates on operational or transactional systems. ▪ On-Line Analytical Processing (OLAP): – technology used to perform complex analysis of the data in a data warehouse Copyright © 2011 Ramez Elmasri and Shamkant Navathe 2 0 Categories of OLAP Tools ◼ OLAP tools are categorized according to the architecture used to store and process multi-dimensional data. ◼ There are three main categories: ◼ Multi-dimensional OLAP (MOLAP) ◼ Relational OLAP (ROLAP) ◼ Hybrid OLAP (HOLAP) Copyright © 2011 Ramez Elmasri and Shamkant Navathe 2 1 Multi-dimensional OLAP (MOLAP) ◼ Use specialized data structures and multi- dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data. ◼ Data is typically aggregated and stored according to predicted usage to enhance query performance. Copyright © 2011 Ramez Elmasri and Shamkant Navathe 2 2 Multi-dimensional OLAP (MOLAP) ◼ Use array technology and efficient storage techniques that minimize the disk space requirements through sparse data management. ◼ Provides excellent performance when data is used as designed, and the focus is on data for a specific decision-support application. Copyright © 2011 Ramez Elmasri and Shamkant Navathe 2 3 Typical Architecture for MOLAP Tools Copyright © 2011 Ramez Elmasri and Shamkant Navathe 2 4 Relational OLAP (ROLAP) ◼ Supports RDBMS products using a metadata layer - avoids need to create a static multi- dimensional data structure - facilitates the creation of multiple multi-dimensional views of the two-dimensional relation. Copyright © 2011 Ramez Elmasri and Shamkant Navathe 2 5 Typical Architecture for ROLAP Tools Copyright © 2011 Ramez Elmasri and Shamkant Navathe 2 6 Hybrid OLAP (HOLAP) ◼ Provide limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server. Copyright © 2011 Ramez Elmasri and Shamkant Navathe , 9 / 1 OLAP Operations / ◼ Roll up (drill-up): summarize data 2 0 ◼ Roll up supports calculations using 2 aggregations such as SUM, COUNT, MAX, 4 MIN, and AVG ◼ Roll down (Drill down): from higher level summary to lower level summary ◼ Slice and dice: ◼ project and select Copyright © 2011 Ramez Elmasri and Shamkant Navathe 9 / 1 OLAP Operations / 2 0 2 Other operations 4 ◼ drill across: across more than one fact table ◼ drill through: through the bottom level to its back-end relational tables (using SQL) Copyright © 2011 Ramez Elmasri and Shamkant Navathe

Use Quizgecko on...
Browser
Browser