Data Warehousing and Management PDF
Document Details
Adam M. Monreal
Tags
Summary
This document is a presentation on data warehousing and management. It covers various aspects like data warehousing concepts, data marts vs. data warehouses, and ETL processes. The presentation also discusses data integration, quality, and the different components of a data warehouse architecture, including load, warehouse, and query managers. This document also defines data warehousing as a subject-oriented, integrated, time-variant, and non-volatile collection of data used for supporting management's decision-making processes. This document describes how Data Warehouses are subject oriented and built around major data entities or subjects of an organization. Examples are given, like Students, Teachers, and Courses in a university, or Employees in a company. The presentation discusses how a data warehouse integrates data from multiple systems to provide a wide view of an enterprise's data.
Full Transcript
DATA WAREHOUSING AND MANAGEMENT Presented by: Adam M. Monreal RECAP AGENDA Data Warehousing and Data Marts ETL (Extract, Transform, Load) processes Data integration and data quality AGENDA Concepts of data warehousing Data marts vs. data warehouses ETL (Extract, Trans...
DATA WAREHOUSING AND MANAGEMENT Presented by: Adam M. Monreal RECAP AGENDA Data Warehousing and Data Marts ETL (Extract, Transform, Load) processes Data integration and data quality AGENDA Concepts of data warehousing Data marts vs. data warehouses ETL (Extract, Transform, Load) processes Data integration and data quality DATA WAREHOUSE ‘A subject-oriented, integrated, time-variant, and non--volatile collection of data in support of management’s decision-making process.’ Bill Inmon Father of Data Warehousing ‘A subject-oriented, integrated, time-variant, and non--volatile collection of data in support of management’s decision-making process.’ Bill Inmon Father of Data Warehousing DATA WAREHOUSES ARE SUBJECT ORIENTED - THEY ARE BUILT AROUND THE MAJOR DATA ENTITY OR SUBJECTS OF AN ORGANIZATION. EXAMPLE, IN A UNIVERSITY, THE DATA SUBJECT (ENTITIES UNDER STUDY) COULD BE STUDENTS, TEACHERS, AND VARIOUS COURSES, WHEREAS IN A COMPANY THE SUBJECTS MIGHT BE WORKERS, INTERNS, SERVICES AND ITEMS. ‘A subject-oriented, integrated, time-variant, and non--volatile collection of data in support of management’s decision-making process.’ Bill Inmon Father of Data Warehousing A DATA WAREHOUSE INTEGRATES DATA FROM MULTIPLE SYSTEMS TO PROVIDE A WIDE VIEW OF ANY ENTERPRISE’S DATA. ‘A subject-oriented, integrated, time-variant, and non--volatile collection of data in support of management’s decision-making process.’ Bill Inmon Father of Data Warehousing DATA WAREHOUSES DATA IS NOT ALWAYS UP TO DATE AS IT CONTAINS HISTORICAL DATA WHICH IS VALID OR ACCURATE TILL SOME POINT OF TIME (OR TIME INTERVAL). ‘A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.’ Bill Inmon Father of Data Warehousing DATA WAREHOUSE IS DESCRIBED AS A LONG TERM ENTERPRISE MEMORY DUE TO ITS NON-VOLATILE NATURE BECAUSE THE DATA IS NOT UPDATED IN REAL TIME BUT IS RATHER REFRESHED ON A REGULAR BASIS. HENCE, THE DATA WHICH IS ADDED GOES IN LIKE A SUPPLEMENT TO THE DATABASE, RATHER THAN A REPLACEMENT. THE DATABASE CONTINUALLY ABSORBS THIS NEW DATA, INCREMENTALLY INTEGRATING IT WITH THE PREVIOUS DATA. A data warehouse is a historical database and should be considered as the long term memory of an organization. Historical data is not to be tampered with; no insertion, up-dation and deletion are to be made. Usually, it is used only for retrieval such as verification and data analysis. The database of a University Management System. For the duration of study by a student at the university, his or her data will be retained in the main data store and all database operations such as insert, update, delete and retrieve will be performed on it. This database is usually called OLTP. But once a student has left, then his or her database is usually removed from OLTP because OLTP is meant to perform day-to-day operations. However, the management may be interested in retaining the data of old students. It can be used for later queries as well as for analysis purposes. Thus, all historical data can be vital for an organization for subsequent data analysis. All such historical records can be moved to a separate data store known as the data warehouse and may be called upon when required in the future. DATA WAREHOUSE ARCHITECTURE EVERY DATA WAREHOUSE HAS THREE FUNDAMENTAL COMPONENTS THESE ARE AS FOLLOWS. LOAD MANAGER WAREHOUSE MANAGER DATA ACCESS MANAGER DATA WAREHOUSE ARCHITECTURE 1. LOAD MANAGER The Load manager is responsible for Data collection from operational systems. It also performs data conversion into some usable form to be further utilized by the user. It includes all the programs and application interfaces which are required for extracting data from the operational systems, it’s preparation and finally loading of data into the data warehouse itself. It should perform the following tasks: Data Identification Data Validation for its accuracy Data Extraction from the original source Data Cleansing Data formatting Data standardization (i.e. bringing data into conformity with some standard format) Consolidates data from multiple sources to one place Establishment of Data Integrity using Integrity Constraints 2. WAREHOUSE MANAGER The Warehouse manager is the main part of Data Warehousing system as it holds the massive amount of information from myriad sources. It organizes data in a way so it becomes easy for anyone to analyze or find the required information. It is the core of the data warehouse itself. It maintains three levels of information: detailed, lightly summarized and highly summarized. It also maintains mete data, i.e., data about data. 3. QUERY MANAGER An interface which connects the end users with the information stored in data warehouse through the usage of specialized end-user tools. These tools are known as Data mining access tools.