Data Warehousing Unit 1 PDF
Document Details
Uploaded by WarmheartedRoseQuartz2620
KCES's IMR, Jalgaon
Tags
Summary
This document provides an overview of data warehousing concepts, including definitions of data, data warehouses, and data marts. It also describes the components of a data warehouse and different data warehouse models. The document explores the three-tier architecture, Online Analytical Processing (OLAP), and Relational OLAP (ROLAP) and Multidimensional OLAP (MOLAP).
Full Transcript
Unit 1 Data warehousing. Data:- Data is a collection of information gathered by observations, measurements, research or analysis. They may consist of facts, numbers, names, figures or even description of things. Data is organized in the form...
Unit 1 Data warehousing. Data:- Data is a collection of information gathered by observations, measurements, research or analysis. They may consist of facts, numbers, names, figures or even description of things. Data is organized in the form of graphs, charts or tables. According to the of Bill lnmon A data warehouse is a large collection of business data used to help an organization make decisions. A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process. Subject-Oriented: In data warehouse data is stored by subject , not by application. A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject. A data warehouse is a collection of data marts representing historical data from different operations in the company. It is collect the data from multiple heterogeneous (mixed) data base files. It store the 5 to 10 years of huge amount of data. Subject oriented :- Data that gives information about a particular subject instead of about a company’s o going operations. Data warehouses are designed to help you analyze data. for e.g. E-com industry Order Processing Sales operations application Data W.H Subject Customer Billing Customer Integrated : - integration is closely related to subject orientation. data that is gathered into the data warehouse from a variety Of source and merged into a coherent whole. Time variant :- all data in the data warehouse is identified with a particular time period. -> it means data warehouse has to contain historical data, not just current values, every data structure in the data warehouse contains the time elements. Non volatile:- Data is not update /delete from data warehouse in real time. data is stable in a data warehouse more data is added but is never removed. It can be – Used for decision support Used to manage and control business. Used by managers and end users to understand the business and make judgments.’ Data warehousing components 1) Data warehouse database:- it is central components of data warehouse. This the place where the data stored. This database is implemented on the RDBMS technology. 2)Meta data :- Data about data is called metadata. Meta data is Used for building, maintain and managing data warehouse. Its classification:- a) Technical Meta data:- which contain information about warehouse data for use by warehouse designers and administrators. b)Business Meta data:-which contains information that gives user an easy- to understand view of the information stored in the warehouse. 3) Extract transform load tools. Search and replace common name for data arriving form different source. In case of missing data populate them with default. 4)Data Marts. A data mart is only one subtype of data warehouse. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. 5) Query Tools: Allow people to interact with data warehouse. e.g. Insert, access , show etc... Difference between Data warehouse and Data mart. Data Data mart. warehouse Enterprises overall view. Departmental View. Takes more time to implements Less time to implements Size is more than to 100 TB Size is less than 10 TB Slow Responses Fast Responses Union of all data Mart. Single Business Process. Three-Tier Data Warehouse Architecture Generally a data warehouses accepts a three-tier architecture. Following are the three tiers of the data warehouse architecture. Bottom Tier − The bottom tier of the architecture is the data warehouse database server. It is the relational database system. We use the back end tools and utilities to feed data into the bottom tier. These back end tools and utilities perform the Extract, Clean, Load, and refresh functions. Middle Tier − In the middle tier, we have the OLAP Online Analytical Processing Server (OLAP) Server that can be implemented in either of the following ways. By Relational OLAP (ROLAP), which is an extended relational database management system. The ROLAP maps the operations on multidimensional data to standard relational operations. By Multidimensional OLAP (MOLAP) model, which directly implements the multidimensional data and operations. Top-Tier − This tier is the front-end client layer. This layer holds the query tools and reporting tools, analysis tools and data mining tools. The following diagram shows the three-tier architecture of data warehouse − Data Warehouse Models From the view of data warehouse architecture, we have the following data warehouse models − Virtual Warehouse Data mart Enterprise Warehouse Virtual Warehouse The view over an operational data warehouse is known as a virtual warehouse. It is easy to build a virtual warehouse. Building a virtual warehouse requires extra capacity on operational database servers. Data Mart:- Data mart contains a subset of organization-wide data. This subset of data is valuable to specific groups of an organization. Points to remember about data marts − 1. Window-based or Unix/Linux-based servers are used to implement data marts. They are implemented on low- cost servers. 2. Data marts are small in size. 3. Data marts are customized by department. 4. Data mart are flexible. 5. The source of a data mart is departmentally structured data warehouse. Enterprise Warehouse:- An enterprise warehouse collects all the information and the subjects spanning an entire organization It provides us enterprise-wide data integration. The data is integrated from operational systems and external information providers. This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or beyond. Online Analytical Processing Server (OLAP) “OLAP” term was introduced by E.F.Codd. Online Analytical Processing Server (OLAP) is based on the multidimensional data model. It allows managers, and analysts to get an insight of the information through fast, consistent, and interactive access to information. OLAP is a category of software that allows users to analyze information from multiple database systems at the same time. It is a technology that enables analysts to extract and view business data from different points of view. Types of OLAP Servers We have four types of OLAP servers − 1. Relational OLAP (ROLAP) 2. Multidimensional OLAP (MOLAP) 3. Hybrid OLAP (HOLAP) 4. Specialized SQL Servers Relational OLAP ROLAP servers are placed between relational back-end server and client front-end tools. To store and manage warehouse data, ROLAP uses relational or extended-relational DBMS. ROLAP includes the following − Implementation of aggregation navigation logic. Optimization for each DBMS back end. Additional tools and services. Multidimensional OLAP MOLAP uses array-based multidimensional storage engines for multidimensional views of data. With multidimensional data stores, the storage utilization may be low if the data set is sparse. Hybrid OLAP Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers higher scalability of ROLAP and faster computation of MOLAP. Hybrid OLAP servers allows to store the large data volumes of detailed information. Specialized SQL Servers Specialized SQL servers provide advanced query language and query processing support for SQL queries. OLTP (online transaction processing) OLTP (online transaction processing) is a class of software programs capable of supporting transaction- oriented applications on the Internet. OLTP systems are used for order entry, financial transactions, customer relationship management (CRM) and retail sales. Such systems have a large number of users who conduct short transactions. Database queries are usually simple, require sub- second response times and return relatively few records. An important attribute of an OLTP system is its ability to maintain concurrency. To avoid single points of failure, OLTP systems are often decentralized. Online transaction processing shortly known as OLTP supports transaction-oriented applications in a 3-tier architecture. OLTP administers day to day transaction of an organization. Examples of OLTP system are: Online banking Online airline ticket booking Sending a text message Order entry Add a book to shopping cart Benefits of OLTP method It administers daily transactions of an organization. OLTP extends the customer base of an organization by simplifying individual processes.