Chapter 2
Why Data Warehouse and Data Mining?

Copyright © 2011 Ramez Elmasri and Shamkant Navathe

Outline
◼ Data Warehousing
◼ Data Mining
◼ Data Mining works with Warehouse Data

Data Warehouse

What is a Data Warehouse?
“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”

Purpose of Data Warehousing
◼ Traditional databases are not optimized for data access alone; they must balance the requirements of data access against the need to ensure data integrity.
◼ Most of the time, data warehouse users need only read access, but that access must be fast over a large volume of data.
◼ There is a great need for tools that give decision makers the information to make decisions quickly and reliably based on historical data.
◼ This functionality is provided by data warehousing and online analytical processing (OLAP).

Data Warehouse is a Specialized DB
Standard DB                                   Data Warehouse
Mostly updates                                Mostly reads
Many small transactions                       Queries are long and complex
MB to GB of data                              GB to TB of data
Current snapshot                              History
Thousands of users (e.g., clerical users)     Hundreds of users (e.g., decision-makers, analysts)

Introduction, Definitions, and Terminology
◼ Data warehouses have the distinguishing characteristic that they are mainly intended for decision support applications.
◼ Applications that a data warehouse supports include:
  ◼ OLAP (Online Analytical Processing) is a term used to describe the analysis of complex data from the data warehouse.
  ◼ DSS (Decision Support Systems), also known as EIS (Executive Information Systems), support an organization’s leading decision makers in making complex and important decisions.
  ◼ Data Mining is used for knowledge discovery (the process of searching data for unanticipated new knowledge).

Characteristics of Data Warehouses
◼ Multidimensional conceptual view
◼ Unlimited dimensions and aggregation levels
◼ Client-server architecture
◼ Multi-user support
◼ Accessibility
◼ Transparency
◼ Consistent reporting performance

Functionality of a Data Warehouse
◼ Functionality that can be expected:
  ◼ Roll-up: Data is summarized with increasing generalization.
  ◼ Slice and dice: Projection operations are performed on the dimensions.
  ◼ Sorting: Data is sorted by ordinal value.
  ◼ Selection: Data is available by value or range.
  ◼ Derived attributes: Attributes are computed by operations on stored and derived values.

Data Warehouse vs. Data Views
◼ Views and data warehouses are alike in that both hold read-only extracts from databases.
◼ However, data warehouses differ from views in the following ways:
  ◼ Data warehouses exist as persistent storage instead of being materialized on demand.
  ◼ Data warehouses are usually not relational, but rather multidimensional.
  ◼ Data warehouses can be indexed for optimization.
  ◼ Data warehouses provide specific support of functionality.
  ◼ Data warehouses deal with huge volumes of data that are generally contained in more than one database.

Data Mining

Definitions of Data Mining
◼ The discovery of new information in terms of patterns or rules from vast amounts of data.
◼ The process of finding interesting structure in data.
◼ The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

Knowledge Discovery in Databases (KDD)
◼ Data mining is actually one step of a larger process known as knowledge discovery in databases (KDD).
◼ KDD: Process of semi-automatically analyzing large databases to find patterns that are:
  ◼ valid: hold on new data with some certainty
  ◼ novel: non-obvious to the system
  ◼ useful: should be possible to act on the item
  ◼ understandable: humans should be able to interpret the pattern

Why is data mining necessary?
◼ Make use of your data assets
◼ Many interesting things you want to find cannot be found using database queries.

Goals of Data Mining and Knowledge Discovery (PICO)
◼ Prediction:
  ◼ Determine how certain attributes will behave in the future.
◼ Identification:
  ◼ Identify the existence of an item, event, or activity.
◼ Classification:
  ◼ Partition data into classes or categories.
◼ Optimization:
  ◼ Optimize the use of limited resources.

Data Mining works with Warehouse Data
◼ Data Warehousing provides the Enterprise with a memory
◼ Data Mining provides the Enterprise with intelligence
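
Example: warehouse functionality in code
The operations listed under "Functionality of a Data Warehouse" (derived attributes, roll-up, slice, selection, sorting) can be sketched on a tiny in-memory fact table. This is only an illustrative sketch, not how a real warehouse engine works: the SALES rows, the column names, and every function name below are hypothetical, and "slice" is read here as fixing one dimension to a value and projecting it away, which is one common interpretation.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale. All names and values are made up.
SALES = [
    {"region": "East", "city": "Boston",  "quarter": "Q1", "units": 10, "unit_price": 2.0},
    {"region": "East", "city": "Boston",  "quarter": "Q2", "units": 5,  "unit_price": 2.0},
    {"region": "East", "city": "Albany",  "quarter": "Q1", "units": 4,  "unit_price": 3.0},
    {"region": "West", "city": "Seattle", "quarter": "Q1", "units": 8,  "unit_price": 2.5},
    {"region": "West", "city": "Seattle", "quarter": "Q2", "units": 6,  "unit_price": 2.5},
]


def with_revenue(rows):
    """Derived attribute: revenue is computed from the stored units and unit_price."""
    return [{**r, "revenue": r["units"] * r["unit_price"]} for r in rows]


def roll_up(rows, dims, measure):
    """Roll-up: total `measure` grouped by the dimensions in `dims`.

    Passing fewer (coarser) dimensions gives a more general summary.
    """
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Slice: keep rows where `dim` equals `value`, then drop that dimension."""
    return [
        {k: v for k, v in r.items() if k != dim}
        for r in rows
        if r[dim] == value
    ]


def select_range(rows, attr, low, high):
    """Selection: keep rows whose `attr` lies in the closed range [low, high]."""
    return [r for r in rows if low <= r[attr] <= high]


def sort_by(rows, attr, descending=False):
    """Sorting: order rows by the value of `attr`."""
    return sorted(rows, key=lambda r: r[attr], reverse=descending)


facts = with_revenue(SALES)
by_city = roll_up(facts, ["region", "city"], "revenue")   # finer summary
by_region = roll_up(facts, ["region"], "revenue")         # rolled up to region
q1_only = slice_rows(facts, "quarter", "Q1")
mid_sized = select_range(facts, "units", 5, 8)
ranked = sort_by(facts, "revenue", descending=True)

print(by_city)
print(by_region)
```

Going from by_city to by_region is the roll-up step: the same measure is summarized at a more general level of the location dimension. A real warehouse would run these as OLAP queries over persistent, indexed storage instead of Python lists.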