Summary

These lecture notes cover data mining concepts, focusing on information engineering, data warehouses, and the basic architecture of a data mining system.

Full Transcript

Data Mining

Information engineering
◼ The study and processing of information using modern technology such as computers and communications.
◼ Its aim is to determine the best ways to store, organize, access, and retrieve information in an automated system or website.

Information engineering
◼ Data: the facts of the world; an uninterpreted raw signal.
  E.g. "The price of crude oil is $80 per barrel."
◼ Information: data + context; it carries meaning.
  E.g. "The price of crude oil has risen from $70 to $80 per barrel."
◼ Knowledge: information with a purpose attached; it is generative, supports action, and creates new information.
  E.g. "When crude oil prices go up by $10 per barrel, it's likely that petrol prices will rise by 2p per litre."

What Is Data Mining
◼ Data mining refers to extracting or "mining" knowledge from large amounts of data.
◼ Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining.

Why Mine Data? Commercial Viewpoint
▪ Lots of data is being collected and warehoused
  – Web data, e-commerce
  – Purchases at department/grocery stores
  – Bank/credit card transactions
▪ Computers have become cheaper and more powerful
▪ Competitive pressure is strong
  – Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Why Mine Data? Scientific Viewpoint
◼ Data is collected and stored at enormous speeds (GB/hour)
  – Remote sensors on a satellite
  – Telescopes scanning the skies
  – Microarrays generating gene expression data
◼ Traditional techniques are infeasible for raw data
◼ Data mining may help scientists
  – in classifying and segmenting data
  – in hypothesis formation

What Is Data Mining
◼ Data mining is defined as the process of discovering patterns in data.
  o The process must be automatic or (more usually) semiautomatic.
  o The patterns discovered must be meaningful in that they lead to some advantage, usually an economic one.
  o The data is invariably present in substantial quantities.

What Is Data Mining
◼ Data mining is a process that uses a variety of data analysis methods to discover unknown, unexpected, interesting, and relevant patterns and relationships in data that may be used to make valid and accurate predictions.
◼ Data mining is also known as Knowledge Discovery from Data (KDD).

Architecture of a data mining system
[Figure: architecture of a data mining system]

What Is a Data Warehouse
◼ A data warehouse (DW) is a system used for reporting and data analysis and is considered a core component of business intelligence.
◼ DWs are central repositories of integrated data from one or more disparate sources.
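
The idea of a data warehouse as a central repository of integrated data from disparate sources can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the slides: the two source systems (`STORE_SALES`, `WEB_ORDERS`), their rows, the `sales` table layout, and the helper names. An in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source systems; names and rows are invented for illustration.
STORE_SALES = [  # (customer_id, amount) tuples from a point-of-sale system
    ("c1", 20.0),
    ("c2", 35.5),
]
WEB_ORDERS = [  # dictionaries from an e-commerce system
    {"customer": "c1", "total": 15.0},
    {"customer": "c3", "total": 42.0},
]


def build_warehouse(store_sales, web_orders):
    """Load both sources into one integrated table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer_id TEXT, channel TEXT, amount REAL)"
    )
    # Each source has its own shape; both are mapped onto the same schema.
    conn.executemany("INSERT INTO sales VALUES (?, 'store', ?)", store_sales)
    conn.executemany(
        "INSERT INTO sales VALUES (?, 'web', ?)",
        [(order["customer"], order["total"]) for order in web_orders],
    )
    conn.commit()
    return conn


def revenue_by_customer(conn):
    """Reporting query that spans both sources at once."""
    cursor = conn.execute(
        "SELECT customer_id, SUM(amount) FROM sales "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse(STORE_SALES, WEB_ORDERS)
    for customer_id, total in revenue_by_customer(warehouse):
        print(customer_id, total)
```

Once both sources share one schema, a single query can report across them, which is the role the slides assign to a warehouse in reporting and data analysis. A real warehouse would add cleaning, scheduled loading, and historical storage, none of which this sketch attempts.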