Introduction to Business Intelligence, Lecture 3
University of Gdańsk
Summary
This document is a lecture presentation on Business Intelligence, covering OLAP (On-Line Analytical Processing), data mining techniques, and text mining. The presentation includes different types of OLAP systems and their characteristics, highlighting the differences between operational and analytical activities.
Full Transcript
INTRODUCTION TO BUSINESS INTELLIGENCE
Lecture 3

Agenda
OLAP technology as an example of an exploratory and analytical system
Data mining systems
New data collection technologies

OLAP

Information pyramid
Levels, from top to bottom: Wisdom, Knowledge, Information, Data, Symbol.

Decision pyramid
Decision level: DSS/BI/OLAP
Transaction level: OLTP
[Figure axis: range of input and output variables]

OLAP (On-Line Analytical Processing)
The ability to perform analyses on an ongoing basis, using data held in the multidimensional cubes of a data warehouse.
A critical tool for the organization: it helps identify past successes and failures and, based on them, predict future achievements and possible failures.
Object-oriented user interfaces in which users manipulate objects representing organized groups of data.

OLAP systems
These are tools used to analyze economic information.
Analysts, managers or executives can use them to gain insight into how the organization functioned at any point in the past.
They are fast, consistent, interactive tools that provide a wide range of data views.

OLAP – Sales cube example
[Figure: sales cube]

OLAP and OLTP – differences
System               OLTP                   OLAP
Goal                 Real-time data input   Reading and analysis of historical data
Updates              Yes                    No (read only)
Data editing         Yes                    No
Data sources         Current data           Historical data
Associated database  Operational            Analytical, possibly operational

Components of OLAP systems
OLAP systems usually contain two components:
calculation component - used to perform operations such as sums, ratios, time series analysis and descriptive statistics, as well as custom formulas and algorithms, modeling and forecasting;
multidimensional view component - used to view data along defined dimensions.

Types of OLAP systems
Types of OLAP: MOLAP (multidimensional), ROLAP (relational), HOLAP (hybrid).
Others are LOLAP or DOLAP, and WOLAP.

MOLAP
MOLAP (multidimensional OLAP) uses multidimensional database management systems for analytical processing. These systems work like a spreadsheet.
Disadvantages: dimensions are not scalable, so they must be specified when the warehouse is designed; there are no query language standards for these systems.
Advantages: very high performance.

ROLAP
ROLAP (relational OLAP) is supported by a relational database management system.
Disadvantages: low performance; a metadata server sits between the database and the analytical system.
Advantages: because these systems are built on relational databases, the scalability problem does not arise. The query language is standardized and widely known: SQL.

HOLAP
HOLAP (hybrid OLAP) uses the MOLAP architecture to store and present aggregated data, while more detailed data is stored in a relational database.
Characteristics: performance is higher than that of ROLAP systems but still lower than that of MOLAP systems.

OLAP – Oracle database
[Figure: creating a dimension, hierarchy, storage technology]

OLAP – Microsoft database
[Figure]

Operational versus analytical activities
Operational activities                  Decision-making and analytical activities
Performed often                         Performed less often
More predictable                        Less predictable
Smaller amounts of data per query       Larger amounts of data per query
Queries on raw data                     Queries on transformed data
Require real-time data                  Require current and historical data
Low complexity of data sources          High complexity of data sources

Multidimensional and relational data store
[Figure. Source: docs.microsoft.com]

Data exploration
Data Mining
Text Mining
Duo Mining

Data Mining
Data Mining allows for automatic data analysis.
Data mining is about efficiently finding previously unknown dependencies and relationships in data.

Comparison: reporting, OLAP, Data Mining
OLTP (reporting): extraction of detailed and summary data ("information"). Example question: Who has been buying from the fund in the past three years?
OLAP: summaries, trends and forecasts ("analysis"). Example question: What is the fund's average revenue from buyers in a given region and year?
Data Mining: discovering knowledge of hidden patterns and insight ("insight and prediction"). Example question: Who will buy from the fund in the next six months and why?

Data Mining - functions
Two basic functions:
hypothesis verification - used when there is already some idea about a significant relationship between data elements;
knowledge discovery - used to find previously unknown but meaningful relationships between data items that are difficult to deduce.

Data Mining process phases
1. Understanding the business conditions
2. Understanding the data
3. Data preparation
4. Modeling
5. Evaluation
6. Implementation

Data mining method classes
association discovery - a class of methods for discovering unknown dependencies and associations between data items;
clustering - methods designed to find a finite set of classes of objects in the database that share similar characteristics;
discovering sequence patterns - discovering certain patterns of behavior;
discovering classification - finding the relationship between how objects are classified and their characteristics;
discovering similarities in time courses - finding similarities between time series that describe specific processes;
detection of changes and anomalies - finding differences between current and expected data values.

Text Mining
Text Mining is a method for the automatic analysis of the content of text documents.
It usually works by summarizing information, extracting keywords and analyzing the document's content.
Web Mining is a subclass of Text Mining that focuses on analyzing the content of web pages.

Text Mining
Text Mining searches for patterns in unstructured data.
It makes use of techniques such as semantic analysis and artificial intelligence.
The essence of Text Mining is automatic pattern discovery and, through it, learning and the automatic creation of a document description.

Duo Mining
Goal: Duo Mining was proposed as an integration of data mining and text mining tools.
Example: when assessing a customer's creditworthiness, one can analyze both the operations on the customer's bank account (structured data) and any documents the customer provides (unstructured data).

Emerging technologies of data storage

NoSQL
A variety of types: column storage, key-value storage, graph storage, document storage.
Unstructured data storage.
Stored data can take different forms; the most common are JSON-like documents.
Fields instead of columns.
Collections instead of tables.
Unstructured data storage was also proposed by W. Inmon in Data Warehouse 2.0.

Question
Which of the following are types of OLAP systems?
ROLAP
XOLAP
HOLAP
BOLAP
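The calculation and multidimensional view components described under "Components of OLAP systems" can be illustrated with a minimal Python sketch. It assumes a tiny, made-up fact table (the names SALES, roll_up and slice_cube and all figures are hypothetical) and shows two common cube operations: a roll-up, which sums a measure over the chosen dimensions, and a slice, which fixes one dimension to a single value. It is an in-memory illustration only, not how a MOLAP or ROLAP engine actually stores or computes aggregates.

```python
from collections import defaultdict

# Hypothetical fact table: each row is one sale described by three
# dimensions (region, product, year) and one measure (amount).
SALES = [
    {"region": "North", "product": "A", "year": 2022, "amount": 100.0},
    {"region": "North", "product": "B", "year": 2022, "amount": 50.0},
    {"region": "South", "product": "A", "year": 2022, "amount": 70.0},
    {"region": "South", "product": "A", "year": 2023, "amount": 30.0},
    {"region": "North", "product": "A", "year": 2023, "amount": 80.0},
]


def roll_up(facts, dims):
    """Sum the 'amount' measure for every combination of the given dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dims)  # coordinates of the cell in the reduced cube
        totals[key] += row["amount"]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Keep only the facts where one dimension is fixed to a single value."""
    return [row for row in facts if row[dim] == value]


if __name__ == "__main__":
    # Total sales per region; the product and year dimensions are aggregated away.
    print(roll_up(SALES, ["region"]))
    # -> {('North',): 230.0, ('South',): 100.0}

    # Slice on year 2022, then view sales per region and product.
    print(roll_up(slice_cube(SALES, "year", 2022), ["region", "product"]))
    # -> {('North', 'A'): 100.0, ('North', 'B'): 50.0, ('South', 'A'): 70.0}
```

In a ROLAP system the same roll-up would typically be expressed as an SQL GROUP BY over a fact table, while a MOLAP system would usually keep such aggregates precomputed in the cube.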
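Association discovery, listed among the data mining method classes, is often quantified with support and confidence, the measures used in Apriori-style association rule mining. The Python sketch below is one simple way to compute them, assuming hypothetical shopping-basket transactions; TRANSACTIONS, support, confidence and pair_rules are illustrative names, and the brute-force loop only considers one-item-to-one-item rules, unlike a full Apriori implementation.

```python
from itertools import combinations

# Hypothetical market-basket transactions: each one is a set of purchased items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Enumerate rules x -> y between single items that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):  # check the rule in both directions
            s = support({x, y}, transactions)
            if s >= min_support:
                c = confidence({x}, {y}, transactions)
                if c >= min_confidence:
                    rules.append((x, y, s, c))
    return rules


if __name__ == "__main__":
    for x, y, s, c in pair_rules(TRANSACTIONS, min_support=0.5, min_confidence=0.8):
        print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

With these toy baskets the only rule passing both thresholds is butter -> bread (support 0.50, confidence 1.00): every basket containing butter also contains bread, whereas bread appears without butter.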
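The keyword step mentioned in the Text Mining slides can be sketched, under strong simplifying assumptions, as term-frequency counting after stop-word removal. The stop-word list, the keywords function and the sample note are hypothetical; real text mining tools typically add stemming or lemmatization, weighting schemes such as TF-IDF, and the semantic analysis the slides refer to.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use larger,
# language-specific lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

# Hypothetical unstructured document, e.g. a note attached to a credit application.
NOTE = ("The customer applied for a loan. The loan application includes "
        "the customer income and the loan amount.")


def keywords(text, k=3):
    """Return the k most frequent non-stop-word terms.

    Counter.most_common orders equal counts by first occurrence.
    """
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer: lowercase letter runs
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(k)]


if __name__ == "__main__":
    print(keywords(NOTE, k=2))  # -> ['loan', 'customer']
```

A keyword list like this is one simple way of turning unstructured text into structured features, which is the kind of step Duo Mining needs before combining document content with bank-account data.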