Introduction to Business Intelligence PDF 2023/2024
Document Details
Uploaded by Deleted User
Esprit School of Business
2024
Dr. Rihab BOUSLAMA
Tags
Summary
This document contains lecture notes on Introduction to Business Intelligence, covering definitions, BI systems, and challenges. It discusses different types of businesses and how they use BI. The 2023/2024 material also covers the DIKW pyramid, ETL process, and data storage.
Full Transcript
1 INTRODUCTION TO BUSINESS INTELLIGENCE 2023/2024 1BA Dr. Rihab BOUSLAMA 2 BI Warm-Up Challenge Your client is a retail company that wants to create a system to bet...
1 INTRODUCTION TO BUSINESS INTELLIGENCE 2023/2024 1BA Dr. Rihab BOUSLAMA 2 BI Warm-Up Challenge Your client is a retail company that wants to create a system to better understand customer behavior. Define how you would implement the system and what key metrics you'd focus on. 3 Introduction ▪ Example of scenarios for a retail company ▪ Sales were down in October. Sales ▪ To overcome this problem, some decisions should be made. ▪ But to make the right decision, we need to know: Why have sales dropped? In which product line? In which countries or regions? In the customer portfolio of which sales representatives? In which distribution segment? Wasn't there a similar drop in October of each year? 4 Introduction ▪ Motivation: Each company seeks to improve its decision-making performance by responding to the analysis demands of decision-makers. Experts of the domain, business analysts etc. 5 Definition of Business Intelligence ▪ Definition 1: A generic term that encompasses the applications, infrastructure, tools and best practices that enable information to be accessed and analyzed to improve and optimize decisions and performance. Gartner 6 Definition of Business Intelligence ▪ Definition 2: Business Intelligence is a set of methods, processes, architectures, applications, and technologies that gather and transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making. “Topic Overview: Business Intelligence”, 2008 7 Definition of Business Intelligence ▪ Business? The term “business” does not only mean commercial entities and activities. It has become more general and represents many types of entities and activities (organizations, functional areas or domains). These entities may include: Companies (for profit) and financially related Retail, manufacture, real-estate, financial, sports, media, advertising, entertainment, healthcare, publication, energy, etc. Non-profit organizations, institutions, associations, communities, etc. Government: citizen service, city planning, crime, immigration, etc. Individuals: personal health, exercise, learning, eating, power consumption, etc. 8 Why build a BI system? ▪ Serving information considered strategic: ▪ One of a company's most important assets is the wealth of information it gathers on a day-to-day basis. ▪ Most of this information is inaccessible, or distributed in a multitude of systems. ▪ The BI System is the result of a process which consists of extracting data from operational systems and other sources external to the company, transforming it into management information and making it accessible to users. 9 Why build a BI system? ▪ The BI system provides: ▪ Better access to data ▪ Improved quality of information ▪ Integration of data coming from different systems ▪ Common definition of information ▪ Better access to historical data 10 Why build a BI system? ▪ Challenges: ▪ Understanding users’ needs ▪ Data integrity ▪ Data feed costs ▪ Project scope definition ▪ System performance ▪ Common management rules 11 The DIKW pyramid ▪ DIKW stands for Data, Information, Knowledge and Wisdom 12 The DIKW pyramid ▪ Data: ✓ Raw fact, an unprocessed, uninterpreted value ✓ Can be collected by a measurement tool or present in a Database ✓ Does not allow you to decide which action to take. Example: Last month 1.217 incidents were recorded at the service center. 10 new service providers were employed in the IT department. 13 The DIKW pyramid ▪ Information: ✓ Interpreted data with meaning. ✓ Enables the manager to make a decision or take action. Example: Last month saw a 240% increase in the number of incidents compared with the previous month. The employment of 10 contractors is linked to a temporary increase in workload. 14 The DIKW pyramid ▪ Knowledge: ✓ Information understood ✓ Based on the experiences, ideas, values and opinions of those consulted Example: The load manager can establish a correlation between the arrival of new service providers and an increase in the number of incidents, based on a number of factors. 15 The DIKW pyramid ▪ Wisdom: ✓ General state of mind of final discernment on content and common sense judgment. ✓ Enables to launch actions to adopt the organization, people, processes and tools. Example: The senior organization manager makes long-term and strategic decisions for the IT organization. 16 Steps to build a decision making system planning ETL storage analysis restitution 17 Steps to build a decision making system planning ETL storage analysis restitution Planning: ▪ Planning is the first step to construct an integrated business intelligence system. ▪ Such a project requires solid administration. Example: A manager can have access to the salaries of the employees in his team. However, he cannot see his boss’s. The need for a rigorous security strategy. 18 Steps to build a decision making system planning ETL storage analysis restitution ETL: ▪ Extract: extraction of data from one or more data sources (e.g. text files, Excel, databases, etc.) ▪ Transform: transformation of aggregated data. ▪ Load: load data to the data warehouse 19 Steps to build a decision making system planning ETL storage analysis restitution Storage: ▪ There are several ways of storing data in a data Warehouse ▪ Each has its own advantages and disadvantages ▪ The Business Intelligence database administrator can choose between star schema, flake schema etc. 20 Steps to build a decision making system planning ETL storage analysis restitution Analysis: ▪ All the techniques of statistics, econometrics, data mining and operations research ▪ Often requires advanced statistical skills. 21 Steps to build a decision making system planning ETL storage analysis restitution Restituion: ▪ Dashboard generation is the tip of the iceberg: business intelligence. ▪ This is the part that most users see. ▪ These are generally attractive, intuitive interfaces that enable users, depending on their rights, to consult, annotate and even reports, dashboards, annotate them or even create them themselves. 22 Architecture of a BI system 23 DATA WAREHOUSE 2023/2024 1BA Dr. Rihab BOUSLAMA 24 Operational System VS. Decision Support System Operational systems Decision support systems Management systems (aka. On Line On Line Analytical Processing Transaction Processing) OLAP is optimized for complex data analysis Used to run the day-to-day core business of and reporting the company Use of data warehouse Use of ERP for data management 25 Operational Data VS. Decision Data Operational data Decision support data Detailed data Aggregated data Real-time / recent data Historized data Structured and optimized for the IT Structured and optimized for decision-makers professionals Centralized, integrated data Distributed non-homogeneous data Users: fewer, non-concurrent users Users: numerous, concurrent users 26 Solutions Data storage: a data warehouse ▪ One database with a single vocabulary ▪ Data adapted to decision-makers' needs (multidimensional structure, sophisticated level of details, historical data) Data query: ▪ interactive, user-friendly tools ▪ Analysis functions: sorting, correlations and exceptions, trend analysis etc. 27 OLTP vs. OLAP OLTP: On Line Transactional Processing ▪ OLTP systems are used for everyday transactions (e.g., ATMs, text messages). It is used for real-time execution of large volumes of database transactions. This type of systems is used by a large number of users. ▪ Example: on 01/15/12 at 1pm, customer X withdrew 500dt from account Y. OLAP: On Line Analytical processing ▪ OLAP is used to process multidimensional analysis on large volume of historical data aggregated from OLTP databases and other sources. OLAP is used in several domains such as, financial analysis, forecasting, budgeting, reporting, marketing and sales optimization, and decision making. ▪ Example: What was the sales volume by product and region during the second quarter of 2012? 28 OLTP vs. OLAP OLTP OLAP Design Application-oriented (eg., invoicing application) Subject-oriented (customer, product, seller, etc.) Static structure Evolving structure (e.g. star, snowflake) Data model Relational Multidimensional Goal Depends on the application Decision making Users Numerous Few Frontline workers (e.g., cashiers), customer self-service Analysts / managers applications (e.g., online banking) Non concurrent users concurrent users Data Atomic, detailed data, non aggregated Summarized data, recalculated, aggregated Current operations, real-time data Historized data Highly normalized (mostly relational DBMS) Denormalized (fast querying) Queries Simple to medium, frequent, predictable, repetitive Complex, fewer, non predictable transaction and query response times are lightning-fast response times are orders of magnitude slower. Access Access to lots of data Access to lots of information Updates Frequent Periodically storage space relatively little significant storage space requirements 29 Technologies Efficient storage and Data warehouse management of large volumes Complex handling OLAP server Application Data Mining 30 Data storage in BI systems 31 Data storage in BI systems ODS: Operational Data Store ▪ Collection of subject-oriented, volatile data, organized to support a specific decision-making process, in support of a particular operational activity. ▪ Provides an immediate, integrated view of the status of one or more operating systems. ▪ Store recent data. ▪ On-line data integration ▪ Used when data is dispersed. ▪ Example: We want to have a single view of a patient that we can modify online. This patient's data is stored in several databases (list of hospitalizations, list of diagnoses, list of pharmaceutical purchases, etc.). ODS can be used to extract and display this data. 32 Data storage in BI systems Data warehouse ▪ A data repository specific to the decision-making domain, designed primarily to analyze potential business levers. ▪ Collection of subject-oriented, integrated, non-volatile and historical data, organized to support a decision- support process. ▪ Off-line data integration; Data mart ▪ Subject-oriented, non-volatile data store made available to users in a decentralized decision-making context, targeted to a particular use. Data mining ▪ A set of analysis tools, methods and technologies used to define trends, segment information or establish correlations between data. 33 Data warehouse vs. Data mart Data warehouse ▪ Corporate data repository ▪ Combines multiple data marts ▪ Contains all necessary measures and dimensions ▪ Ensures the integrity of these dimensions across all Data Marts Data mart ▪ A set of specific dimensions and measures ▪ Used for specific business topics (e.g., sales, finance, HR) 34 Why not use a DBMS? DBMS and DWh ▪ Have different goals ▪ Store different type of data ▪ Use different kind of query ▪ Different data models ▪ Should be separated DBMS ▪ Transactional (OLTP) ▪ Is used to insert, modify and query information quickly, efficiently and securely ✓ Select, add, update and delete tuples. ✓ Fast, multi-user operations 35 What does data warehouse allows organizations to achieve? Centralized data management and accessibility Efficient data integration Historical trends analysis Advanced business intelligence Informed decision making Enhanced customer understanding Optimized operational efficiency Predictive analytics Cross-functional insights Compliance and security 36 Data warehouse definition A data warehouse is a copy of transaction data specifically structured for query and analysis. Ralph Kimball, 1996 A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process. Bill Inmon, 1990 37 Characteristics of a data warehouse Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations. 38 Characteristics of a data warehouse Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole. 39 Characteristics of a data warehouse Time-variant: All data in the data warehouse is identified with a particular time period. Non-volatile: Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business. 40 ETL process Extract data from different sources. Transform data. Load into the Dwh. DWH 41 ETL process Extract: ▪ Extraction of data from several sources (e.g., relational databases, flat files...). ▪ Need for specific tools to access operational databases (queries on heterogeneous databases). ▪ Need of an appropriate technique to extract only DWH the required data (i.e., data that has been created or modified since the last extract). ▪ This step should not disturb the functioning of the operational system. 42 ETL process Transform: ▪ Standardizing vocabulary, structures and values. ▪ Removing and merging redundancies. ▪ Data cleaning (removal of inconsistent data). ▪ Transforming data into a target format. DWH 43 ETL process Load: ▪ Data calculation and aggregation operations. ▪ Set up procedures for loading (nightly?) and restoring (in case of a problem). ▪ Taking granularity into account (retention of detailed data. However, data can be aggregated for DWH synthesis purposes) 44 Catalog of metadata ▪ Contains Dwh metadata. ▪ Ideally : Single storage location for information that drives processes in the warehouse. ▪ The catalog contains: ✓ The data stored, its format, its meaning, its degree of accuracy. ✓ Recovery/extraction processes in source databases ✓ Date of last loading ✓ History of source and warehouse data 45 Dwh: structure (1/3) Straightforward: ▪ We update the data warehouse over long periods of time: ✓ no accumulation ✓ long-term refreshment 46 Dwh: structure (2/3) Simple accumulation : ▪ We store the data for each update, and with frequent updates (e.g. every day): ✓ large storage space. ✓ no loss of details 47 Dwh: structure (3/3) Scroll-down summary: ▪ With each update, detailed data is stored, and old data is synthesized ✓ The older the data, the less detailed it is. day day week week week month month month Year 1 Year 2 Year n