Business Intelligence, Analytics, and Data Science: A Managerial Perspective PDF
Document Details
Uploaded by PoeticGyrolite250
Ramesh Sharda, Dursun Delen, Efraim Turban
Tags
Summary
This document provides an overview of Business Intelligence, Analytics, and Data Science. It focuses on concepts like descriptive analytics and data warehousing, explaining their role in managerial decision support. The concepts are illustrated with examples and diagrams.
Full Transcript
Business Intelligence, Analytics, and Data Science: A Managerial Perspective Fourth Edition Chapter 3 Descriptive Analytics II: Business Intelligence and Data Warehousing...
Business Intelligence, Analytics, and Data Science: A Managerial Perspective Fourth Edition Chapter 3 Descriptive Analytics II: Business Intelligence and Data Warehousing Copyright © 2018 Pearson Education Ltd. Business Intelligence and Data Warehousing BI used to be Business Analytics everything related to use of data for managerial Descriptive Predictive Prescriptive decision support Questions Now, it is a part of What happened? What is happening? What will happen? Why will it happen? What should I do? Why should I do it? Business Analytics ü Business reporting ü Data mining ü Optimization Enablers – BI = Descriptive ü ü Dashboards Scorecards ü ü Text mining Web/media mining ü ü Simulation Decision modeling Analytics ü Data warehousing ü Forecasting ü Expert systems Outcomes Well defined Accurate projections Best possible business problems of future events and business decisions and opportunities outcomes and actions Business Intelligence Advanced Analytics Slide 3-2 Copyright © 2018 Pearson Education Ltd. What is a Data Warehouse? A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format. A data warehouse is a central repository of information that can be analyzed to make more informed decisions. Data flows into a data warehouse from transactional systems, relational databases, and other sources, typically on a regular pace. A relational database? “The data warehouse is a collection of integrated, subject- oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”. Slide 3-3 Copyright © 2018 Pearson Education Ltd. A Relational database relational database is a collection of information that organizes data in predefined relationships where data is stored in one or more tables (or "relations") of columns and rows, making it easy to see and understand how different data structures relate to each other. Slide # of total Copyright © 2018 Pearson Education Ltd. Characteristics of DWs Subject oriented. Data are organized by detailed subject such as, sales, products, customers,….Etc. Integrated. Data are from different sources and must and must deal with naming conflicts and discrepancies among its units. Time-variant (time series). Data usually contains historical data (e.g: daily, weekly monthly, annually, except in real-time systems. Detecting trends, deviations, long term relationships for forecasting and comparison. Nonvolatile. Data can not be changed, changes are recorded as new data. Not normalized. because DWH contains historical data which we use for analysis, report preparations etc. No update or change as we use select statement in DW to get data. it has not gone through data quality processes such as data cleansing and de-duplication. Metadata. Data about data. How data organized. Web based, relational/multi-dimensional Client/server, real-time/right-time/active... Easy access to end users. Slide 3-5 Copyright © 2018 Pearson Education Ltd. Data Mart A departmental small-scale “DW” that stores only limited/relevant data. e.g: marketing, operations, …Etc. – Dependent data mart A subset that is created directly from a data warehouse. The Data WH must be constructed first. The end user is viewing the same version of data accessed by all other DW users. High cost of DWs limits their use to large companies. – Independent data mart (Alternative) Lower cost, scaled down version of DW. A small data warehouse designed for a strategic business unit or a department. Its source is not the Enterprise Data Warehouse EDW. Slide 3-6 Copyright © 2018 Pearson Education Ltd. Dependent vs Independent Data Marts Slide 3-7 Copyright © 2018 Pearson Education Ltd. Other DW Components Operational data stores (ODS) – A type of database often used as an interim/ short term/provisional area for a data warehouse. – It provides a fairly recent form of customer information file. – Unlike the static contents of the DW, The contents of ODSs are updated throughout the course of business operations. – Used for a short term decisions rather than medium and long term decisions associated with the EDW. – ODS can be classified as a short term memory which stores very recent information, whereas, DW is a long term memory because it stores permanent information. – ODS consolidates data from multiple sources and provides a near-real time, integrated view of volatile, current data. Oper marts – An operational data mart – Created when operational data needs to be analyzed multidimensionally. Slide 3-8 Copyright © 2018 Pearson Education Ltd. Multidimensionality The dimensions are the perspectives or entities concerning which an organization keeps records. Consider the data of a shop for items sold per quarter in the city of Delhi. The data is shown in the table. In this 2D representation, the sales for Delhi are shown for the time dimension (organized in quarters) and the item dimension (classified according to the types of an item sold). The fact or numerical measure displayed in rupee_sold (in thousands). Slide # of total Copyright © 2018 Pearson Education Ltd. Now, if we want to view the sales data with a third dimension, For example, suppose the data according to time and item, as well as the location is considered for the cities Chennai, Kolkata, Mumbai, and Delhi. These 3D data are shown in the table. The 3D data of the table are represented as a series of 2D tables. Copyright © 2018 Pearson Education Ltd. Operational Data Store Copyright © 2018 Pearson Education Ltd. Other DW Components Enterprise data warehouse (EDW) – A large scale data warehouse that is used for the enterprise for decision support. Slide 3-12 Copyright © 2018 Pearson Education Ltd. Other DW Components Metadata – “data about data” – In DW metadata describe the contents of a data warehouse and its acquisition and use. (Data Dictionary) Slide 3-13 Copyright © 2018 Pearson Education Ltd. A Generic DW Framework Data Applications Sources No data marts option (Visualization) Data Marts Routine ERP Business ETL Reporting Process Data mart Select (Marketing) / Middleware Legacy Metadata Data/text Extract mining Data mart Transform Enterprise (Operations) POS Data warehouse OLAP, Integrate API Data mart Dashboard, (Finance) Web Other Load OLTP/Web Replication Data mart (...) Custom built External applications Data Slide 3-14 Copyright © 2018 Pearson Education Ltd. DW Architecture Three-tier architecture 1. Data acquisition software (back-end) 2. The data warehouse that contains the data & software 3. Client (front-end) software that allows users to access and analyze data from the warehouse Two-tier architecture – First two tiers in three-tier architecture are combined into one … sometimes there is only one tier? Slide 3-15 Copyright © 2018 Pearson Education Ltd. DW Architectures Tier 1: Tier 2: Tier 3: Client workstation Application server Database server Tier 1: Tier 2: Client workstation Application & database server Slide 3-16 Copyright © 2018 Pearson Education Ltd. Called one tier because Only source layer is physically available. Data warehouse is virtual. Data warehouse is applied as a multidimensional view of operational data created by specific middleware, or an intermediate processing layer. Slide # of total Copyright © 2018 Pearson Education Ltd. Called two tier architecture because it physically shows both source layer and Data warehouse layer. Data staging is the process of preparing and organizing data before it moves to its final destination, A staging area, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses, data marts, or other data repositories. Copyright © 2018 Pearson Education Ltd. Called three tier architecture because it physically shows both source layer, reconciled layer and Data warehouse layer. The benefit of a reconciled layer is how it can assist with creating a standardized reference data model used across an enterprise. This layer also aids with operational tasks like reporting. Bottom tier. Data reconciliation is the systematic process of comparing, verifying, and harmonizing data sets from multiple sources or systems to ensure consistency, accuracy, and completeness. Copyright © 2018 Pearson Education Ltd. A Web-based DW Architecture Web pages Application Server Client Web (Web browser) Internet/ Server Intranet/ Extranet Data warehouse Slide 3-20 Copyright © 2018 Pearson Education Ltd. Ten Factors that Potentially Affect the Architecture Selection Decision 1. Information 6. Strategic view of the data interdependence between warehouse prior to organizational units implementation 2. Upper management’s 7. Compatibility with existing systems information needs 8. Perceived ability of the in- 3. Urgency of need for a data house IT staff warehouse 9. Technical issues 4. Nature of end-user tasks 10. Social/political factors 5. Constraints on resources Copyright © 2018 Pearson Education Ltd. Slide 3-21 Data Integration and the Extraction, Transformation, and Load Process ETL = Extract Transform Load Data integration. It comprises three major processes: Data access. Ability to access & extract data from any data source. Data federation. The integration of business views across multiple data stores. Change capture. The identification, capture, and delivery of the changes made to enterprise data sources. Enterprise application integration (EAI) A technology that provides a vehicle for pushing data from source systems into a data warehouse. It involves integrating application functionality (rather than data) across systems, thus enabling flexibility and reuse. Recently, EAI is accomplished by using SOA (Service-Oriented Architecture) EAI can be used to facilitate data acquisition directly into a near-real-time DW or to deliver decisions to the OLTP systems. EAI can streamline processes that include data or activity from multiple software applications. For example, data from a CRM can be integrated with an e-mail marketing platform to deliver targeted messages to customers based on their prior behavior or demographics Slide 3-22 Copyright © 2018 Pearson Education Ltd. Data Integration and the Extraction, Transformation, and Load Process Enterprise information integration (EII) An evolving tool space that promises real-time data integration from a variety of sources, such as relational or multidimensional databases, Web services, etc. It is a mechanism for pulling data from source systems EII tools use predefined metadata to populate views that make data relational to end users. eXtensible Markup Language (XML) may be the most important aspect of EII because XML allows data to be tagged either at creation time or later. These tags can be extended and modified to accommodate almost any area of knowledge. Extraction, Transformation and Load (ETL) Extraction (reading data from one or more databases. Transformation ( Converting the extracted data from its previous form into the form in which it needs to be, so that it can be placed in a DW or simply in another database) Load. Putting the data into the DW. The purpose of ETL process is to load the Warehouse with integrated and cleansed data. Slide 3-23 Copyright © 2018 Pearson Education Ltd. ETL (Extract, Transform, Load) Issues affecting the purchase of an ETL tool – Data transformation tools are expensive – Data transformation tools may have a long learning curve. Important criteria in selecting an ETL tool – Ability to read from and write to an unlimited number of data sources/architectures – Automatic capturing and delivery of metadata – A history of conforming to open standards – An easy-to-use interface for the developer and the functional user Slide 3-24 Copyright © 2018 Pearson Education Ltd. Data Warehouse Development Data warehouse development. It is a major project for any organization and more complicated than simple mainframe selection and implementation. It comprises and influences many departments and many inputs & outputs interfaces. DW provides several benefits which can be classified as direct and Indirect: Direct benefits: End users can perform extensive analysis in numerous ways. A consolidated view of corporate data is possible Enhanced business performance Simplified data access. Indirect benefits: Enhanced business knowledge, Present a competitive advantage Improve customer service and satisfaction Facilitate decision making Help in reforming business processes Slide 3-25 Copyright © 2018 Pearson Education Ltd.