Chapter 6 Data Business Intelligence Fall 2024 MOS 1033A PDF
Document Details
Uploaded by AdroitSugilite2171
King's University College
2024
null
null
Tags
Summary
This document is from King's University College, Fall 2024. It covers the topics of data, information, databases, and business intelligence. Topics include data quality, relational databases, and business intelligence.
Full Transcript
King's University College Fall 2024 MOS 1033A...
King's University College Fall 2024 MOS 1033A SECTION 6.1 – Data, Information, Databases Data Quality Storing Data Using a Relational Database Management System Using a Relational Database for CHAPTER SIX Business Advantages CHAPTER SIX OVERVIEW SECTION 6.2 – Business Intelligence Business Intelligence Data Warehouse © M cG ra w H ill L L C. A ll rig h ts re s e rv e d. N o re p ro d u ctio n o r d is trib u tio n w ith o u t th e p rio r w ritte n co n s e n t o f M cG ra w H ill L L C. Blockchain: Distributed Computing 4 3 4 1. Explain the four primary traits that determine the value of data SECTION 6.1: 2. Describe a database, a database LEARNING DATA, INFORMATION, management system, and the OUTCOMES AND BUSINESS relational database model INTELLIGENCE 3. Identify the business advantages of a relational database 5 6 5 6 Professor J. Siambanopoulos 1 King's University College Fall 2024 MOS 1033A DATA DATA Data is everywhere in an organization Levels, Formats, and Granularities of Data Employees must be able to obtain and analyze the many different levels, formats, and granularities of organizational data to make decisions Successfully collecting, compiling, sorting, and analyzing data can provide tremendous insight into how an organization is performing 7 8 7 8 DATA TYPE: TRANSACTIONAL AND ANALYTICAL 1. DATA TYPE: TRANSACTIONAL AND ANALYTICAL The Four Primary Traits of The Value of Data Transactional data – Encompasses all of the data contained within a single business process or unit of work, and its primary purpose is to support the performing of daily operational tasks E.g. daily sales, payroll, hours worked, product orders Think of this data as “events” Analytical data – Encompasses all organizational data, and its primary purpose is to support the performing of managerial analysis tasks E.g. sales forecasts, projected statements, industry trends 9 10 9 10 Professor J. Siambanopoulos 2 King's University College Fall 2024 MOS 1033A 2. DATA TIMELINESS 3. DATA QUALITY Timeliness is an aspect of data that depends on the situation Business decisions are only as good as the quality of the data real-time data _____________________________________________– Provides real-time data used to make the decisions in response to requests (real-time data) – Immediate, up-to-date You never want to find yourself using technology to help you data make a bad decision faster No LATENCY because each transaction matters now! E.g. Customer service system, bank ATMs, every credit card Data inconsistency - Occurs when the same data element has machine different values (Canada, CA, CAN = the same but different) This kind of data is harder to get and more expensive Data integrity issues - Occur when a system produces incorrect, inconsistent, or duplicate data (Canada, Canda) Batch processing _____________________________________________________– Done in groups (by time or type, etc.) at regular intervals because More efficient and timeliness IS NOT as important E.g. Payroll, class lists This kind of data is cheaper to get 11 13 11 13 DATA QUALITY DATA QUALITY Characteristics of High-Quality Data Low Quality Data Example 14 15 14 15 Professor J. Siambanopoulos 3 King's University College Fall 2024 MOS 1033A UNDERSTANDING THE SOURCES OF LOW-QUALITY UNDERSTANDING THE COSTS OF USING LOW- DATA QUALITY DATA The four primary sources of low-quality data include Potential business effects resulting from low quality data include 1. Customers intentionally enter inaccurate data to protect their privacy (or customer mistakes) Inability to accurately track customers 2. Operators enter abbreviated or erroneous data by accident Difficulty identifying valuable customers or to save time (or employee mistakes) Inability to identify selling opportunities 3. Different entry standards and formats e.g. (xxx) xxx-xxxx vs. Marketing to nonexistent customers xxx-xxx-xxxx Difficulty tracking revenue 4. Third party and external data contains inconsistencies, Inability to build strong customer relationships inaccuracies, and errors (technical errors) Lots of mistakes = new problems and wasted time a) During transmission you can have packet loss, unreliable connections (blips that cause some data to not be received or messed up 16 17 16 17 UNDERSTANDING THE BENEFITS OF GOOD DATA 4. DATA GOVERNANCE High quality data can significantly improve the chances of Data governance - Refers to the overall management of the making a good decision availability, usability, integrity, and security of company data Relates to what we discussed in Class 4 · Good decisions can directly impact an organization's bottom line Master data management (MDM) - The practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, THE KEY: Understanding the price of ”knowing” vs. the cost of products, sales, employees, and other critical entities that are “not knowing” commonly integrated across organizational systems It can cost a LOT of money to find something out but is it worth it? Data validation - Includes the tests and evaluations used to determine compliance with data governance polices to ensure correctness of data (and how it is handled, shared, destroyed, etc.) 18 19 18 19 Professor J. Siambanopoulos 4 King's University College Fall 2024 MOS 1033A HOW DID WE DO THIS IN THE PAST? ORIGINAL IMAGINE A DATABASE AT UWO USING PAPER/EXCEL THINKING FOR DATA: FOLDERS (AND SUBFOLDERS…) Where would you find YOUR name? Hierarchical databases! (Imagine: paper based) UWO’s Registrar’s Office when you register (and at King’s) I create a folder per topic. Then subfolders for other topics Remember those are different lists controlled by different people…..but then… Problems: Redundancy (same data in different spots) and no relationships (connections between data that are related) Your name is put on 10 different class lists Nightmare situation! Your name is also at the Finance Office (tuition), On-Campus Housing (residence), student clubs, Dean’s Office (marks), etc. You are listed at King’s/UWO many times in different ”folders” S None of the folders or data are connected What if you have the exact same name as another student? What if there is a mistake? Or you change your name? 20 21 STORING DATA IN A RELATIONAL DATABASE STORING DATA IN A RELATIONAL DATABASE Data is stored in databases (electronic) Database management systems (DBMS) –Allows users to create, read, update, and delete data in a relational database Database – maintains data about various types of locations (CRUD) (stores), objects (inventory), events (transactions), people (employees) Microsoft Access is a DBMS 22 23 22 23 Professor J. Siambanopoulos 5 King's University College Fall 2024 MOS 1033A STORING DATA IN A RELATIONAL DATABASE STORING DATA ELEMENTS IN ENTITIES AND ATTRIBUTES Data element _________________________________– The smallest or basic unit of data Entity – A person, place, thing, transaction, or event about (the data Field) Data model – Logical data structures that detail the which data is stored relationships among data elements using graphics or pictures The rows in a table contain entities Attribute (field, column) – The data elements associated with Metadata –Details about data an entity Data dictionary _______________________________________________– Compiles all of the The columns in each table contain the attributes metadata about the data elements in the data model Record – A collection of related data elements 24 26 24 26 CREATING RELATIONSHIPS THROUGH KEYS (VERY IMPORTANT) Keys identify the various entities (tables) in the database ____________________________– Primary Key A field (or group of fields) that uniquely identifies a given entity in a table It is a UNIQUE, ONE-OF-A-KIND IDENTIFIER like: Student numbers, Social insurance number, passport numbers, UWO emails (there will NEVER be two of the same) Are names unique identifiers? NO, not always (people can have the same first/last name!) This is why names are rarely (never) used as primary keys Foreign Key ____________________________________– A primary key of one table that appears an attribute in another table and acts to provide a logical relationship among the two tables A student name and NUMBER on the MOS 1033A class list 28 27 28 Professor J. Siambanopoulos 6 King's University College Fall 2024 MOS 1033A ORGANIZING MY DATA EXCEL IS OK FOR SMALL DATABASE USES BUT… Think about your pictures, texts, emails, videos, essays, Only one user can access the spreadsheet at a time (mostly) documents, websites, posts, conversations, etc. Excel is limited: 1,048,576 rows by 16,384 columns How do you find stuff? About a million “transactions” could be done and then? Is it easy to find what you want, quickly? Column width: 255 characters How do you organize your info? Here are some ways: All individual data points must fit in that size. What if you are File names (what it is) organizing opinions/qualitative data that is longer? Meta data (picture’s EXIF data, properties of videos), tags, Twitter: 280-character limit (4,000 for US Twitter Blue users) hashtags, labels [think of this as “data about data”] Sheets in a workbook: Limited by available memory (RAM) Date created/modified? Some user may be able to open it, some may not Type of file? How it looks (pics)? And it will be VERY slow for searching Whatever it is, there must be something UNIQUE about it for it to Can’t handle multiple data types (like pictures) easily be FINDABLE (everything needs a PRIMARY KEY) 31 30 31 USING A RELATIONAL DATABASE FOR BUSINESS INCREASED FLEXIBILITY ADVANTAGES A well-designed database should Database Advantages from a Business Perspective Include Handle changes quickly and easily Provide users with different views Have only one physical view - Deals with the physical storage of data on a storage device E.g. ALL students are in one group (each with a student #), ALL classes are in one group (each with a class number), etc. Have multiple logical views – Focuses on how individual users logically access data to meet their own business needs THEN we connect the items that need to be connected and create a new “logical view” or reports/output e.g. class list for MOS 1033A will “link” all students as a new group 32 33 32 33 Professor J. Siambanopoulos 7 King's University College Fall 2024 MOS 1033A CONCEPTUAL MODEL (FIRST STEP) PHYSICAL VIEW Outlines basically where the data will go and the type of All students go in the ONE spot, all professors in ONE spot, etc. connections Notice that each data element is also describes what KIND of data should be there: integer (positive number, no decimals) etc. 34 35 34 35 LOGICAL VIEW (SHARED DATA WITH RELATIONSHIPS INCREASED SCALABILITY AND PERFORMANCE PK = Primary Key FK = Foreign Key (data from another table) A database must scale to meet increased demand, while maintaining acceptable performance levels Scalability – Refers to how well a system can adapt to increased demands Performance – Measures how quickly a system performs a certain process or transaction 36 37 36 37 Professor J. Siambanopoulos 8 King's University College Fall 2024 MOS 1033A REDUCED DATA REDUNDANCY INCREASE DATA INTEGRITY (QUALITY) Relational databases reduce data redundancy Data integrity – measures the quality of data Data redundancy ____________________________________________– or storing the same data in multiple places The duplication of data Integrity constraint – rules that help ensure the quality of data Relational integrity constraint – Rules that enforce basic data- Inconsistency is one of the primary problems with redundant data based constraints, E.g. It’s like backing up your photos from different places but then E.g. phone numbers must be 10 digits in Canada (xxx) xxx-xxxx realizing…. In China: 1xx-XXXX-XXXX (11 digits) There are LOTS of doubles AND triples plus some places don’t have all the photos! Business-critical integrity constraint – Rules that enforce Now what do you keep (good data) and what do you delete business rules (redundant or wrong data)? E.g. Returns cannot be done over 90 days or THIS IS A BIG PROBLEM with poorly designed databases Discounts can’t be higher than 20% 38 40 38 40 INCREASED DATA SECURITY Data is an organizational asset and must be protected Databases offer several security features SECTION 6.2: Password – Provides authentication of the user BUSINESS INTELLIGENCE Access level – Determines who has access to the different types of data E.g. On Brightspace: Administrator, student, teaching assistant Access control – Determines types of user access, such as read- only access Interestingly, this is something you can do on your computer with files, applications, folders (if others use it too) 41 42 41 42 Professor J. Siambanopoulos 9 King's University College Fall 2024 MOS 1033A - BUSINESS INTELLIGENCE Organizational data is difficult to access 4. Identify the advantages of using business intelligence to support structured 1. Organizational data contains ____________________________________ managerial decision making data (this is the easy stuff) 5. Describe the roles and purposes of unstructured 2. Organizational data contains ___________________________________ data warehouses and data marts in an organization data such as voice mail, phone calls, text messages, and video LEARNING clips (this is the harder stuff) 6. Explain blockchain and its OUTCOMES advantages over a centralized relational database 43 44 43 44 BIGGEST CHALLENGES FOR BI THE PROBLEM: DATA RICH, INFORMATION POOR 1. Integrating data from different sources (connecting Many organizations find themselves in the position of being data rich and information poor. different databases or external ones, etc.) Even in today’s electronic world, managers struggle with the 2. Data quality issues (redundant, full of errors, etc.) challenge of turning their business data into business intelligence Watch the video: 1 What is Business Intelligence https://www.techtarget.com/searchbusinessanalytics/tip/Top-11-business- intelligence-challenges-and-how-to-overcome-them https://www.instinctools.com/blog/business-intelligence-challenges-our-experts- faced-and-overcame/ https://revopsteam.com/revops-strategy/business-intelligence-challenges/ 46 47 46 47 Professor J. Siambanopoulos 10 King's University College Fall 2024 MOS 1033A VIDEO: WHAT IS BUSINESS INTELLIGENCE THE SOLUTION: DATA AGGREGATION Improving the quality of business decisions has a direct impact The video describes BI as “functioning like a grocery store” on costs and revenue BI enables business users to receive data for analysis that is: Why? taking meesy into andturn it Reliable into a tidy and Consistent accessible Store where You navigate you gocery data Understandable & Find what on Easily manipulated your own youforneed without BI: Data that is organized and connected easier analysis to find: relying on others. significant advantage strategic decisions 50 48 - Inefficient Business Processes & Hidden 50 - identify areas of strength & patterns Discover new opportunitiesmeakness - THE SOLUTION: DATA AGGREGATION DATA WAREHOUSE BI Can Answer Tough Questions Data warehouses extend the transformation of data into information In the 1990’s executives became less concerned with the day-to- day business operations and more concerned with overall business functions The data warehouse provided the ability to support decision making without disrupting the day-to-day operations It can hold A LOT of data and manipulate it 51 52 51 52 Professor J. Siambanopoulos 11 King's University College Fall 2024 MOS 1033A USING BUSINESS INTELLIGENCE (BI) VIDEO: REAL LIFE EXAMPLES OF BUSINESS INTELLIGENCE Retail and sales: Predicting sales; determining correct inventory levels and distribution schedules among outlets; and The first example used was for: loss prevention. Hotels were able to see when they were most busy (charge Banking: Forecasting levels of bad loans and fraudulent credit higher prices) and card use, credit card spending by new customers, and which When they were not (charge lower prices) kinds of customers will best respond to (and qualify for) new loan offers. e.g. in Historical Operations management: Predicting machinery failures; finding key factors that control optimization of manufacturing compare The second example was: and capacity. real Grocery stores do this in _____________________ time; What you started with + New Stuff – What We Sold = On Shelf Watch the video: 2 Real Life Examples of Business Intelligence 53 54 REAL LIFE EXAMPLE OF BUSINESS INTELLIGENCE: DATA WAREHOUSE UBER Data warehouse – A logical collection of data – gathered from many different operational databases – that supports business analysis activities and decision-making tasks. All data is cleaned and organized The primary purpose of a data warehouse is to aggregate data throughout an organization into a single repository for decision- making purposes E.g. Mixing cash register sales, employee schedules, warehouse logistics and purchase information (costs) is NOT easy (and used to be impossible) ….. THEN bring in supplier invoices, health data (for food), etc. 57 56 57 Professor J. Siambanopoulos 12 King's University College Fall 2024 MOS 1033A DATA WAREHOUSE DATA WAREHOUSE Reasons business analysis is difficult from operational systems Data integration ____________________________________– Collection of data from various sources for the purpose of data processing Inconsistent Data Definitions Cost in production is done per unit, but costs in marketing are ETL (Extract Transform Load ____________________________________________________________________– , , fixed or large single amounts A process that extracts data from internal and external Lack of Data Standards databases, transforms the data using a common set of enterprise definitions, and loads the data into a data Metric is used in the WHOLE world except in the USA warehouse Poor Data Quality Inadequate Data Usefulness (wrong data tracked) Ineffective Direct Data Access Getting access to summarized/aggregate data when you need the raw data 58 59 58 59 DATA WAREHOUSE DATA ANALYSIS Data cube – The common term for the representation of multidimensional data 61 62 61 62 Professor J. Siambanopoulos 13 King's University College Fall 2024 MOS 1033A DATA CLEANSING OR SCRUBBING DATA CLEANSING OR SCRUBBING Dirty data – Erroneous or flawed data Dirty Data Problems John Siambanopolupagous Data cleansing or scrubbing – A process that weeds out and fixes or discards inconsistent, incorrect, or incomplete data More on getting and cleaning data in Class 9: Obtain the Data 63 64 63 64 DATA CLEANSING OR SCRUBBING DATA CLEANSING OR SCRUBBING Contact Data in an Operational System: The “customer” is ALL Standardizing Customer Name from Operational Systems of these people (from this firm). Which is correct? 65 66 65 66 Professor J. Siambanopoulos 14 King's University College Fall 2024 MOS 1033A DATA CLEANSING OR SCRUBBING AFTER THE DATA IS READY….ANALYSIS! Cost of Accurate and Complete data Manipulating the data using various data analysis techniques and tools to find: Trends, correlations, outliers, variations etc. that tell a story (explain), answer a question, or raise new questions More on this topic in Class 10: Analyze the Data Data-Driven Decision-Making (DDDM) The process of making strategic business decisions based on facts, data, and metrics instead of intuition, emotion, or observation 67 68 67 68 & WHAT ARE THE BENEFITS OF DATA-DRIVEN OR WHAT ARE THE BENEFITS OF DATA-DRIVEN OR EVIDENCE-BASED DECISIONS EVIDENCE-BASED DECISIONS Accuracy ______________________________: Decisions are typically more accurate, as they are based on factual information and analysis rather than Efficiency __________________________________: Using data enables firms to make better use of their resources intuition or gut feelings For example, a retail store can better predict future sales trends by E.g. A data-driven marketing campaign can target potential analyzing historical sales data rather than relying on a manager’s customers better hunch Transparency and trust Decisions based on data provide _____________________________________: objectivity ____________________________: Evidence-based decisions reduce the a clear rationale for actions taken, fostering transparency and risk of personal biases, which can lead to flawed conclusions accountability. This can help build trust among stakeholders, Consider hiring: an objective evaluation of candidates, compared to including employees and investors. “I have a good feeling about this person.” E.g. When you base your decision on facts/data that everyone can Adaptability ________________________________: Data-driven decisions allow organizations to adapt to changing market conditions more see, understand and agree on…there is greater trust in that decision effectively. (catch trends before it’s too late) https://www.quanthub.com/what-are-the-benefits-of-data-driven-or-evidence-based-decisions- E.g. A restaurant using customer feedback and sales data can over-heuristics/ identify low-performing menu items and replace them with more popular options 69 71 69 71 Professor J. Siambanopoulos 15 King's University College Fall 2024 MOS 1033A DATA VISUALIZATION DATA VISUALIZATION Data artists use infographics to display patterns, relationships, Data visualization - Describes technologies that allow users to and trends in a visual format “see” or visualize data to transform data into a business perspective Data visualization tools - Move beyond Excel graphs and charts into sophisticated analysis techniques such as pie charts, controls, instruments, maps, time-series graphs, and more More on this topic in Class 11: Report the Results 73 74 73 74 DATA VISUALIZATION BLOCKCHAIN: DISTRIBUTED COMPUTING Distributed computing - Processes and manages algorithms Business intelligence dashboards - Track corporate metrics across many machines in a computing environment such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls allowing users to manipulate data for analysis 75 76 75 76 Professor J. Siambanopoulos 16 King's University College Fall 2024 MOS 1033A BLOCKCHAIN: DISTRIBUTED COMPUTING BLOCKCHAIN: DISTRIBUTED COMPUTING Proof-of-work ___________________________________ - A requirement to define an Ledger - Records classified and summarized transactional data expensive computer calculation, also called mining, that needs to be performed in order to create a new group of trustless Blockchain - A type of distributed ledger, consisting of blocks of transactions (blocks) on the distributed ledger or blockchain data that maintain a permanent and tamper-proof record of transactional data Think: A grocery store wants to import bananas Consensus ______________________________ is created (everyone works on the How many farmers, suppliers, border services, inspection agencies, same thing and eventually should get to the same result) transportation firms, insurance companies, drivers are involved! Proof-of-work has two primary goals: Ensuring good data is a huge cost, effort and susceptible to mistakes 1.To verify the legitimacy of a transaction, or avoid the so-called Watch the video: 3 How Does a Blockchain Work double-spending 2.To create new digital currencies by rewarding miners for performing the previous task 77 78 77 78 BLOCKCHAIN: DISTRIBUTED COMPUTING BLOCKCHAIN: DISTRIBUTED COMPUTING Blockchain - Formed by linking together blocks, data structures containing a hash, previous hash, and data Genesis block - The first block created in the blockchain Hash - A function that converts an input of letters and numbers into an encrypted output of a fixed length 80 81 80 81 Professor J. Siambanopoulos 17 King's University College Fall 2024 MOS 1033A HTTP://WWW.MOVABLE- TYPE.CO.UK/SCRIPTS/SHA256.HTML Any data can create a hash. Below is a 64-bit hash of a sentence. Each of the 64 can be (0 – 9 ) or (a to f) (hexidecimal) 1 bit = 1 slot = 0 or 1 = 2 choices 2 bits = 2 slots = 00, 01, 10, 11 = 4 Any change will create a new hash. A new unique fingerprint. choices Example: If I write this sentence: Etc. My name is John Siambanopoulos So, 64-bit encryption = 0e7781a6a84b49ad067e740a1e7b5ee898cea13f64304413f06b 2 x 2 x 2 x….(total of 64 times) = 6f81297bf105 My name is john Siambanopoulos Extra Video: How Secure is 256- 2abca311ef280a1579bec7e77cf54ce7c9253600d664c84192eb6 bit Security (this is an amazing 068a9906a7c and simple explanation of the challenging math) 82 83 If a person wants to do something “bad”, (tamper with a block, add a fake one, etc.) anything they do will change the hash… Which means all hashes for the subsequent blocks will be wrong And they will be caught! Or they could recalculate ALL the hashes! Gooooood luck! 84 85 Professor J. Siambanopoulos 18 King's University College Fall 2024 MOS 1033A BLOCKCHAIN: DISTRIBUTED COMPUTING Blockchain advantages Immutable ________________________________________: an object whose state cannot be modified after it is created. This is important! Digital Trust: confidence users have in the ability of people, technology and processes to create a secure digital world. (P2P, encryption, hashes, consensus requirements) Internet of Things Integration: Mix of new IoT devices, IoT data, IoT platforms and IoT applications — combined with IT assets to create a solution for a problem. One neat application of blockchains: Smart Contracts Watch video: 4 Blockchains How Can They Be Used 86 IOT of 88 = Internet thing Other future applications? 89 Professor J. Siambanopoulos 19