MSCI 2130 Introduction to MIS - Big Data & Analytics Lecture Notes PDF
Document Details
Uploaded by YouthfulJadeite3332
2025
Summary
These lecture notes cover the introduction to big data and analytics. They explain the characteristics of big data (volume, variety, velocity), and discuss processing methods like MapReduce and Hadoop. The lecture also introduces business analytics and tools such as Google Analytics and Tableau.
Full Transcript
MSCI 2130: Introduction to MIS
Lecture 1 - Week 5

[Course roadmap slide, in source order: Lab 4: Intro to HTML; Midterm Exam (February 11th); Big Data and Analytics; Lab 3: Data Visualization; Data & Knowledge Mgt (2); Lab 2: Excel; Lab 1: MS Access; Macro; Data & Knowledge Mgt; Wireless Tech and Cloud Computing; Introduction to IS; Hardware, Software, & Telecommunications]

Big Data

What is Big Data?
Video: Big Data In 5 Minutes | What Is Big Data? | Introduction To Big Data | Big Data Explained | Simplilearn (YouTube)

What is Big Data? (3Vs)
Big data:
o Are generated in huge volumes
o Exhibit variety, including structured, unstructured, and semi-structured data
o Are generated at high velocity with an uncertain pattern
o Do not fit neatly into traditional, structured, relational databases
o Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems

Characteristics of Big Data
o Volume (Size)
o Variety (Complexity)
o Velocity (Speed)

Characteristics of Big Data (Volume)
o The amount of data generated every second is increasing exponentially
https://www.weforum.org/agenda/2019/04/how-much-data-is-generated-each-day-cf4bddf29f/

Text Analytics
https://lawtomated.com/structured-data-vs-unstructured-data-what-are-they-and-why-care/

Characteristics of Big Data (Variety)
o Data are generated in a variety of formats
o Structured, semi-structured, and unstructured
o E.g., videos, audio, pictures, etc.

Characteristics of Big Data (Velocity)
o It refers to the speed at which data are generated, stored, processed, and analyzed to take real-time actions

Why We Need Big Data
A primary goal of big data is to analyze and blend new data types together to find new patterns, trends, and insights that create opportunities to expand and grow your business.
[Figure: benefits of big data: uncover hidden patterns, real-time actions, competitive advantage, data-driven decisions, firm agility]

Big Data Processing
Parallel processing (also called parallel computing or distributed computing)
o It is a method of improving computer system performance by executing two or more instructions simultaneously
o The idea is to divide large tasks into smaller ones that can be processed concurrently
o The goals are:
  o To reduce the amount of time needed to process a certain task
  o To solve larger problems that might not fit in the limited memory of a single computer

MapReduce
o A framework that helps software programs process data in parallel, allowing the power of thousands of computers to be applied to one job
o The map task takes input data and converts it into a dataset of key-value pairs that can be computed on
o The output of the map task is consumed by reduce tasks, which aggregate it and provide the desired result
o Since traditional DBMSs do not process such types of data, new systems came into place, such as Hadoop and Spark, among others

Hadoop
o Open-source framework supported by the Apache Software Foundation
o Manages thousands of computers through the Hadoop Distributed File System (HDFS) to efficiently store and process large datasets ranging in size from gigabytes to petabytes
o Instead of using one large computer to store and process the data, Hadoop allows clustering multiple computers to analyze massive datasets in parallel more quickly
o Amazon supports Hadoop
o The Hadoop ecosystem includes Apache Pig, a platform whose query language is named Pig Latin

Hadoop Distributed File System (HDFS)
o Splits data into blocks and stores 3 replicas of each block on commodity servers

MapReduce Example
o Assume we have a program that wants to do a “Word Count”
o The input of this program is a volume of raw text of unspecified size (it could be KB, MB, or TB; it doesn’t matter!)
o The output is a list of words and their occurrence counts
o Assume that words are split correctly, ignoring capitalization and punctuation
Example: “The doctor went to the store.”
=>
The, 2
Doctor, 1
Went, 1
To, 1
Store, 1

Example: Word Count
http://kickstarthadoop.blogspot.ca/2011/04/word-count-hadoop-map-reduce-example.html

Introduction to Management Information Systems
Chapter 12: Business Analytics
©2022 John Wiley & Sons, Inc. All rights reserved.

What is Business Analytics (BA)?
o BA is the process of developing actionable decisions based on insights generated from data
o BA examines data with statistical tools, formulates descriptive, predictive, and prescriptive models, and communicates the results to decision makers
o Why do managers need analytics and IT support to make decisions?

Why Managers Need IT Support
Decision making is difficult due to the following trends:
o The number of alternatives is constantly increasing
o Most decisions must be made under time pressure
o Uncertainty in the decision environment is increasing
o It is often necessary to rapidly access remote information, consult with experts, or conduct a group decision-making session
Analytics helps managers focus on their decisional roles

Decision Making Process
o Intelligence phase: What is the problem?
o Design phase: What are the options?
o Choice phase: Pick an option and decide how to implement it

The Business Analytics Process

Descriptive Analytics
o Descriptive analytics is probably the first phase of utilizing analytics and relies mainly on data warehouses
Data warehouse
o A repository of historical data that are organized by subject to support decision makers in the organization
o Unlike two-dimensional relational database tables, data warehouses store data in a multidimensional format (e.g., data cubes)

Data Cubes in a Data Warehouse

Remember Relational Databases?

Equivalence between relational and multidimensional databases.
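
The equivalence between relational and multidimensional storage can be illustrated with a small sketch. This is only an assumption-laden toy: the products, regions, years, and sales figures are invented, and the nested dictionary stands in for a real data cube. It shows the same facts stored as flat rows and as a cube indexed by three dimensions, plus a simple slice that fixes one dimension.

```python
from collections import defaultdict

# Hypothetical sales facts in relational form: one row per
# (product, region, year, sales). All values are made up for illustration.
rows = [
    ("Nuts", "East", 2023, 50),
    ("Nuts", "West", 2023, 60),
    ("Bolts", "East", 2023, 40),
    ("Bolts", "West", 2023, 70),
    ("Nuts", "East", 2024, 55),
    ("Nuts", "West", 2024, 65),
    ("Bolts", "East", 2024, 45),
    ("Bolts", "West", 2024, 75),
]


def build_cube(rows):
    """Index the same facts by dimension: cube[product][region][year] -> sales."""
    cube = defaultdict(lambda: defaultdict(dict))
    for product, region, year, sales in rows:
        cube[product][region][year] = sales
    return cube


def relational_lookup(rows, product, region, year):
    """Scan the flat table for the matching row, as a WHERE clause would."""
    for p, r, y, sales in rows:
        if (p, r, y) == (product, region, year):
            return sales
    return None


def slice_by_year(cube, year):
    """Fix the year dimension to get a two-dimensional product x region view."""
    return {
        product: {
            region: years[year]
            for region, years in regions.items()
            if year in years
        }
        for product, regions in cube.items()
    }


cube = build_cube(rows)
print(cube["Nuts"]["West"][2024])                     # one cell of the cube
print(relational_lookup(rows, "Nuts", "West", 2024))  # same fact from the table
print(slice_by_year(cube, 2023))                      # a slice of the cube
```

Both lookups return the same value because the cube holds exactly the facts in the table; only the way they are indexed differs.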
The Business Analytics Process

Descriptive Analytics
o It describes what happened in the past to help decision makers learn from past behaviors
o Many BA tools can be used in descriptive analytics:
  o Online Analytical Processing (OLAP) (aka multidimensional analysis)
  o Decision support systems
  o Data mining
  o Others…

Online Analytical Processing
o Online Analytical Processing (OLAP), or multidimensional analysis, uses the data cubes of a data warehouse and performs analysis on them
o For example, slicing and dicing the data cube to look at certain data with certain attributes

Some BA Tools
Many software tools can be used for descriptive analytics and reporting, such as:
o Excel (e.g., pivot tables)
o Google Analytics
o Tableau
o Power BI

Google Analytics
o A free web analytics service launched by Google in 2005 and used by approximately 55 percent of all websites (e.g., Twitter, General Electric, The Four Seasons, The Financial Times, etc.)
o Along with many other services, it tracks and reports website traffic and provides statistics on website visitors/users, their navigation behavior, and the transactions taking place
o For example, it can track what online behavior led to purchases, and that data can then be used to make informed decisions about how to improve the website to reach new and existing customers. How?

Google Analytics (cont’d)
o To track a website, you first have to create a Google Analytics account.
o Then you need to add a small piece of JavaScript tracking code to each page on your site.
o Every time a user visits a web page, the tracking code collects anonymous information about how that user interacted with the page.
o The tracking code could show how many users visited a page or how many users bought an item by tracking whether they made it to the purchase confirmation page.
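
The counting logic behind "how many users visited a page" and "how many users reached the purchase confirmation page" can be sketched as follows. This is not Google Analytics code or its data format; the event list, the visitor ids, and the confirmation-page path are all hypothetical, chosen only to show the idea.

```python
# Hypothetical page-view events, as a tracking snippet might report them:
# (anonymous visitor id, page path). Not Google Analytics' real data format.
events = [
    ("v1", "/home"),
    ("v1", "/product"),
    ("v1", "/checkout/confirmation"),
    ("v2", "/home"),
    ("v2", "/product"),
    ("v3", "/product"),
    ("v3", "/checkout/confirmation"),
    ("v1", "/home"),  # a repeat visit by the same visitor
]

# Assumed path of the purchase confirmation page.
CONFIRMATION_PAGE = "/checkout/confirmation"


def unique_visitors_per_page(events):
    """Count distinct visitors for each page, ignoring repeat views."""
    visitors = {}
    for visitor, page in events:
        visitors.setdefault(page, set()).add(visitor)
    return {page: len(ids) for page, ids in visitors.items()}


def conversion_rate(events, goal_page):
    """Share of all visitors who reached the goal page at least once."""
    all_visitors = {visitor for visitor, _ in events}
    converted = {visitor for visitor, page in events if page == goal_page}
    return len(converted) / len(all_visitors) if all_visitors else 0.0


print(unique_visitors_per_page(events))
print(conversion_rate(events, CONFIRMATION_PAGE))
```

Treating arrival at the confirmation page as the signal for a purchase mirrors the approach described above: the site never needs to report the sale itself, only which pages each visitor reached.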
Google Analytics (cont’d)
The tracking code will also collect information from the browser, such as:
o Language: the language the browser is set to
o Type of browser: e.g., Chrome, Internet Explorer
o Device and operating system
o Traffic source: what brought users to the site in the first place. This might be a search engine, an advertisement they clicked on, or an email marketing campaign.

Tableau
o An interactive data visualization software company
o It allows anyone to easily connect to data, visualize it, and create interactive, shareable dashboards
o It is easy to learn yet powerful enough to help with the most complex analytical problems

Tableau (cont’d)

Tableau AI (Einstein)

Power BI
Source: What is Power BI? - YouTube

Power BI (cont’d)
Source: Browse all learning paths and modules - Training | Microsoft Learn

The Business Analytics Process

Predictive Analytics
o Examines recent and historical data to detect patterns and predict future outcomes or trends
o Predictive analytics can only forecast what might happen in the future, based on probabilities
o Examples: predicting credit card fraud, sales, illness, targeted marketing, etc.
o BA tools for predictive analytics include:
  o Data mining
  o Regression analysis

Prescriptive Analytics
o Recommends one or more courses of action and shows the likely outcome of each decision
o It is more focused on finding the optimal decision or solution
o BA tools for prescriptive analytics include:
  o Optimization
  o Simulation tools
  o Decision trees

Prescriptive Analytics Example
Waymo Driverless Car. During every trip, the car makes multiple decisions/recommendations about what to do based on predictions of future outcomes.
For example, when approaching an intersection, the car must determine whether to go left, right, or straight ahead. Based on its destination, it makes a decision. It also attempts to make a decision that is optimal given the different parameters it considers. Additionally, the car must anticipate what might be coming in regard to vehicular traffic, pedestrians, bicyclists, and so on. The car must also analyze the impact of a possible decision before it actually makes that decision.

Presentation Tools
o Organizations use presentation tools to help managers present the results of analyses to users in visual formats such as charts, graphs, figures, and tables.
o This process makes the results more attractive and easier to understand.
o As discussed earlier, a variety of visualization methods and software packages that support decision making are available.
o Dashboards are the most common BA presentation tool

Dashboards
o Dashboards are an excellent way to provide high-level information.
o Like the dashboard in a vehicle, a BA dashboard should include the most important information. Managers can then go to the report for more details.
o A dashboard is user-friendly and supported by graphics and, most importantly, it enables managers to examine exception reports and drill down into detailed data

A Sample Dashboard

AI Today!
Did you know? Chinese AI startup DeepSeek just released Janus-Pro, a new open-source multimodal AI model that outperforms major image generation rivals like DALL-E 3 and Stable Diffusion, coming on the heels of the company’s viral R1 launch.