Introduction to Data Science PDF
Rome Business School
Summary
This presentation from ROME BUSINESS SCHOOL provides an introduction to data science, including explanations of structured and unstructured data, data warehouses, and business intelligence. It also explores big data and the role of data scientists in organizations.
Full Transcript
Better Managers for a Better World
Introduction to Data Science
Module: Data Analytics
Master: Managerial Core Competences
Date of lecture: Oct 2024
romebusinessschool.com

Disclaimer
All product and company names are trademarks or registered® trademarks of their respective holders. The use of registered names, logos, brands, etc. does not imply any affiliation with or endorsement by them.

First 3 skills:
- Making decisions in complex and changing environments
- Managing business projects involving Big Data and enterprise-level technologies
- Developing data-driven business strategies for growth and development

Background Agenda
1. Introduction to DATA
2. Data Warehouse
3. Business Intelligence
4. Big Data

1 - Background - Data

What is Data?
Basically, any kind of information that can be digitized and stored electronically is considered data.

Structured vs Unstructured

Unstructured Data:
- Unorganized format: Unstructured data lacks a predefined format and can exist in various forms like text documents, emails, social media posts, images, audio, and video.
- Difficult to analyze directly: Due to its varied nature, traditional data analysis tools often struggle to process unstructured data directly.
- Examples: Social media feeds, customer reviews, emails, sensor data, medical images, video recordings.

Structured Data:
- Highly organized: Structured data follows a predefined format, often stored in relational databases or spreadsheets.
- Easy to search and analyze: Because of its consistent format, structured data can be easily searched, sorted, and analyzed using traditional database tools and queries.
- Examples: Customer information (names, addresses, phone numbers), financial transactions (amounts, dates, categories), product catalogs (descriptions, prices, stock levels).

| Feature  | Structured                       | Unstructured                       |
|----------|----------------------------------|------------------------------------|
| Format   | Predefined, organized            | Unorganized, varied formats        |
| Examples | Customer databases, spreadsheets | Emails, social media posts, images |
| Analysis | Easy to search and analyze       | Requires specialized tools         |

2 - Background - Data Warehouse

What is a Data Warehouse?
A Data Warehouse (DWH) acts as a central hub for historical data from various sources within an organization. While similar to a traditional data archive, it prioritizes structured organization and integration for in-depth analysis, not just storage.
https://vitolavecchia.altervista.org/differenza-vantaggi-e-svantaggi-tra-data-warehouse-e-database-unico-centralizzato/

- Focus on historical data: Data warehouses primarily store historical data, allowing you to analyze trends and patterns over time. While some real-time data might be included, it's not the main focus.
- Subject-oriented: Data is organized by subject area, such as sales, marketing, or customer service. This makes it easier for analysts to find the specific data they need for their analysis.
- Integrated data: Data from multiple sources is transformed and integrated into a consistent format. This eliminates inconsistencies and allows for easier analysis across different departments.
- Read-only access: Data warehouses are typically read-only, meaning data is primarily used for analysis, not for updating day-to-day operations.
- Benefits: Data warehouses provide several benefits for businesses, including improved decision-making, better customer understanding, and enhanced reporting capabilities.

3 - Background - Business Intelligence (BI)

What is BI?
Business intelligence (BI) is a broad term that includes the strategies, technologies, and practices used by companies to gather, analyze, and interpret data to gain valuable insights that inform business decisions. It's essentially about transforming raw data into actionable knowledge.

- Data-driven decision making: BI empowers businesses to move beyond intuition and make decisions based on concrete evidence and insights derived from data analysis.
- Focus on business needs: BI goes beyond just collecting data; it focuses on collecting and analyzing data relevant to specific business goals and objectives.
- Variety of tools and techniques: BI utilizes a range of tools and techniques like data visualization, reporting, data warehousing, and analytics to extract meaning from data.
- Improved performance: By leveraging data insights, BI can help businesses improve performance in various areas like sales, marketing, customer service, and operational efficiency.

4 - Background - Big Data

What is BIG Data?
Big data refers to massive and complex datasets that are difficult to store, process, and analyze using traditional methods. It's not just about the sheer volume of data, but also the variety (structured, unstructured, and semi-structured) and the velocity at which it's generated. Think social media feeds, sensor data, financial transactions, and scientific research, all contributing to this ever-growing data deluge.
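To make the variety dimension concrete, here is a minimal Python sketch that reads one structured, one semi-structured, and one unstructured sample. All sample records, the field names (customer_id, amount, tags), and the negative_words list are invented for illustration and do not come from the lecture. The keyword match is only a crude stand-in for the specialized tools that unstructured data usually requires.

```python
import csv
import io
import json

# Structured: fixed columns, so every field can be addressed by name.
structured_csv = "customer_id,name,amount\n1,Ada,120.50\n2,Linus,80.00\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total_amount = sum(float(row["amount"]) for row in rows)

# Semi-structured: self-describing keys, but fields may be missing.
semi_structured_json = '[{"id": 1, "tags": ["vip"]}, {"id": 2}]'
records = json.loads(semi_structured_json)
tag_counts = [len(record.get("tags", [])) for record in records]

# Unstructured: free text with no schema. Even a simple question
# ("is this customer unhappy?") needs some form of text processing.
unstructured_review = "The delivery was late and the support team never replied."
negative_words = {"late", "never", "broken"}  # hypothetical word list
words = unstructured_review.lower().replace(".", "").split()
negative_hits = [word for word in words if word in negative_words]

print(total_amount)   # 200.5
print(tag_counts)     # [1, 0]
print(negative_hits)  # ['late', 'never']
```

The structured sample is summed with a single expression, the semi-structured one needs a default for the missing field, and the free-text review needs custom processing before it answers anything, which mirrors the "easy to search and analyze" versus "requires specialized tools" contrast.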
https://www.itismagazine.it/contenuti-speciali/il-presente-e-il-futuro-delle-tracce-digitali-e-dei-big-data/

- Volume: The amount of data is staggering, measured in terabytes, petabytes, and even exabytes. Imagine a library with more information than you could ever read in a lifetime!
- Variety: Big data comes in all shapes and sizes, from text and numbers to images, videos, and audio recordings. It's like having a library with books, pictures, movies, and even sound recordings all mixed together.
- Velocity: Data is generated at an incredible pace, constantly flowing from various sources. It's like a river of information rushing by, requiring new techniques to capture and analyze it effectively.

Thank you
Via Giuseppe Montanelli, 5 00195, Roma RM

Agenda
1. Introduction to Data Science
2. Big Data
3. Technologies for Data Science
4. The Data Scientist
5. Case studies & real examples

1. Introduction to Data Science

What is Data Science?
- A multidisciplinary approach that combines mathematics & statistics, computer science, and domain knowledge to find, extract, and surface patterns in data, so as to extract or extrapolate knowledge and insights from data
- The study of data to extract meaningful insights for business
https://towardsdatascience.com/introduction-to-statistics-e9d72d818745

Data is the new oil!
This photo by Unknown Author is licensed under CC BY
https://data-flair.training/blogs/data-science-applications/

The value of the Data Information

Market Value

Industry trends
- Artificial Intelligence (AI)
- Internet of Things (IoT)
- AI and IoT (AIoT) rely upon Advanced Big Data Analytics
- Real-time data is key for all use cases, segments, and solutions
- Market-leading companies are rapidly integrating Big Data technologies with IoT infrastructure
https://www.businesswire.com/news/home/20230307005656/en/Global-Big-Data-Markets-Report-2023-2028-Market-Leading-Companies-are-Rapidly-Integrating-Big-Data-Technologies-with-IoT-Infrastructure

2. Big Data

Big Data
Volume of data/information created, captured, copied, and consumed worldwide (in zettabytes; 1 zettabyte = 1,000 exabytes)
Data generated each day:
- 2020: 2.5 exabytes
- 2023: 300+ exabytes
https://financesonline.com/how-much-data-is-created-every-day
https://ussignal.com/blog/celebrate-world-backup-day-with-dr-planning
https://www.statista.com/statistics/871513/worldwide-data-created

https://statanalytica.com/blog/4-vs-of-of-big-data/

- Structured Data (20%)
- Unstructured Data (80%)

The Data Warehouse
- Only structured data is stored
- Data is pre-processed, organized and modeled
- High performance, low flexibility
https://www.databricks.com/glossary/data-lakehouse

The Data Lake
- Data is stored in native raw format
- No need to model, organize or clean the data
- Data is organized only when requested for processing
https://www.databricks.com/glossary/data-lakehouse

The Data Lakehouse
- Speed
- Flexibility
- Scalability
https://www.databricks.com/glossary/data-lakehouse

Data Mesh
- Domain ownership
- Data-as-a-Product
- Self-Service Data Platform
- Federated Governance
https://www.datamesh-architecture.com

3. Technologies for Data Science

Data Analysis
- IT infrastructure & network: on-premise vs. cloud
- Software and applications: frameworks, tools, toolkits, libraries

Data Science Cloud Services
https://cloud.google.com/data-science
https://docs.aws.amazon.com/whitepapers/latest/data-warehousing-on-aws/analytics-pipeline-with-aws-services.html
https://devblogs.microsoft.com/azuregov/data-science-virtual-machines-are-now-available-on-azure-government-cloud/

Data Visualization (Data Viz)
- Graphs
- Reports
- Business Intelligence
- Advanced Analytics

Analytics
https://medium.com/co-learning-lounge/types-of-data-analytics-descriptive-diagnostic-predictive-prescriptive-922654ce8f8f

From Data to Knowledge
From Data to Knowledge, to Wisdom
https://www.evalueserve.com/blog/making-the-leap-from-insights-to-wisdom-a-collaborative-approach

4. The Data Scientist

Using Data Analysis as a Superpower
https://hbsp.harvard.edu/products/videos/5331AV-AVO-ENG

Data Scientist
Not only tech skills, but also soft skills:
- Domain knowledge
- Effective communication
This photo by Unknown Author is licensed under CC BY-SA

Data Science Team Example
- Roles: Principal DS / Analyst, Developer, Architect/Data Engineer, Front End developer, Data Scientist
- Main skills: Research, Coding, BI Visualization, Statistics, Data Engineering, Business Domain

5. Case studies & real examples

Data Driven Business Case - steps

Problem Definition
- Define the business problem or opportunities
- How does this problem align with the organization's overall objectives?

Data
- Define the needed data
- Define the origin of the needed data
- Do we have access to the necessary data?
- Check the quality and the accuracy of the data we have

Analysis
- Choose the correct analytical technique for solving the problem
- Identify how to validate and evaluate the performance of the models
- Define the impact of the insights on the business
- Ensure that the insights are actionable and valuable to the business

Implementation
- Define the solution architecture
- Define how to monitor the performance of the overall solution
- Define the strategy for continuous improvement

Business Impact
- Identify the expected outcomes and benefits of implementing the solution
- Define how to measure the ROI

Churn Rate
Churn rate (sometimes called attrition rate) is a measure of the proportion of individuals or items moving out of a group over a specific period. It is one of two primary factors that determine the steady-state level of customers a business will support.

$$\text{Churn Rate (\%)} = \frac{\text{Number of Customers Lost}}{\text{Number of Customers at Start}} \times 100$$

https://en.wikipedia.org/wiki/Churn_rate

Telco's Customer Churn Business Case
- Problem Definition: Telco customer churn
- Data: Customer data; Network data; Competitor data
- Analysis: Prediction models like LR, Random Forest, Gradient Boosting Machines
- Implementation: Data preprocessing; Feature engineering; Model training; Model deployment
- Business Impact: Reduced churn rate; Improved customer satisfaction; ROI (cost saving); ROI (increased revenue)

Disney - Personalized Guest Experience
- Problem Definition: Personalized Guest Experience
- Data: Guest data; Park operations data; External data
- Analysis: Recommendation systems; Clustering; Predictive; Sentiment analysis
- Implementation: Data Lake; Data preprocessing and feature engineering; Model building; API integration
- Business Impact: Increased guest satisfaction; Enhanced revenue; Guest satisfaction surveys; ROI (cost saving); ROI (increased revenue)
https://www.linkedin.com/pulse/disney-uses-big-data-iot-machine-learning-boost-customer-stedman/

Uber: Optimizing Ridesharing Experiences
- Problem Definition: Optimizing Ridesharing Experiences
- Data: Rider data; Driver data; External data
- Analysis: Dynamic pricing; Route optimization; Demand prediction; Matching algorithms
- Implementation: Data pipeline and Data Lake; Data modeling; API integration; Real-time monitoring
- Business Impact: Reduced rider wait time; Increased driver earnings; Customer satisfaction; ROI
https://www.projectpro.io/article/how-uber-uses-data-science-to-reinvent-transportation/290

Conclusions

Conclusions - recap
1. Introduction to Data Science
2. Big Data
3. Technologies for Data Science
4. The Data Scientist
5. Case studies & real examples

Thank you

No part of this video or any of its contents may be reproduced, copied, modified or adapted, without the prior written consent of the author, unless otherwise indicated for stand-alone materials. Copyright Rome Business School. All rights reserved.
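As a worked example of the churn rate formula from the case studies section, the following minimal Python sketch computes the percentage for one period. The function name churn_rate_percent and the figures (1,000 customers at the start, 50 lost) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def churn_rate_percent(customers_lost: int, customers_at_start: int) -> float:
    """Return the churn rate in percent: customers lost / customers at start x 100."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    # Multiply before dividing so simple cases stay exact in floating point.
    return customers_lost * 100 / customers_at_start


# Hypothetical period: 1,000 customers at the start, 50 of them lost.
rate = churn_rate_percent(customers_lost=50, customers_at_start=1000)
print(f"{rate:.1f}%")  # 5.0%
```

The guard against a non-positive starting count is a design choice of this sketch: the formula is undefined when there are no customers at the start of the period.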