Data Deluge & Analytics PDF
Document Details
Uploaded by ExceptionalConnemara8104
Cairo University Engineering
Summary
This document explains concepts related to data deluge, analytics, and data science. It explores different aspects of big data, such as volume, velocity, variety, and variability. It touches upon the role of citizen data scientists. It also provides information about advanced analytics frameworks and methods.
Full Transcript
10/17/2024 Data Deluge

Consequences of the Data Deluge
- Every problem generates data eventually.
- Every company needs analytics eventually.
- Everyone needs analytics eventually.

Levels of Analytics
- Ad hoc = one-time reports, specific for a topic
- Drill down = start from broad view then get to details

Data Science

Reasons for the Big Data Explosion
Increasing “data velocity” due to the following:
- streaming data feeds
- point-of-sale (POS) transactional systems
- radio-frequency identification (RFID) tags
- smart metering
- bigger and cheaper data storage capabilities
- social media
- improved and automated business processes
- mergers and acquisitions, leading to the merge of multiple data sources
- more online self-service applications being used

Factors Driving Demand for Big Data Solutions
In addition to rapidly increasing data growth rates, consider these factors:
- availability of data from social media sources
- demand for mobile business intelligence
- increasing requirements around real-time reporting
- desire to mine data from social media sources (sentiment analysis)

Big Data
Big data is what happened when the cost of storing information became less than the cost of making the decision to throw it away.

Big Data: What Is It?
The point at which the volume, velocity, and variety of data exceed an organization’s storage or computation capacity for accurate and timely decision making

Factors associated with big data:
- data volume
- data velocity
- data variety
- data variability
- data complexity

Data Volume
Data volumes are increasing due to use of the following:
- social media (Facebook, Twitter, Instagram)
- machines talking to machines
- improvements in the manufacturing process (quality control)
- automated tracking devices
- streaming data feeds

Data Velocity
- business processes that are more automated
- mergers and acquisitions
- more use of social media
- more use of self-service applications
- integration of business applications

Data Variety
- structured data
- unstructured data
- business applications
- unstructured text documents (articles, blogs, and so on)
- emails
- digital images
- video and audio clips
- streaming data
- stock ticker data
- RFID tag data
- sensor data

Data Variability
- The flow of data changes over time (seasonality, peak response, social media trends, and so on).
- Data values change over time. How much history do you keep?
- Data values are different across data sources.
- Data is stored in different formats.
- Data standards change across time. What was “valid” five years ago might not be “valid” today.

Data Complexity
Data comes from a variety of systems in a variety of formats. This can make it difficult to merge, cleanse, and transform data in a uniform manner.

The Citizen Data Scientist

What Is a Data Scientist?
Data scientists are a new breed of analytical data expert who have the technical skills to solve complex problems and the curiosity to explore what problems need to be solved. They are part mathematician, part computer scientist, part trend spotter. They are a sign of the times. Their popularity reflects how businesses now think about big data.

Typical Job Duties for a Data Scientist
This list is not definitive, but think of the following:
- collecting large amounts of unruly data and transforming it into a more usable format
- solving business-related problems using data-driven techniques
- working with a variety of programming languages, including SAS, R, and Python
- having a solid grasp of statistics, including statistical tests and distributions
- staying on top of analytical techniques such as machine learning, deep learning, and text analytics
- communicating and collaborating with both IT and business
- looking for order and patterns in data, as well as spotting trends that can help a business’s bottom line

Typical Job Responsibilities for a Data Scientist
- collect large amounts of unruly data and transform it into a more usable format
- solve business-related problems using data-driven techniques
- work with a variety of programming languages (for example, SAS, R, and Python)
- have a solid grasp of statistics, such as statistical tests and distributions
- stay on top of analytical techniques such as social network analysis, text analytics, and new methodologies for predictive modeling
- communicate and collaborate with both IT and business
- look for order and patterns in data

But …
- There just are not enough data scientists in the workforce.
- It is important to realize that one data scientist might not have all the necessary skills.
- It is important to develop a team of data scientists that are “scattered across the business.”
- There is a rise of easier-to-use analytics tools.
- Analytics is so important to society that it cannot be something that is only the domain of experts.
So we need citizen data scientists.

How to Find Citizen Data Scientists?

Characteristics of Citizen Data Scientists
- tired of looking at the same reports
- want to get their hands on all the data themselves and find new ways to get answers
- willing to learn new methods and use new tools
- analytically minded

Data Scientist Skills
- Mathematics and Statistics
- Communication and Visualization
- Computer Science
- Domain Knowledge

Applied Data Science
(Diagram labels, in the order extracted) Anomaly Detection, Customer Transaction Behavior, Money Laundering, Retail, Banking, Supply Chain, Segmentation, Forecasting, Risk Analysis, Utilities, Insurance, Government, Spending Optimization, Loss Estimation, Collecting Prediction, Churn, Cross-Sell/Upsell, Fraud Detection, Bad Debt Prediction

Data Science Process
A question
- What is the goal?
- Do you need to classify, estimate, describe?
- Do you have the proper data?
- What actions are planned?
Collect the data
- Which data are relevant?
- How many sources are involved?
- Do you have access to the data?
- Do you have privacy issues?
- Will the data be available in production?
Explore the data
- Are there anomalies or patterns?
- How do the data look?
- Do you have too many or too few variables?
- Do you need to impute/transform the data?
- Do you need to aggregate/create the data?
Model the data
- Train different models (algorithms and approaches).
- Validate all the models.
- Test all the models.
- Select the best model according to the question/goal.
- Score the champion model.
An answer
- What did you learn?
- Can you explain the answer with the model?
- Can you tell a story?
- Can you deploy the model in time?
Copyright © SAS Institute Inc. All rights reserved.

Advanced Analytics Framework
Focused on data mining and optimization tasks
(Diagram: Business Value plotted against Complexity, from lowest to highest)
- Query and Report: “What happened?”
- OLAP: “Why did it happen?”
- Data Mining: “What will happen?”
- Optimization: “What of offer?”
- OLAP is interactive and lets user process and report the data
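
As an illustration of the "Model the data" stage of the Data Science Process (train different models, validate, test, select the best, score the champion), here is a minimal Python sketch. It is not taken from the slides: the two candidate models, the 60/20/20 split, the mean-squared-error metric, the synthetic data, and every function name are assumptions chosen only to make the steps concrete.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate (hypothetical): always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Second candidate (hypothetical): least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def mse(model, rows):
    """Mean squared error of a fitted model on (x, y) rows."""
    return mean((model(x) - y) ** 2 for x, y in rows)


def select_champion(rows, candidates, seed=0):
    """Train, validate, and test every candidate; pick the champion on validation error."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    train = rows[: int(0.6 * n)]              # assumed 60% for training
    valid = rows[int(0.6 * n): int(0.8 * n)]  # assumed 20% for validation
    test = rows[int(0.8 * n):]                # assumed 20% for testing

    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    fitted = {name: fit(xs, ys) for name, fit in candidates.items()}  # train
    valid_err = {name: mse(m, valid) for name, m in fitted.items()}   # validate
    test_err = {name: mse(m, test) for name, m in fitted.items()}     # test
    champion = min(valid_err, key=valid_err.get)                      # select
    return champion, fitted[champion], valid_err, test_err


# Synthetic data (an assumption for the demo): y = 2x + 1 plus Gaussian noise.
noise = random.Random(42)
data = [(x, 2 * x + 1 + noise.gauss(0, 1)) for x in range(50)]

name, model, valid_err, test_err = select_champion(
    data, {"mean": fit_mean, "line": fit_line}
)
print("champion:", name)
print("validation MSE:", valid_err)
print("test MSE:", test_err)
print("score for x = 60:", round(model(60), 2))  # score the champion on new input
```

The champion is chosen on the validation set and the test set is only reported, so the test error stays an honest estimate of how the selected model may behave on new data.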

Traditional Analytics at Rest
(Diagram labels) Data, Data Storage, ETL, Deploy, Alerts, Reports, Decisioning

Streaming Analytics
Stream – Understand – Act
(Diagram labels) Data, Data Storage, ETL, Deploy, Alerts, Reports, Decisioning, Deploy, Enrich, Store, Streaming Data, Streaming Model Execution

Analytical Methods and Applications

Advanced Analytics
Advanced analytics comprises a set of different techniques used to solve problems:
- machine learning
- statistical analysis
- forecast
- text analytics
- optimization
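
To make the "Stream – Understand – Act" idea from the Streaming Analytics slide more concrete, the following Python sketch scores each reading as it arrives instead of storing everything first and analyzing it later. It is only an illustration under stated assumptions: the running-statistics "model", the 3-standard-deviation rule, the warm-up length, and the names `RunningStats` and `stream_alerts` are all hypothetical and do not come from the slides or from any particular streaming product.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean and variance (Welford's method), updated one value at a time."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def stream_alerts(readings, threshold=3.0, warmup=5):
    """Yield (index, value) for readings far from the running baseline.

    Stream: consume readings one by one from any iterable.
    Understand: compare each reading with the running mean and deviation.
    Act: yield an alert immediately, without waiting for a batch job.
    """
    stats = RunningStats()
    for i, x in enumerate(readings):
        is_anomaly = (
            stats.n >= warmup
            and stats.std > 0
            and abs(x - stats.mean) > threshold * stats.std
        )
        if is_anomaly:
            yield i, x       # act right away
        else:
            stats.update(x)  # only normal readings update the baseline


# Hypothetical sensor readings with one obvious spike.
readings = [10, 11, 9, 10, 12, 10, 11, 50, 10, 9]
for index, value in stream_alerts(readings):
    print(f"alert: reading {index} has value {value}")
```

In an at-rest setup the same check would run only after the readings had been stored and loaded, so the alert would arrive later; here the decision is made while the data is still in motion.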