Data Analytics Training Module 1
GL Bajaj Institute of Technology and Management
Summary
This document introduces the concept of data analytics, explaining its importance and applications. It outlines different types of data analysis, including descriptive, predictive, and prescriptive methods. Data analytics is presented as a way to gain valuable insights and make intelligent decisions from data.
# Data Analytics

## What is Data Analytics?

### Introduction

- A diagram shows **Data Mining** in the center of a Venn diagram, overlapping with **Machine Learning** and **Artificial Intelligence**.
- **Statistics** is the third part of the Venn diagram, overlapping only with **Data Mining**.
- An image of a human head against a blurred background of code and data is shown near the Venn diagram.

### Why Data Analytics?

- Several logos are presented in a row, representing popular businesses that use data analytics to their advantage.
- The logos represent **Facebook**, **Amazon**, **Uber**, **Bank of America**, **Airbnb**, and **Spotify**.
- Each logo is accompanied by a phrase describing how that platform uses data analytics:
  - **Facebook:** Social Analytics
  - **Amazon:** Improving E-commerce experience
  - **Uber:** Optimizing Rides
  - **Bank of America:** Increasing Customer experience
  - **Airbnb:** Improving Searches
  - **Spotify:** Music Recommendation

### Data

- Data is defined as a set of values of qualitative or quantitative variables.
- It is information in raw or unorganized form.
- Data can be facts, figures, characters, or symbols.
- Information is organized or meaningful data.
- Information comes from analyzing data.

### What is Data Analytics?

- Analytics is the discovery, interpretation, and communication of meaningful patterns in data.
- Data Analytics (DA) is the process of examining data sets to draw conclusions about the information they contain.
- Analytics is not a tool or technology; rather, it is a way of thinking about and acting on data.
- A caption below a diagram of hands holding a graph states: "Data on its own is useless unless you can make sense of it!"

### Primary Focus Areas for Analytics

- A diagram shows a two-by-two grid of quadrants.
- The quadrants represent focus areas within data analytics.
- The X-axis represents **Understand Customer Behavior** on the left and **Understand Product Usage** on the right.
- The Y-axis represents **Increase Operational Efficiency** at the bottom and **Business Model Innovation** at the top.

### Data Analytics - Example

- **Business Analytics:**
  - The business environment today is more complex than ever before.
  - Businesses are expected to be diligently responsive to the increasing demands of customers, various stakeholders, and even regulators.
  - The primary objectives of an organization that turns to analytics are:
    - Revenue/profit growth
    - Optimizing expenditure
  - Organizations are turning to analytics to enhance their competitiveness.
  - More than 83% of global CIOs surveyed by IBM in 2010 saw Business Intelligence and Analytics as a visionary plan to enhance their competitiveness.
- **Risk**
- **Fraud**
- **Health**
- **Web**

### The Process of Statistical Analysis

- A diagram shows a three-step process flow:
  - **Form Hypotheses**
  - **Identify Data Source**
  - **Prove/Disprove Hypothesis**
- Statistical analysis enables us to make quantitative inferences from the limited amount of information we can actually analyze (a sample).

### Data Analytics Life Cycle

- A circular diagram shows the steps of the Data Analytics Life Cycle:
  - **Data Visualization**
  - **Data Analytics**
  - **Objective**
  - **Understanding the data**
  - **Data cleaning & data transformation**
  - **Data Enhancement**

### Types of Data Analytics

- **Descriptive Analytics**
- **Predictive Analytics**
- **Prescriptive Analytics**

## Types of Data Analysis

- A triangle diagram shows **Prescriptive Analytics** at the peak, with **Predictive Analytics** and **Descriptive Analytics** at the bottom corners.
- **Prescriptive Analytics**
  - Enabling smart decisions based on data
  - What should we do?
- **Predictive Analytics**
  - Predicting the future based on historical patterns
  - What could happen?
- **Descriptive Analytics**
  - Mining data to provide business insights
  - What has happened?
## Types of Data Analysis

- **Descriptive Analytics**
  - Aims to uncover valuable insights from the data being analyzed.
  - Answers the question "What has happened?"
- **Predictive Analytics**
  - Helps forecast the behavior of people and markets.
  - Answers the question "What could happen?"
- **Prescriptive Analytics**
  - Suggests conclusions or actions to be taken based on the analysis.
  - Answers the question "What should be done?"
- A set of images shows examples of each type of data analytics:
  - A picture shows Netflix recommending movies based on personal preferences.
  - A picture shows a grocery cashier handing out coupons.
  - A picture shows an airplane ticket with different prices.
- Descriptive Analytics: Insight into the past.
- Predictive Analytics: Understanding the future.
- Prescriptive Analytics: Advice on possible outcomes.

## Descriptive Data Analytics

- Although it is the simplest type of data analysis, it is also the most commonly used.
- **Types of Descriptive Analysis:**
  - **Measures of Central Tendency:** Tell us about the middle of the data.
    - Mean: The average
    - Median: The midpoint of the responses
    - Mode: The response with the highest frequency
  - **Measures of Dispersion:** Tell us how spread out the data is.
    - Range: The min, the max, and the distance between the two
    - Variance: The average of the squared differences between each data point and the mean
    - Standard Deviation: The square root of the variance, and the most common way of expressing the spread of data
- A histogram shows the **Mean**, **Median**, and **Mode** of the number of items purchased.
- A table shows data for **Customer ID**, **Items Purchased**, and **Amount Spent**.

## Variability

- A diagram shows two groups of money bags.
  - The first group shows money bags all of equal size, with the caption **No Variability in Cash Flow**.
  - The second group shows money bags of different sizes, with the caption **Variability in Cash Flow**.
  - Both groups end with a single money bag representing the **Mean**.
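
Python's standard `statistics` module covers the descriptive measures listed above. The slide's table values are not reproduced in this transcript, so the sketch uses a small, hypothetical "Items Purchased" column instead. `pvariance`/`pstdev` divide by n, treating the data as a whole population, which matches the "average" definition of variance. `variance`/`stdev` divide by n - 1, which is the usual choice for a sample.

```python
import statistics

# Hypothetical "Items Purchased" values for five customers (illustrative only).
items_purchased = [1, 2, 2, 4, 6]

# Measures of central tendency: where is the middle of the data?
central_tendency = {
    "mean": statistics.mean(items_purchased),      # arithmetic average
    "median": statistics.median(items_purchased),  # middle value once sorted
    "mode": statistics.mode(items_purchased),      # most frequent value
}

# Measures of dispersion: how spread out is the data?
dispersion = {
    "min": min(items_purchased),
    "max": max(items_purchased),
    "range": max(items_purchased) - min(items_purchased),
    # Population variance / standard deviation: divide by n.
    "variance": statistics.pvariance(items_purchased),
    "std_dev": statistics.pstdev(items_purchased),
    # Sample variance / standard deviation: divide by n - 1.
    "sample_variance": statistics.variance(items_purchased),
    "sample_std_dev": statistics.stdev(items_purchased),
}

if __name__ == "__main__":
    for name, value in {**central_tendency, **dispersion}.items():
        print(f"{name:>15}: {value:.3f}")
```

Here the mean is 3, while the median and mode are both 2. A single large value (6) pulls the mean upward, which is the same kind of gap a histogram of mean, median, and mode makes visible.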
## Standard Deviations

- An image shows the **Standard Deviation** on a bell curve. For a normal distribution:
  - About 68% of values lie within 1 standard deviation of the mean.
  - About 95% lie within 2 standard deviations.
  - About 99.7% lie within 3 standard deviations.

## Predictive Data Analytics

- Some people mistakenly believe predictive analysis is relevant only to predicting future events.
- However, in cases such as sentiment analysis, existing data (e.g., the text of a tweet) is used to predict data that does not yet exist (whether the tweet is positive or negative).
- Some of the models that can be used for predictive analysis are:
  - Forecasting
  - Simulation
  - Regression
  - Classification
  - Clustering

## Prescriptive Data Analytics

- Decisions can be formulated from descriptive and predictive analysis.
  - For example, if I need to cut a product and I know that product C is the least preferred and least profitable, I will cut product C.
- Prescriptive analytics explicitly tells you the decisions you should make.
- This can be done using various techniques:
  - Linear Programming
  - Integer Programming
  - Mixed Integer Programming
  - Nonlinear Programming

## Comparing The Three Types of Data Analytics

- Descriptive analysis is the most common.
- A good practice is to perform descriptive analyses before predictive or prescriptive analysis.
  - Characteristics of the data such as its distribution, variance, and skew may rule out certain models.
- **How to know which type of analysis to pursue:**
  - How much time do you have?
  - What resources are available to you?
  - How accurate is your data?
  - How accurate do you need the model/analysis to be?
  - How popular/accepted is the model you are considering?
    - Don't subscribe to "that's how we've always done it," but remember to use a model that stakeholders will accept.
- A diagram shows a graph with **Difficulty** on the X-axis and **Value** on the Y-axis.
- **Hindsight** is displayed along the bottom (low value) of the graph, representing **Descriptive Data Analytics**.
- **Insight** is displayed in the middle of the graph, representing **Predictive Data Analytics**.
- **Foresight** is displayed at the top (high value) of the graph, representing **Prescriptive Data Analytics**.

## Why Big Data Analytics?

- A diagram shows three overlapping circles, with **Big Data Analytics** at the center.
  - The circles are labeled **Cost reduction**, **Faster, better decision making**, and **New products & services**.
- Benefits of using big data analytics:
  - Helps organizations harness their data and use it to identify new opportunities.
  - Leads to smarter business moves.
  - Results in more efficient operations.
  - Creates higher profits and happier customers.
- A diagram shows a circle with **Data Analytics** at the center, with arrows pointing outward to various phrases:
  - **Predict**
  - **Describe**
  - **Optimise**
  - **Trust**
  - **Discover**
  - **Empower**
  - **Embed**
  - **Delivering Change through Analytics**

## Data Analytics Tools

- Logos for the following data analytics tools are presented:
  - **Kissmetrics**
  - **Tableau**
  - **Keen IO**
  - **Heap**
  - **Google Analytics**
  - **Crazy Egg**

## Data Analyst/Journalist

- A picture of a magnifying glass on a bar graph is presented.
- The **Data Analyst/Journalist** role is described as follows:
  - Research and extract information from various sources.
  - Describe the main features of a data set in support of business operations and decision making.

## Data Scientist

- An image shows a beaker containing graphs and data.
- The **Data Scientist** role is described as follows:
  - Cleanse data.
  - Apply various techniques to large data sets and create visualizations to understand abstract business questions.
  - Provide predictive insights into the future.
  - Anticipate future behaviors.

## Data Scientist: The Sexiest Job of the 21st Century

- A large text caption reads **Data Scientist: The Sexiest Job of the 21st Century**.
- A business analyst isn't able to discover insights from huge sets of data.
- Data scientists can work in coordination with different verticals within an organization.
- Data scientists find useful patterns and insights that a company can use to make tangible business decisions.
- **Between 2011 and 2012, there was a 15,000% increase in job postings for data scientists in the US.**

## Use Cases

- The logos from the first page are presented again.
- Each logo is accompanied by a phrase describing how that platform uses data analytics:
  - **Facebook:** Social Analytics
  - **Amazon:** Improving E-commerce experience
  - **Uber:** Optimizing Rides
  - **Bank of America:** Increasing Customer experience
  - **Airbnb:** Improving Searches
  - **Spotify:** Music Recommendation

## Facebook

- **2.5 billion** monthly active users.
- **Deep learning** is used, including for facial recognition and text analysis.
  - **Facebook uses powerful neural networks to classify faces in photographs.**
  - The **DeepText** engine is used to understand user sentences.
- **Deep learning** is used for targeted advertising.
  - Facebook uses insights gained from data to cluster users based on their preferences and show them advertisements that appeal to them.
- A picture of a social network graph is presented, with nodes labeled **User V**, **User W**, **User X**, **User Y**, and **User Z**.

## Amazon

- Amazon relies heavily on **predictive analytics** to increase customer satisfaction.
- Amazon uses a **personalized recommendation system**.
- Amazon draws suggestions from other users who used similar products or provided similar ratings.
- A diagram shows a process flow for **Amazon's Recommendation Engine**:
  - **Amazon S3:** Storing user data and inventory information.
  - **Amazon Personalize API:** Using the Personalize API to create a stream of user activity.
  - **Amazon Personalize:** Processing and examining the data, identifying meaningful features, selecting the correct algorithms, and training and optimizing a personalization model for customizing the data. Its steps are:
    - **Loading Data**
    - **Data Inspection**
    - **Feature Identification**
    - **Selecting the Right Algorithm**
    - **Selecting the Hyperparameters**
    - **Training the Models**
    - **Optimizing the Models**
    - **Building an Assembly of Features**
    - **Model Hosting**
    - **Creation of the Real-Time Caches**
  - **Customized Personalization API:** Using the user activity stream to produce recommendations.

## Amazon

- **Fraud Detection**
  - Amazon has its own novel methods and algorithms to detect fraudulent sellers and fraudulent purchases.

## Uber

- Uber is a popular smartphone application that allows you to book a cab.
- Uber makes extensive use of **Big Data**.
- Uber maintains a large database of drivers, customers, and several other records.
- Whenever you hail a cab, Uber matches your profile with the most suitable driver.
- A list outlines the fees charged by Uber:
  - **Base (or initial) fare:** A flat fee charged at the beginning of a ride.
  - **Cost per minute:** How much is charged for each minute of the ride.
  - **Cost per mile:** How much is charged for each mile of the ride.
  - **Booking Fee (formerly 'Safe Rides Fee'):** A flat fee to cover Uber's operating costs (not charged for Uber's more luxurious services such as UberBlack or UberSUV).
- An equation is presented to calculate the Uber fare: $\text{Base Fare} + (\text{Cost per minute} \times \text{time in ride}) + (\text{Cost per mile} \times \text{ride distance}) + \text{Booking Fee} = \text{Your Fare}$.
- The Uber logo is presented.

## Bank of America - Using Data to Leverage Customer Experience

- Bank of America makes use of **Data Science** and **predictive analytics**.
- Banks are able to detect fraud in payments and customer information.
- Bank of America prevents fraud involving insurance, credit cards, and accounting.
- Banks employ **data scientists**, who use their quantitative knowledge to apply algorithms such as association, clustering, forecasting, and classification.
- **Risk modeling** using **Machine Learning**.
- Bank of America seeks to minimize risk through risk modeling.
- The Bank of America logo is presented.

## Bank of America - Using Data to Leverage Customer Experience

- Bank of America uses **intelligent customer segmentation** to understand its customers.
- This segmentation categorizes customers into **high-value** and **low-value** segments.
- Banks use **clustering**, **logistic regression**, and **decision trees** to understand **Customer Lifetime Value (CLV)** and to group customers into the appropriate segments.
- The Bank of America logo is presented.

## Airbnb

- Users can host and find accommodations through the Airbnb app and website.
- Airbnb holds massive amounts of **big data** on customers and hosts, stay records, and website traffic.
- Users from certain countries would click the neighborhood link, browse the page and photos, and then not make a booking.
  - A different version of the app was released to mitigate this issue.
  - This version replaced neighborhood links with top travel destinations.
  - This resulted in a 10% improvement in the lift rate for users from those countries.
- The Airbnb logo is presented.

## Spotify

- Spotify is an **online music streaming giant**.
- Spotify deals with a massive amount of **big data** from over 100 million users.
- Spotify uses the 600 GB of data its users generate daily to build algorithms that **boost user experience**.
- In 2017, Spotify used **data science** to find out which universities had the highest percentage of party playlists and which spent the most time listening to them.
- The Spotify logo is presented.
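
The fare equation from the Uber section can be written as a small function. The sketch below implements only that slide formula. The function name `estimate_fare` and the example rates are hypothetical placeholders, not real Uber prices. Real-world ride pricing can involve further factors that the slide's equation does not cover.

```python
def estimate_fare(
    base_fare: float,
    cost_per_minute: float,
    minutes: float,
    cost_per_mile: float,
    miles: float,
    booking_fee: float = 0.0,
) -> float:
    """Slide formula: base fare + (cost/min * time) + (cost/mile * distance) + booking fee.

    booking_fee defaults to 0.0 for services where, per the slide,
    no booking fee is charged (hypothetical helper, not an Uber API).
    """
    fare = base_fare + cost_per_minute * minutes + cost_per_mile * miles + booking_fee
    return round(fare, 2)  # round to cents for display


if __name__ == "__main__":
    # Hypothetical rates for a 15-minute, 5-mile ride (placeholders, not real prices).
    print(estimate_fare(base_fare=2.00, cost_per_minute=0.30, minutes=15,
                        cost_per_mile=1.20, miles=5, booking_fee=2.50))
```

Making the booking fee an optional argument reflects the slide's note that some services do not charge it.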