Unit II - Descriptive Analytics
LPU
Ranjit Kaur Walia
Summary
This document provides an overview of descriptive analytics, including its methods, examples, and applications. It discusses how descriptive analytics can be used to analyze data, identify trends, and improve business performance. The document explains the steps involved in descriptive analytics, from quantifying goals to data presentation.
Full Transcript
Unit II
Ranjit Kaur Walia, Asst Prof., SCA, LPU

Descriptive Analytics
Descriptive analytics is the simplest type of analytics and the foundation the other types are built on. It allows you to pull trends from raw data and succinctly describe what happened or is currently happening. Descriptive analytics answers the question, “What happened?”

For example, imagine you’re analyzing your company’s data and find there’s a seasonal surge in sales for one of your products: a video game console. Here, descriptive analytics can tell you, “This video game console experiences an increase in sales in October, November, and early December each year.”

Data visualization is a natural fit for communicating descriptive analysis because charts, graphs, and maps can show trends in data, as well as dips and spikes, in a clear, easily understandable way.

How does descriptive analytics work?
Descriptive analytics uses various statistical analysis techniques to slice and dice raw data into a form that allows people to see patterns, identify anomalies, improve planning and compare things. Enterprises realize the most value from descriptive analytics when using it to compare items over time or against each other. For example, a finance manager might compare product sales month over month or against related categories.

Descriptive analytics can work with numerical data, qualitative data or some combination. Numerical data might quantify things like revenue, profit or a physical change. Qualitative data might characterize elements such as gender, ethnicity, profession or political party. To improve understanding, raw numerical data is often binned into ranges or categories such as age ranges, income brackets or zip codes.

Descriptive analysis techniques perform various mathematical calculations that make recognizing or communicating a pattern of interest easier.
For example, "central tendency" describes what is normal for a given data set by considering characteristics such as the mean (the average), median and mode. Other elements include frequency, variation, ranking, range and deviation.

Examples of Descriptive Analytics
1. Traffic and Engagement Reports
2. Financial Statement Analysis (financial statements include the balance sheet, income statement, cash flow statement and statement of shareholders’ equity)
3. Demand Trends
4. Aggregated Survey Results
5. Progress to Goals

How is descriptive analytics used?
Descriptive analysis supports a broad range of users in interpreting data. Descriptive analytics is commonly used for the following:
- financial reports
- planning a new program
- measuring effectiveness of a new program
- understanding sales trends
- comparing companies
- motivating behavior with KPIs
- recognizing anomalous behavior
- interpreting survey results

What can descriptive analytics tell you?
Businesses use descriptive analytics to assess, compare, spot anomalies and identify relative strengths and weaknesses.

Steps in descriptive analytics
1. Quantify goals. The process starts by translating some broad business goals, such as better business performance, into specific, measurable outcomes such as sales-by-product, cost-per-sale or conversion rate.
2. Identify relevant data. Teams need to identify any types of data that may help improve the understanding of the critical metric. The data might be buried across one or more internal systems or various third-party data sources.
3. Organize data. Data from different sources, applications or teams needs to be cleaned and normalized to improve analytics accuracy.
4. Analysis.
Various statistical and mathematical techniques combine, summarize and compare the raw data in different ways to generate data features.
5. Presentation. Data features may be presented numerically in a report, dashboard or visualization. Common visualization techniques include bar charts, pie charts, line charts, bubble charts and histograms.

Benefits and drawbacks of descriptive analytics
The use of descriptive analytics can provide the following benefits:
- It can simplify communication about numerical data.
- It can improve understanding of complex situations.
- Companies can compare performance against the competition or across product lines.
- It can be used to help motivate teams to reach new goals.

Top drawbacks and weaknesses of descriptive analytics include the following:
- Existing biases can be amplified either accidentally or deliberately.
- Results can direct a company's focus to metrics that are not helpful, such as sales rather than profits.
- Motivational metrics can be gamed to encourage unintended behavior, such as mouse movers or sales fraud.
- Poorly chosen metrics can lead to a false sense of security.

Descriptive analytics tools
Relatively simple tools like an Excel spreadsheet and some knowledge of business management are enough to craft basic descriptive analytics. Business intelligence tools like Power BI, Tableau and Qlik can simplify many steps of the descriptive analytics process.

Descriptive analytics tools provide various ways of reorganizing raw data to reveal new patterns by calculating characteristics such as averages, frequencies, variations, rankings, ranges and deviations.
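As a rough illustration of the characteristics just listed (averages, frequencies, variations, rankings, ranges and deviations), here is a minimal base R sketch. The `sales` vector and every variable name are made up for this example and do not come from the lecture material.

```r
# Hypothetical monthly unit sales for one product (made-up numbers).
sales <- c(120, 135, 128, 150, 135, 310, 142, 138, 135, 160, 295, 330)

avg      <- mean(sales)     # average (arithmetic mean)
med      <- median(sales)   # middle value, less sensitive to spikes than the mean
freq     <- table(sales)    # frequency of each distinct value
variance <- var(sales)      # sample variance (variation around the mean)
std_dev  <- sd(sales)       # sample standard deviation
rng      <- range(sales)    # smallest and largest value

# Ranking: 1 = best-selling month; tied months share the lowest rank.
ranks <- rank(-sales, ties.method = "min")

print(c(mean = avg, median = med, sd = std_dev))
print(rng)
print(freq)
print(ranks)
```

Comparing `avg` with `med` is a quick way to notice that a few spike months pull the mean upward, which is the kind of pattern descriptive analytics is meant to surface.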
While these basic techniques are baked into essential BI tools, a team may turn to more sophisticated data science tools for complex statistics, including the following: the programming language R; IBM's Statistical Package for the Social Sciences (SPSS); SAS Institute Inc.'s analytics software; and open source software from Knime.

Data wrangling tools can help automate data engineering processes to cleanse, reformat and combine data from many different sources. Popular tools include offerings from Alteryx, Cambridge Semantics, Trifacta, Talend and Tamr.

Exploring Data
Data exploration is the first step of data analysis. It is used to explore and visualize data in order to uncover insights from the start or to identify areas or patterns worth digging into further. Using interactive dashboards and point-and-click data exploration, users can better understand the bigger picture and get to insights faster.

Why is Data Exploration Important?
Starting with data exploration helps users make better decisions about where to dig deeper into the data and gives them a broad understanding of the business before they ask more detailed questions later. With a user-friendly interface, anyone across an organization can familiarize themselves with the data, discover patterns, and generate thoughtful questions that may spur deeper, valuable analysis.

Data exploration vs. data mining
In data science, there are two primary methods for extracting data from disparate sources: data exploration and data mining.

Data exploration is a broad process performed by business users and by an increasing number of data scientists who have no formal training in data science or analytics, but whose jobs depend on understanding data trends and patterns. Visualization tools help this wide-ranging group to better export and examine a variety of metrics and data sets.
Data mining is a specific process, usually undertaken by data professionals. Data analysts create association rules and parameters to sort through extremely large data sets and identify patterns and future trends.

Typically, data exploration is performed first to assess the relationships between variables. Then the data mining begins. Through this process, data models are created to gather additional insight from the data.

Probability Distribution for Descriptive Analytics
A probability distribution is a fundamental concept in descriptive analysis, used to describe how the values of a random variable are distributed. It provides a way to understand the likelihood of different outcomes and can be visualized using graphs such as histograms or probability density functions.

If the support of the random variable is a finite or countably infinite set of values, then the random variable is discrete. Discrete random variables have a probability mass function (pmf). The pmf gives the probability that the random variable takes on each value in its support. The cumulative distribution function (cdf) gives the probability that the random variable is less than or equal to a particular value.

The binomial distribution is a probability distribution used in statistics, and R has built-in functions for working with it.
The binomial distribution is a discrete distribution in which each trial has only two outcomes: success or failure. The trials are independent, so the previous outcome does not affect the next one, and the probability of success remains the same from trial to trial. The binomial distribution helps us find individual probabilities as well as cumulative probabilities over a certain range. It is used in many real-life scenarios, such as determining whether a particular lottery ticket has won or not, whether a drug cures a person or not, the number of heads or tails in a finite number of coin tosses, or how often a die shows a particular face.

Binomial Distribution
The binomial distribution is a discrete distribution. Let A be an event associated with a random experiment. If event A happens, we call it a success; otherwise it is a failure.
Probability of success: $P(A) = p$
Probability of failure: $P(\bar{A}) = 1 - p = q$

Assumptions of Binomial Distribution
1. Each trial results in two disjoint outcomes, a success or a failure.
2. The number of trials made ‘n’ is finite.
3. The trials are independent.
4. Probability of success = $p$ is constant for each trial.
Examples are: throwing a die, tossing coins, and drawing a card from a well-shuffled pack of 52 cards. In each case a "success" must be defined as a single event (for example, rolling a six), and card draws are independent only if the card is replaced and the pack reshuffled between draws.

Quiz
Which of the following is not true for Binomial distribution?
(A) Each trial results in two disjoint outcomes, a success and a failure.
(B) The number of trials made n is finite.
(C) The trials are dependent.
(D) Probability of success is constant for each trial.
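The pmf and cdf described earlier, together with the assumptions listed above, can be checked numerically. The following is a minimal sketch in base R, not part of the original slides: `dbinom`, `pbinom` and `rbinom` are standard R functions, while the values n = 10 and p = 0.3 and all variable names are illustrative assumptions.

```r
n <- 10      # number of trials (illustrative value)
p <- 0.3     # probability of success on each trial (illustrative value)
q <- 1 - p   # probability of failure

support <- 0:n                                # possible numbers of successes
pmf <- dbinom(support, size = n, prob = p)    # P(X = k) for each k in the support
cdf <- pbinom(support, size = n, prob = p)    # P(X <= k) for each k in the support

# A pmf must sum to 1 over the whole support.
total_prob <- sum(pmf)

# The cdf is the running total of the pmf.
cdf_matches <- all.equal(cumsum(pmf), cdf)

# n independent trials, each coded 1 (success) or 0 (failure) with constant p.
set.seed(1)  # fixed seed so the run is repeatable
trials <- rbinom(n, size = 1, prob = p)
successes <- sum(trials)

print(total_prob)
print(cdf_matches)
print(trials)
print(successes)
```

Counting the ones in `trials` gives a single draw from the binomial distribution with the same n and p, which is how the two-outcome trials connect to the distribution of the number of successes.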
Understanding Binomial Probability
Binomial probability is the probability of getting a specific number of successes in a fixed number of independent trials, where each trial has two possible outcomes: success or failure. The trials are identical, and the probability of success remains the same in each trial.

The binomial probability formula is:
$$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$$
where n is the number of trials, k is the number of successes, p is the probability of success on each trial, and 1 − p is the probability of failure.

Example Scenario
Suppose a coin is flipped 10 times, and you want to know the probability of getting exactly 6 heads. In this case:
n = 10 (number of trials)
k = 6 (number of successes, i.e., heads)
p = 0.5 (probability of getting heads in each flip)
Substituting into the formula gives $P(X = 6) = \binom{10}{6} (0.5)^{6} (0.5)^{4} = 210/1024 \approx 0.2051$.

dbinom(x, size, prob): This function returns the probability of getting exactly x successes in size trials, with success probability prob.
Example: Calculate the probability of getting exactly 6 heads in 10 flips.
dbinom(6, size = 10, prob = 0.5)
Output: 0.2050781 (approximately 20.5% chance)

pbinom(q, size, prob): This function returns the cumulative probability of getting q or fewer successes in size trials with success probability prob.
Example: Calculate the probability of getting 6 or fewer heads in 10 flips.
pbinom(6, size = 10, prob = 0.5)
Output: 0.828125 (approximately 82.8% chance)

qbinom(p, size, prob): This function returns the smallest number of successes q for which the cumulative probability is at least p.
Example: Find the number of heads (successes) such that the cumulative probability is at least 90% in 10 flips.
qbinom(0.9, size = 10, prob = 0.5)
Output: 7 (7 is the smallest number of heads q for which the probability of q or fewer heads is at least 90%)

rbinom(n, size, prob): This function generates n random values from a binomial distribution with size trials and success probability prob.
Example: Simulate 5 experiments of 10 coin flips each, and observe the number of heads in each experiment.
rbinom(5, size = 10, prob = 0.5)
Output: A vector like 4 6 5 7 4 (results may vary each time you run this)

Differences Between the Functions:
- dbinom: Used when you need the probability of a specific number of successes.
- pbinom: Used when you want to know the probability of getting up to a certain number of successes (cumulative probability).
- qbinom: Used when you know the cumulative probability and want to find the corresponding number of successes.
- rbinom: Used to generate random samples from a binomial distribution, which is useful for simulations and experiments.

Real-Life Data Analysis Example Using Binomial Distribution
Scenario: Email Spam Detection
Imagine you work for an email service provider, and your job is to analyze the effectiveness of a new spam filter. The filter has been applied to a large batch of emails, and you are interested in understanding the probability of the filter correctly identifying spam emails.
Let's say you have the following data:
- You randomly select 100 emails from the filtered batch.
- You know that, on average, 20% of all incoming emails are spam (p = 0.2).
- Out of the 100 emails sampled, you find that 30 emails were identified as spam by the filter.

Analysis Goals:
1. Assess the likelihood of observing exactly 30 spam emails out of 100.
2. Determine the cumulative probability of observing up to 30 spam emails.
3. Simulate potential outcomes to understand the distribution of spam detection.
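The three analysis goals above can be sketched in base R roughly as follows. This is an illustrative sketch under the scenario's stated assumptions (100 sampled emails, each independently spam with probability 0.2); the variable names, the seed and the choice of 1000 simulated samples are assumptions made here, not part of the original material.

```r
n_emails <- 100   # sample size from the scenario
p_spam   <- 0.2   # average spam rate from the scenario
observed <- 30    # emails flagged as spam in the sample

# Goal 1: probability of exactly 30 spam emails out of 100.
p_exact <- dbinom(observed, size = n_emails, prob = p_spam)

# Goal 2: cumulative probability of 30 or fewer spam emails.
p_up_to <- pbinom(observed, size = n_emails, prob = p_spam)

# Goal 3: simulate 1000 hypothetical samples of 100 emails each
# (1000 is an arbitrary choice for this sketch).
set.seed(123)     # fixed seed so the run is repeatable
simulated_counts <- rbinom(1000, size = n_emails, prob = p_spam)

print(p_exact)
print(p_up_to)
print(summary(simulated_counts))
hist(simulated_counts,
     main = "Simulated spam counts per 100 emails",
     xlab = "Number of spam emails")
```

If `p_exact` is small and `p_up_to` is close to 1, a count of 30 sits in the upper tail of what a 20% spam rate would typically produce, which would be a reason to look more closely at either the assumed rate or the filter's behavior.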
1. Probability of Exactly 30 Spam Emails (dbinom)
You can use the binomial distribution to calculate the probability of observing exactly 30 spam emails out of 100.
# Probability of exactly 30 spam emails
prob_exact_30