Topic3_Descriptive_Stat_Part1_2023_2.pptx

Full Transcript

Topic 3: Insights Into Descriptive Statistics
GEN 4191 Data Analytics for Business Optimisation
2023.2 – BBA6
Dr. Krisztina Soreg

Let’s get started!

After this session you should be able to:
• Understand the definition and importance of Descriptive Statistics and its application;
• Distinguish the three main types of Descriptive Statistics and the most frequently used tools such as frequency, mean, median, mode, range, standard deviation and variance;
• Create an Excel report based on the provided data, developing summaries of the results and applying visual tools for presenting the main findings.

What is Descriptive Statistics?
• Main goal: to understand past and current business performance and make informed decisions
• The most used and most well-understood type of analytics
• Method: to categorize, characterize, consolidate and classify data to convert it into useful information
• Tools: to summarize data into meaningful charts and reports (e.g.: budgets, sales, revenues or costs)
• Limits: conclusions might not be possible based on the available data; no patterns/relationships can be revealed

Population & sample
Whenever there is a large population, the probability of making an error increases. This needs to be dealt with. In addition, researchers face challenges like data distortion, recalculation and missing figures. This is where descriptive statistics come into play: a small data sample is taken and summarized.

Sampling error
A statistical error that occurs when an analyst does not select a sample that represents the entire population of data → deviation → wrong results
Outcome: the results found in the sample do not represent the results that would be obtained from the entire population
Types
1. Population-specific error
2. Selection error
3. Sample frame error
4. Non-response error

1. Population-specific error
A population-specific error occurs when a researcher doesn't understand who to survey.
Example
Situation: a survey about health issues among the elderly. Who should be surveyed? The elderly people with health issues, their caregivers, or their physicians?

2. Selection error
When the survey is self-selected, or when only those participants who are interested in the survey respond to the questions → volunteers opting into a study.
Example 1
A survey that only relies on a small portion of people who immediately respond.
Example 2
If a researcher puts out a call for responses on social media, they’re going to get responses from people they know, and of those people, only the more helpful or affable individuals will reply.

3. Sample frame error
When a sample is selected from the wrong population data.
Example
In the 1936 US presidential election between Roosevelt (the Democratic candidate) and Landon of the Republican party, the sample frame was drawn from car registrations and telephone directories. In 1936, many Americans did not own cars or telephones, and those who did were largely Republicans. The results wrongly predicted a Republican victory.

4. Non-response error
When a useful response is not obtained from the surveys because researchers were unable to contact potential respondents (or potential respondents refused to respond).
Examples
• Asking for embarrassing information, or information about illegal activities.
• Email invites might have disappeared into the Spam folder.
• People who are more active runners might be more inclined to answer a survey about running than people who aren’t as active in the community.
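The sample frame error described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the electorate of 10,000 voters, the share of phone owners and the support rates are all invented for illustration and are not the real 1936 figures. It compares an estimate from a sample of the whole population with an estimate from a sample drawn only from a phone-owner list.

```python
import random

# Hypothetical electorate, invented for illustration (NOT the real 1936 figures).
# Each voter is a pair: (owns_phone, supports_candidate_b).
population = (
    [(True, True)] * 2400      # phone owners who support candidate B
    + [(True, False)] * 1600   # phone owners who support candidate A
    + [(False, True)] * 1800   # non-owners who support candidate B
    + [(False, False)] * 4200  # non-owners who support candidate A
)


def share_for_b(voters):
    """Proportion of the given voters who support candidate B."""
    return sum(1 for _, supports_b in voters if supports_b) / len(voters)


rng = random.Random(42)  # fixed seed so the run is repeatable

true_share = share_for_b(population)

# Sample drawn from the whole population: every voter can be selected.
random_sample = rng.sample(population, 1000)

# Sample drawn from a flawed frame: only phone owners are listed.
phone_frame = [voter for voter in population if voter[0]]
frame_sample = rng.sample(phone_frame, 1000)

print(f"True share for B:       {true_share:.3f}")
print(f"Random-sample estimate: {share_for_b(random_sample):.3f}")
print(f"Phone-frame estimate:   {share_for_b(frame_sample):.3f}")
```

In this invented setup candidate B has less than half of the votes in the full population, but the phone-owner frame over-represents B's supporters, so the frame-based estimate points to a win for B.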
Sampling error: examples
Error types (table columns): Population-specific error, Selection error, Sample frame error, Non-response error
Statements (table rows, each marked with one X in the table):
• A market analysis about healthy lifestyle involving only active gym members
• Asking customers about their income during a product testing at a store
• In a survey of breakfast cereals, the population is only the mother within a household
• A survey about the new Louis Vuitton earbuds is only carried out among LV customers
• A national survey about email providers involves only users with a Gmail account

Examples of Descriptive Statistics
• Sum of the sales made each month
• Median sales order per customer
• Standard deviation of the age of the customers
• Percentage of customers who default on their loan

Examples of Descriptive Statistics
Youngest: Malala Yousafzai (17): Nobel Peace Prize
Oldest: John B. Goodenough (97): Nobel Prize in Chemistry
Task: analyze the following histogram by the distribution of age of Nobel Prize winners

Types of Descriptive Statistics
• Frequency Distribution: pattern of frequencies of a variable
• Central Tendency tools: mean, median & mode → single value reflecting the center of the data distribution
• Variability: range, standard deviation and variance

Frequency Distribution
Absolute frequency
• A frequency or count of the different outcomes in a data set or sample
• The number of times a specific data value occurs in your dataset
• If 4 people have an IQ of between 118 and 125, then an IQ of 118 to 125 has a frequency of 4
• Frequency chart: arranging data values in ascending order of magnitude along with their frequencies

IQ level     Number of employees (count)
118 - 125    4
126 - 133    6
134 - 142    4
143 - 149    2
150 - 157    1
Total        17

Frequency Distribution
Relative frequency
• The number of times an event occurs divided by the total number of events occurring in a given scenario
• Relative Frequency = Subgroup frequency / Total frequency
• Relative Frequency = f / n
Where:
• f is the number of times the data occurred in an observation
• n = total frequencies

IQ level     Number of employees (count)    Rel. freq.
118 - 125    4                              0.235
126 - 133    6                              0.353
134 - 142    4                              0.235
143 - 149    2                              0.118
150 - 157    1                              0.059
Total        17                             1

Frequency Distribution
Cumulative frequency
• Shows a running total of all preceding frequencies in a frequency distribution
• E.g.: 4 + 6 = 10
• 10 + 4 = 14
• What does it show: how often a value and its previous values appear
→ Evaluating sales performance over several months
→ Summarizing survey results
→ Determining how many companies have outstanding shares

IQ level     Number of employees (count)    Cum. freq.
118 - 125    4                              4
126 - 133    6                              10
134 - 142    4                              14
143 - 149    2                              16
150 - 157    1                              17

Frequency Distribution
How to calculate frequency in Excel?
1) With function
• Goal: to automatically calculate the frequency of the occurrences with an Excel function
In Excel
• Step 1: create a new column entitled “Bins” (groups based on which you will determine the number of matching data)
• Step 2: go to the Formulas tab → click on Insert Function
• Step 3: choose the “Frequency” function and select the Data and Bins array
• Step 4: press Ctrl + Shift + Enter to carry out the calculation (Mac: Command + Shift + Enter)
• Limit: does not create any chart!

Frequency Distribution: Examples
How to insert a histogram? – With Data Analysis ToolPak
• Step 1: select the available data
• Step 2: Insert → Insert statistic chart → Histogram
• Step 3: adjust the horizontal axis → right click → Format axis → select By Category
• Step 4: name the Y axis, add a general title and extend the histogram to see each category
You can add data labels for each column (click into the bars → Chart Design → Add Chart Element → Data Labels)

Frequency Distribution: Examples
How to format bins in Mac?
• Step 1: click into the bar (column) itself
• Step 2: open Format Data Series
• Step 3: select the last symbol (Options)
• Step 4: open the pop-up menu of Bins and choose By Category

Frequency Distribution: Examples
The frequency is represented on a map instead of a histogram → it depicts the percentage of 25-64-year-old people with a university degree in Europe (2021)

Frequency Distribution: Examples
What is the difference between a bar chart and a histogram?
A histogram is only used to plot the frequency of occurrences in a data set that has been divided into classes. It is used to summarize continuous (non-discrete) data → elements are grouped together, so that they are considered as ranges (bars touching each other).
Bar charts are pictorial representations of data that use columns to compare different categories of discrete data → elements are taken as individual entities. Bars must always be separated from each other!
Remember:
• In a vertical bar graph, frequency is measured by the height of the bar
• In a histogram, frequency is measured by the area of the column
Discrete data: finite value that can be counted
• Tool: bar chart
• E.g.: number of students in a class, number of buyers, gender of people
Continuous data: infinite number of possible values that can be measured → unspecified number of possible measurements
• Tool: histogram
• E.g.: weight, height, speed

Frequency Distribution: Examples
2) How to create a histogram & frequency chart with the Data Analysis ToolPak?
• Step 1: Data tab → Data Analysis
• Step 2: select Histogram
• Step 3: choose your Input range (the raw data whose frequencies you want) and Bin range (categories, groups, etc.)
• Step 4: select New worksheet if you want your results on a separate Excel sheet
• Step 5: select Chart output
What is missing?
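The absolute, relative and cumulative frequency columns from the IQ tables can also be reproduced outside Excel. The following is a minimal Python sketch, not part of the original slides. The raw IQ scores are hypothetical and were invented so that they reproduce the counts in the table, and the function name absolute_frequencies is likewise an assumption. Like the Excel Frequency function, it counts each value in the first bin whose upper bound is greater than or equal to the value. Unlike Excel, this sketch simply ignores values above the last bin.

```python
from bisect import bisect_left
from itertools import accumulate

# Hypothetical raw IQ scores, invented to match the counts in the slide's table.
iq_scores = [118, 120, 123, 125, 126, 128, 129, 130, 132, 133,
             134, 137, 140, 142, 143, 149, 150]

# Upper bound of each class, playing the role of the Excel "Bins" column.
bin_upper_bounds = [125, 133, 142, 149, 157]
labels = ["118 - 125", "126 - 133", "134 - 142", "143 - 149", "150 - 157"]


def absolute_frequencies(values, upper_bounds):
    """Count how many values fall into each bin (bounds must be sorted)."""
    counts = [0] * len(upper_bounds)
    for value in values:
        # Index of the first bin whose upper bound is >= value.
        index = bisect_left(upper_bounds, value)
        if index < len(counts):  # values above the last bound are ignored
            counts[index] += 1
    return counts


counts = absolute_frequencies(iq_scores, bin_upper_bounds)
total = sum(counts)

# Relative frequency = f / n, rounded to three decimals.
relative = [round(count / total, 3) for count in counts]

# Cumulative frequency = running total of the absolute frequencies.
cumulative = list(accumulate(counts))

print(f"{'IQ level':<12}{'Count':>6}{'Rel. freq.':>12}{'Cum. freq.':>12}")
for label, count, rel, cum in zip(labels, counts, relative, cumulative):
    print(f"{label:<12}{count:>6}{rel:>12.3f}{cum:>12}")
print(f"{'Total':<12}{total:>6}")
```

The last cumulative value equals the total number of observations (17), which is a quick check that no value was left out of the bins.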
Frequency Distribution: Examples
Don’t leave your histogram as a bar chart!
• Click the legend on the right side and press Delete.
• Properly label your bins. (You can do it by overwriting your bins in the output frequency table.)
• To remove the space between the bars, right click a bar, click Format Data Series and change the Gap Width to 0%.
• To add borders, right click a bar, click Format Data Series, click the Fill & Line icon, click Border and select a color.

Frequency Distribution: Examples
Remember: you have 3 methods to calculate frequency in Excel and to develop a histogram:
• Function: use the =frequency(X;Y) function for selecting your data and bins
• Visual tool: insert a histogram as a visual tool by selecting the raw data and bins and tailor it
• Data Analysis ToolPak: run your Data Analysis ToolPak for the frequency calculation and insert the histogram

Central Tendency
• Goal: to use a single value reflecting the center of the data distribution → central location
• Types: mean, median and mode
Mean: average of a data set
Median: middle value of the ordered set of numbers
Mode: most common number

Thank you for your attention!
