Descriptive Analysis and Population Estimation
Bentley University
Chapter 12: Descriptive Analysis, Population Estimation and Hypothesis Testing

Types of statistical analysis: Descriptive Analysis, Inference Analysis, Difference Analysis, Association (Regression) Analysis, Relationships Analysis

Descriptive Analysis
Descriptive analysis summarizes basic findings for the sample, usually describing the “typical” respondent and how similar respondents are to the typical respondent.
Central tendency: a “typical” customer, product, brand evaluation, etc.
▪ Mode – the value that occurs most often
▪ Median – the value that lies in the middle of the ordered values
▪ Mean – the arithmetic average of a set of numbers
Variability: the diversity of respondents, e.g., differences among customers, products, brands, etc.
▪ Frequency and percentage distribution
▪ Range – the distance between the lowest and the highest value
▪ Standard deviation/variance – the degree of variation or diversity in the values

When to Use Each Descriptive Analysis Measure

AutoConcepts.sav: Sample Descriptive Statistics

Summarizing a Categorical Variable
Example – Dwelling Type: What is the distribution of dwelling type in the sample?
SPSS: Analyze > Descriptive Statistics > Frequencies, then click “Charts” and select the chart type.

Exercise: Summarize Demographic Information from the Sample
▪ Gender
▪ Marital status
▪ Age
▪ Income
Note: Report the median instead of the mode for ordinal data.

Summarizing a Continuous Variable
Example – Consumers’ Attitudes Toward Global Warming
SPSS: Analyze > Descriptive Statistics > Descriptives
Which of the following statements resonates most strongly with respondents in the sample? Which statement shows the most consensus (i.e., the least variability in responses)?
▪ “I am worried about global warming.”
▪ “Gasoline emissions contribute to global warming.”
▪ “Global warming is a real threat.”
To chart the results: Graphs > Chart Builder, then drag the scale variables into the Y-Axis. The default statistic is “Mean”.

Exercise: How are consumers’ concerns about global warming reflected in their driving behavior?
▪ “I drive conservatively to use less fuel.”
▪ “I check traffic reports to avoid idling in traffic.”
▪ “I drive an automobile that is fuel efficient.”

Reporting Categorical (Nominal or Ordinal) Data
▪ Central tendency: mode/median
▪ Variation: frequency, percentage distribution, cumulative distribution
Organization guidelines:
▪ Highlight key measures.
▪ Use highly descriptive and self-explanatory labels.
▪ If appropriate, arrange the variables (rows) in a logical order, usually ascending or descending, based on the descriptive measure being used.

Reporting Scale (Interval or Ratio) Data
▪ Central tendency: mean
▪ Variation: standard deviation and range (maximum and minimum values, if the data have several different values)
Organization guidelines:
▪ With scales, include a table footnote that describes the scale.
▪ Typically, group variables with identical response scales in a single table.
▪ If appropriate, arrange the variables (rows) in a logical order, usually ascending or descending, based on the descriptive measure being used.
▪ Only report findings that are meaningful or useful.
▪ Use a conservative, professional format.

Population Estimation
Parameter estimation is the process of using sample information to compute an interval that describes the range of a parameter, such as the population mean or the population percentage.
A confidence interval (CI) is a statistical tool used to estimate the range within which a population parameter (like a mean or proportion) is likely to fall, based on sample data (which are unavoidably subject to sampling error).
Example: A marketing survey finds that
▪ 60% of customers advocate Brand A, and
▪ customers evaluate the brand 4.2 out of 5.
What are the brand advocacy rate and the brand evaluation score in the population (the true values)?
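The reporting guidelines above name the sample summaries to compute for each data type: mode (or median) with a frequency and percentage distribution for categorical data, and mean, standard deviation and range for scale data. A minimal Python sketch of those summaries, using only the standard library (`statistics`, `collections.Counter`); the dwelling-type and attitude responses are made-up illustration data, not values from AutoConcepts.sav, and the 1-7 agreement scale is an assumption:

```python
from collections import Counter
import statistics


def summarize_categorical(values):
    """Mode plus frequency and percentage distribution, most frequent first."""
    n = len(values)
    table = [
        (category, freq, round(100 * freq / n, 1))
        for category, freq in Counter(values).most_common()
    ]
    return statistics.mode(values), table


def summarize_scale(values):
    """Mean, median, sample standard deviation and range (min, max)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample SD, n - 1 denominator
        "min": min(values),
        "max": max(values),
    }


# Hypothetical responses for illustration only (not from AutoConcepts.sav).
dwelling = ["House", "Apartment", "House", "Condo", "House", "Apartment"]
threat = [6, 7, 5, 6, 4, 7, 6, 3]  # assumed 1-7 agreement scale

mode, table = summarize_categorical(dwelling)
print("Mode:", mode)  # Mode: House
for category, freq, pct in table:
    print(f"{category:<10}{freq:>3}{pct:>7.1f}%")

stats = summarize_scale(threat)
print(f"Mean {stats['mean']:.2f}, SD {stats['std_dev']:.2f}, "
      f"range {stats['min']}-{stats['max']}")  # Mean 5.50, SD 1.41, range 3-7
```

Sorting the frequency table by count follows the guideline of arranging rows in descending order of the descriptive measure; for ordinal data you would instead keep the categories in their natural order and report the median.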
Key Factors That Matter When Calculating a Confidence Interval
▪ The sample statistic (in most cases, the mean for a continuous score or the proportion for a categorical score)
▪ The standard error of the statistic
▪ Sample size
▪ Variability
▪ The desired level of confidence (the percentage of intervals that would contain the true population parameter if you drew random samples many times and computed an interval from each):
  90% critical value z = ±1.64
  95% critical value z = ±1.96 (most commonly used)
  99% critical value z = ±2.58

Process to Calculate a Confidence Interval
1. Collect sample information: the sample size n, plus the percentage distribution (categorical data) or the standard deviation (continuous data).
2. Calculate the standard error:
   ▪ Categorical data: $s_p = \sqrt{\frac{p \times q}{n}}$, where $p$ is the sample percentage and $q = 100\% - p$
   ▪ Continuous data: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation
3. Calculate the confidence interval with the z value: $p \pm z \times s_p$ (categorical) or $\bar{x} \pm z \times s_{\bar{x}}$ (continuous), where $\bar{x}$ is the sample mean.

Example: Calculating the Confidence Interval for a Value Breakfast Combo
Scenario: McDonald’s is evaluating whether offering a value breakfast combo of coffee and an Egg McMuffin is a good idea. In a survey of 100 customers, 10 (10%) said they would be interested in the combo. The question is how to generalize this 10% sample proportion to forecast the profitability of the promotion across the larger population.
n = 100. If 10% of customers in the sample buy coffee and an Egg McMuffin together, what share of the population will do so, at the 95% confidence level?
Standard error: $s_p = \sqrt{\frac{10 \times 90}{100}} = 3\%$
CI = 10% ± 1.96 × 3% = [4.12%, 15.88%]

How to Use the Calculated CI for Forecasting Profitability
1. Simulate different uptake scenarios:
   ▪ Worst case: 4.1% adoption rate
   ▪ Expected case: 10% adoption rate
   ▪ Best case: 15.9% adoption rate
2. Assess the marketing strategy:
   ▪ If the combo is profitable even at 4.1% uptake, it may be low-risk.
   ▪ If it is profitable only at 10% uptake or higher, McDonald’s may need to refine the promotion (e.g., offering discounts or targeted marketing) to reach that target.
3. Consider collecting more data:
   ▪ If the sample is small, additional data would help narrow the confidence interval and provide a more precise estimate for decision-making.

Example 2 – Calculating the Confidence Interval of Reading Time for The New York Times
The New York Times conducted a survey on daily reading time among 100 readers. The survey recorded the following summary statistics:
▪ Average reading time: 25 minutes
▪ Standard deviation: 20 minutes
▪ Sample size: 100 readers
Goal: Construct a 95% confidence interval for the population’s average reading time.
Standard error: $s_{\bar{x}} = \frac{20}{\sqrt{100}} = 2$ minutes
CI = 25 ± 1.96 × 2 = [21.08, 28.92] minutes

Exercise: Starting from the original scenario above (mean 25 minutes, standard deviation 20 minutes, n = 100), will each of the following changes make the confidence interval narrower or wider?
▪ Group 1 – the standard deviation increases to 30 minutes
▪ Group 2 – the standard deviation decreases to 15 minutes
▪ Group 3 – the sample size increases to 1,000 readers
▪ Group 4 – use a 99% instead of a 95% confidence level
▪ Group 5 – use a 90% instead of a 95% confidence level
Critical values: 90% z = ±1.64; 95% z = ±1.96 (most commonly used); 99% z = ±2.58

AutoConcepts.sav: Confidence Interval (Population Estimation)
To assess the word-of-mouth (WOM) potential on social network sites, estimate the proportion of the target population that spends 3 hours or more daily on social network sites.
Survey question: How many hours per day do you spend on social network sites?
o Less than 1 hour
o About 1 hour
o About 2 hours
o About 3 hours or more*
The sample shows that 9.6% of participants spend 3 hours or more daily on social network sites.
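The Wald interval for a population percentage, $p \pm z\sqrt{p \times q / n}$ with $q = 100\% - p$, can be hand-calculated in a few lines. A minimal Python sketch follows; because the AutoConcepts sample size is not stated in this transcript, it reuses the McDonald’s example (10% of 100 customers), and the n = 400 case is a hypothetical illustration of how more data narrows the interval:

```python
import math

# Two-sided critical z values used in this chapter.
Z_VALUES = {90: 1.64, 95: 1.96, 99: 2.58}


def proportion_ci(p_pct, n, level=95):
    """Wald confidence interval for a population percentage.

    p_pct: sample percentage (e.g. 10 for 10%); n: sample size.
    Returns (lower, upper) bounds in percent.
    """
    q_pct = 100 - p_pct                # percentage not in the group
    se = math.sqrt(p_pct * q_pct / n)  # standard error s_p, in percentage points
    margin = Z_VALUES[level] * se
    return p_pct - margin, p_pct + margin


# McDonald's value breakfast combo: 10 of 100 surveyed customers interested.
low, high = proportion_ci(10, 100)
print(f"n = 100: [{low:.2f}%, {high:.2f}%]")  # n = 100: [4.12%, 15.88%]

# Hypothetical: the same 10% observed in a larger sample of 400 customers.
low, high = proportion_ci(10, 400)
print(f"n = 400: [{low:.2f}%, {high:.2f}%]")  # n = 400: [7.06%, 12.94%]
```

Quadrupling the sample halves the standard error (from 3 to 1.5 percentage points), so the interval width is cut in half.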
You can use this sample information to calculate the confidence interval of the population proportion by hand, or use SPSS:
1. Analyze > Compare Means and Proportions > One-Sample Proportions
2. Move the target variable into “Test Variable(s)”.
3. Under “Define Success”, specify the value code that corresponds to the group of interest. Here, value code "3" denotes individuals spending about three hours or more on social media, so set this value as the success criterion.
4. Click “Confidence Intervals…”, specify the Coverage Level (95% by default), and under “Select Type(s)” change the type to Wald.

A Comparison of Different Methods: you do NOT need to know the technical details. For simplicity (and consistency with the textbook equation), we use the Wald method to estimate population proportions.

Result: the proportion of the target population that spends 3 hours or more daily on social network sites ranges from 7.77% to 11.43%.

Evaluating the Population’s Attitudes Toward Global Warming
▪ “I am worried about global warming.”
▪ “Gasoline emissions contribute to global warming.”
▪ “Global warming is a real threat.”
Getting confidence intervals for the set of statements:
▪ Select the variables for which you want confidence intervals.
▪ “Options” lets you set different confidence levels; the default is 95%.
▪ Because you are not testing against any hypothesized value when calculating a confidence interval, set the Test Value to 0.
Output:
▪ Sample statistics: sample size, mean, standard deviation, and standard error.
▪ Confidence interval at 95%.
▪ Interpretation: if we drew 100 random samples and computed a 95% confidence interval from each, about 95 of those intervals would contain the true population mean.
▪ Report the sample size in the table footnote.
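The confidence interval for a mean, $\bar{x} \pm z \times s/\sqrt{n}$, can be sketched the same way. The Python example below reproduces the New York Times calculation, lets you check each group’s answer to the narrower-or-wider exercise, and runs a small simulation of the repeated-sampling interpretation. The normally distributed population in the simulation is a hypothetical assumption, and the sketch uses the chapter’s z critical values; software that uses the t distribution can report slightly wider intervals for small samples.

```python
import math
import random
import statistics

# Two-sided critical z values used in this chapter.
Z_VALUES = {90: 1.64, 95: 1.96, 99: 2.58}


def mean_ci(mean, sd, n, level=95):
    """z-based confidence interval for a population mean: mean ± z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)  # standard error of the mean
    margin = Z_VALUES[level] * se
    return mean - margin, mean + margin


# New York Times survey: mean 25 min, SD 20 min, n = 100 readers.
base_low, base_high = mean_ci(25, 20, 100)
print(f"Baseline 95% CI: [{base_low:.2f}, {base_high:.2f}]")  # [21.08, 28.92]

# Each group scenario changes one input relative to the baseline.
scenarios = {
    "Group 1: SD = 30": mean_ci(25, 30, 100),
    "Group 2: SD = 15": mean_ci(25, 15, 100),
    "Group 3: n = 1000": mean_ci(25, 20, 1000),
    "Group 4: 99% level": mean_ci(25, 20, 100, level=99),
    "Group 5: 90% level": mean_ci(25, 20, 100, level=90),
}
base_width = base_high - base_low
for label, (low, high) in scenarios.items():
    verdict = "wider" if high - low > base_width else "narrower"
    print(f"{label:<20} [{low:.2f}, {high:.2f}]  {verdict}")


def simulate_coverage(pop_mean=25.0, pop_sd=20.0, n=100, reps=1000, seed=1):
    """Share of repeated samples whose 95% CI contains the true mean.

    Assumes a hypothetical normally distributed population.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        low, high = mean_ci(statistics.mean(sample), statistics.stdev(sample), n)
        hits += low <= pop_mean <= high  # True counts as 1
    return hits / reps


print("Simulated coverage:", simulate_coverage())  # should be close to 0.95
```

The simulation shows what the 95% confidence level means: it is the share of intervals, over many repeated samples, that capture the fixed population mean, not the share of sample means that land inside one particular interval.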