Chapter 5 (Part II) Statistical Inference PDF

CHAPTER 5 (PART II)
Statistical Inference
Prepared by: Nur Liyana Mohamed Yousop

ANOVA

ANALYSIS OF VARIANCE (ANOVA)
One Way ANOVA
Used to compare the means of two or more population groups. ANOVA derives its name from the fact that it analyzes variances in the data (analysis of variance). ANOVA measures variation between groups relative to variation within groups. Each group is assumed to come from a normally distributed population.

ASSUMPTIONS OF ONE WAY ANOVA
The m groups or factor levels being studied represent populations whose outcome measures:
1. are randomly and independently obtained
2. are normally distributed
3. have equal variances.
If these assumptions are violated, the level of significance and the power of the test can be affected.

T-TEST VS ANOVA
When we have only two samples, the t-test and ANOVA give the same results (for the pooled-variance, two-tailed t-test, F = t²). With more than two samples, however, running a separate t-test on each pair is not reliable: every additional test carries its own chance of a false rejection, so the overall error rate of the result compounds.

ANALYSIS OF VARIANCE (ANOVA)
One Way ANOVA
F = (variance between samples) / (variance within samples)
A one way ANOVA compares the means of two or more independent (unrelated) groups using the F-distribution. The null hypothesis for the test is that all of the group means are equal.
The Excel Analysis ToolPak tool for one way ANOVA is:
• Anova: Single Factor

HOW ANOVA WORKS
One Way ANOVA
Why ANOVA, not ANOME (analysis of means)? Although the goal is to compare means, the method works by analyzing variances. ANOVA compares two types of variance: the variance within each sample and the variance between different samples.
[Figure] The black dotted arrows show the per-sample variation of the individual data points around the sample mean (the variance within). The red arrows show the variation of the sample means around the grand mean (the variance between).
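The comparison of variance between samples and variance within samples can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Excel ToolPak's implementation: the function name `one_way_anova_f` is hypothetical, and the satisfaction scores are made-up numbers, not the Insurance Survey data.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (variance_between, variance_within, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)           # mean of all samples combined
    group_means = [mean(g) for g in groups]

    # Variation of the sample means around the grand mean (between samples)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the data points around their own sample mean (within samples)
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    variance_between = ss_between / df_between
    variance_within = ss_within / df_within
    return variance_between, variance_within, variance_between / variance_within

# Hypothetical satisfaction scores for three education levels (illustrative only)
scores = {
    "college graduate": [5, 4, 4, 3],
    "graduate degree": [4, 5, 5, 4],
    "some college": [3, 2, 3, 4],
}
variance_between, variance_within, f_stat = one_way_anova_f(list(scores.values()))
print(f"between = {variance_between:.3f}, within = {variance_within:.3f}, F = {f_stat:.2f}")
```

A large F means the sample means differ by more than the scatter inside each sample would explain. Deciding significance still requires the F-distribution critical value or p-value, which the Excel tool reports and this sketch leaves out.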
After measuring both kinds of variation, calculate the F-test statistic.
** Grand mean = mean of all samples combined

EXAMPLE 1
Difference in Insurance Survey Data
Determine whether any significant differences exist in satisfaction among individuals with different levels of education. The variable of interest is called a factor. In this example, the factor is the educational level, and it has three categorical levels: college graduate, graduate degree, and some college.

SOLUTION 1
Applying the Excel ANOVA Tool
Data Analysis tool: Anova: Single Factor
• The input range of the data must be in contiguous columns.

CHI-SQUARE

NON-PARAMETRIC METHOD
INTRODUCTION
A non-parametric method is used when:
• the data are on a nominal (categorical) scale, e.g. gender, state of birth, brand
• Case III: n < 30 and the population is not normal (σ² known or unknown)

CHI-SQUARE TEST FOR INDEPENDENCE
Tests whether two categorical variables are independent.
• H0: the two categorical variables are independent
• H1: the two categorical variables are dependent

EXAMPLE 2
Independence and Marketing Strategy
Energy Drink Survey data. A key marketing question is whether the proportion of males who prefer a particular brand is no different from the proportion of females.
• If gender and brand preference are indeed independent, we would expect about the same proportion of female students as of male students to prefer brand 1.
• If they are not independent, advertising should be targeted differently to males and females; if they are independent, it would not matter.

CHI-SQUARE TEST CALCULATIONS
• Step 1
◦ Using a cross-tabulation of the data, compute the expected frequency of each cell if the two variables are independent: expected frequency = (row total × column total) / grand total.
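Step 1 can be sketched in Python as follows. The cross-tabulation here is hypothetical (it is not the Energy Drink Survey data), and `expected_frequencies` is an illustrative helper name, not a library function.

```python
def expected_frequencies(observed):
    """Expected cell counts for an r x c cross-tabulation under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Under independence: expected = row total * column total / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical counts: rows are two genders, columns are three brands
observed = [
    [10, 20, 30],
    [20, 10, 10],
]
expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Each expected row keeps the same brand proportions as the column totals, which is exactly what independence between the row and column variables implies.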
CHI-SQUARE TEST CALCULATIONS
• Step 2
◦ Compute a test statistic, called a chi-square statistic, which is the sum over all cells of the squared difference between the observed frequency, fo, and the expected frequency, fe, divided by the expected frequency in that cell:
χ² = Σ (fo − fe)² / fe

CHI-SQUARE DISTRIBUTION
The sampling distribution of χ² is a special distribution called the chi-square distribution.
• The chi-square distribution is characterized by its degrees of freedom.

CHI-SQUARE TEST CALCULATIONS
• Step 3
◦ Compare the chi-square statistic with the critical value, for the level of significance α, from a chi-square distribution with (r – 1)(c – 1) degrees of freedom, where r and c are the number of rows and columns in the cross-tabulation table, respectively. Reject H0 if the statistic exceeds the critical value.
The Excel function CHISQ.INV.RT(probability, deg_freedom) returns the value of χ² that has a right-tail area equal to probability for a specified degree of freedom. By setting probability equal to the level of significance, we can obtain the critical value for the hypothesis test.
The Excel function CHISQ.TEST(actual_range, expected_range) computes the p-value for the chi-square test.

EXAMPLE 2
Conducting the Chi-square Test
Result
Test statistic = 6.49
d.f. = (2 – 1)(3 – 1) = 2
Critical value = CHISQ.INV.RT(0.05,2) = 5.99
p-value = CHISQ.TEST(F6:H7,F12:H13) = 0.0389
Reject H0, because the test statistic exceeds the critical value (6.49 > 5.99) and the p-value is below 0.05.

END OF CHAPTER 5
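Supplementary sketch for Steps 2 and 3: the Python below computes a chi-square statistic for a hypothetical table, then cross-checks the Example 2 critical value and p-value. It relies on one simplifying assumption: with exactly 2 degrees of freedom, the right-tail area of the chi-square distribution is exp(−x/2), so the two `_df2` helpers are valid for d.f. = 2 only and are not general replacements for CHISQ.INV.RT or CHISQ.TEST. All function names and the table counts are illustrative.

```python
import math

def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / grand_total  # expected count
            statistic += (f_o - f_e) ** 2 / f_e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

def critical_value_df2(alpha):
    """Right-tail critical value; valid for 2 degrees of freedom only."""
    return -2.0 * math.log(alpha)

def p_value_df2(statistic):
    """Right-tail p-value; valid for 2 degrees of freedom only."""
    return math.exp(-statistic / 2.0)

# Hypothetical 2 x 3 table of counts (not the Energy Drink Survey data)
stat, dof = chi_square_statistic([[10, 20, 30], [20, 10, 10]])
print(f"hypothetical table: chi-square = {stat:.2f}, d.f. = {dof}")

# Cross-check of Example 2, using the reported test statistic of 6.49
critical = critical_value_df2(0.05)
p_value = p_value_df2(6.49)
reject_h0 = 6.49 > critical
print(f"critical = {critical:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

The p-value computed from the rounded statistic 6.49 agrees with the reported 0.0389 to about three decimal places; the small difference comes from rounding the statistic before taking the tail area.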
