Chapter 20 Learning Objectives (LOs): Nonparametric Tests

Document Details

Uploaded by DiplomaticJadeite1956

Bentley University

2022

Jackson P. Lautier, PhD, FSA

Tags

nonparametric tests statistics mutual fund returns business statistics

Summary

This document covers nonparametric tests in business statistics, specifically focusing on chapter 20. It details learning objectives related to making inferences about population medians, conducting hypothesis tests, and examining correlations. The document also includes an introductory case on analyzing mutual fund returns.

Full Transcript

MA214 – Ch.20 Nonparametric Tests
Jackson P. Lautier, PhD, FSA
Bentley University, Department of Mathematical Sciences
BUSINESS STATISTICS: COMMUNICATING WITH NUMBERS, 4e | Jaggia, Kelly
Copyright © 2022 McGraw Hill. All rights reserved. No reproduction or distribution without the prior written consent of McGraw Hill.
12/17/2024

Chapter 20 Learning Objectives (LOs)
LO 20.1 Make inferences about a population median.
LO 20.2 Make inferences about the population median difference based on matched-pairs sampling.
LO 20.3 Make inferences about the difference between two population medians based on independent sampling.
LO 20.4 Make inferences about the difference between three or more population medians.
LO 20.5 Conduct a hypothesis test for the population Spearman rank correlation coefficient.
LO 20.6 Make inferences about the difference between two populations of ordinal data based on matched-pairs sampling.
LO 20.7 Determine whether the elements of a sequence appear in a random order.

Introductory Case: Analyzing Mutual Fund Returns
Dorothy works as a financial advisor at a large investment firm. The distribution of returns often diverges from the normal distribution, and the sample size is small. Dorothy explains that her analysis will use techniques that do not rely on stringent assumptions.
1. Determine whether the median return for each fund is greater than 5%.
2. Determine whether the median difference between the two funds' returns differs from zero.
3. Determine whether the funds' returns are correlated.
20.1 Testing a Population Median (1)
The parametric tests presented in earlier chapters make assumptions about the underlying populations. These tests can be misleading if the assumptions are not met. Nonparametric tests use fewer and weaker assumptions. Nonparametric tests are particularly attractive when sample sizes are small, or sample data do not originate from a normal distribution. Nonparametric tests are also called distribution-free tests.

20.1 Testing a Population Median (2)
Nonparametric tests have disadvantages. If the parametric assumptions are valid yet we choose to use a nonparametric test, the nonparametric test is less powerful (more prone to Type II error) than its parametric counterpart. The reason for less power is that a nonparametric test uses the data less efficiently. Nonparametric tests often focus on the rank of the observations rather than the magnitude of the observations.

20.1 Testing a Population Median (3)
20.1 Testing a Population Median (4)
The t-test assumes that we are sampling from a normal distribution or that the sample size is large. Many variables have distributions that are not normal. Stock returns tend to have "fatter-tailed" distributions: the likelihood of extreme returns is higher than for a normal distribution. In this case, a t-test may lead to erroneous conclusions, since fatter tails imply a greater chance of seeing extreme values.

20.1 Testing a Population Median (5)
The Wilcoxon signed-rank test makes no assumptions concerning the distribution of the population except that it is continuous and symmetric. The hypothesis test is about the population median $m$.

20.1 Testing a Population Median (6)
The test statistic is defined as $T = T^+$, where $T^+$ is the sum of the ranks of the positive differences from the hypothesized median $m_0$.
Calculations to obtain the test statistic:
1. Find the differences $d_i = x_i - m_0$.
2. Take the absolute value of each difference. If the null hypothesis is true, then positive or negative differences of a given magnitude are equally likely. Discard any zero differences.
3. Rank the absolute values of the differences from 1 (smallest) to $n$ (largest); $n$ will be smaller if zero differences are discarded. Ties are assigned the average of the tied ranks.
4. Sum the ranks of the negative differences ($T^-$) and sum the ranks of the positive differences ($T^+$).

20.1 Testing a Population Median (7)
The sum of $T^-$ and $T^+$ should equal $n(n+1)/2$.
– If the null hypothesis were true, then $T^-$ and $T^+$ would each equal about half of the total sum of ranks.
– For testing, we could analyze either $T^-$ or $T^+$.
– We will base the test on $T^+$.
There are two cases to consider.
1. $n \le 10$: use special tables to find the p-value, or use statistical software such as R.
2. $n > 10$: the sampling distribution of $T$ can be approximated by the normal distribution with
$\mu_T = \frac{n(n+1)}{4}$ and $\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$.

20.1 Testing a Population Median (8)
Example: Determine if the median return for the Growth fund is more than 5%.
$H_0: m \le 5$; $H_A: m > 5$
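The four calculation steps for the signed-rank statistic can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `average_ranks` and `signed_rank_sums` are made up here, and the returns are hypothetical values, not the Growth fund data from the slides (which are not reproduced in this transcript).

```python
def average_ranks(values):
    """Rank values from 1 (smallest) to n (largest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2  # average of the positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(sample, m0):
    """Return (T_plus, T_minus, n) for the Wilcoxon signed-rank test."""
    diffs = [x - m0 for x in sample if x != m0]       # steps 1-2: differences, zeros discarded
    ranks = average_ranks([abs(d) for d in diffs])    # step 3: rank the absolute differences
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)   # step 4
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus, len(diffs)


# Hypothetical annual returns (%), NOT the fund data from the slides.
returns = [12.0, -3.5, 8.0, 5.0, 21.0, 2.0, 9.5, -1.0, 15.0, 6.5, 8.0]
t_plus, t_minus, n = signed_rank_sums(returns, m0=5.0)
print(t_plus, t_minus, n)  # 38.0 17.0 10
```

One observation equals the hypothesized median and is discarded, so $n$ drops from 11 to 10, and the two rank sums add up to $n(n+1)/2 = 55$, as the check on slide (7) requires.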
20.1 Testing a Population Median (9)
Example, continued. $T = T^+ = 40$.
R reports a p-value of 0.1162. Do not reject the null hypothesis. At the 5% significance level, we cannot conclude that the median return for Growth is greater than 5%.

20.1 Testing a Population Median (10)
Example, continued. $T = T^+ = 40$.
Alternatively, we can use the normal approximation.
$\mu_T = \frac{n(n+1)}{4} = \frac{10(10+1)}{4} = 27.50$
$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{10(10+1)(2 \times 10+1)}{24}} = 9.8107$
The test statistic: $z = \frac{T - \mu_T}{\sigma_T} = \frac{40 - 27.50}{9.8107} = 1.27$
The p-value: $P(Z \ge 1.27) = 0.1020$
The decision and conclusion are the same.

20.1 Testing a Population Median (11)
Example: Determine if the median return for the Value fund is more than 5%.
$H_0: m \le 5$; $H_A: m > 5$
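The normal-approximation arithmetic for the Growth fund ($T = 40$, $n = 10$) can be checked with a short script. This is a sketch using only Python's standard library; the function name is illustrative, and $z$ is rounded to two decimals on the assumption that the slide read the p-value from a standard normal table.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_normal_approx(t, n):
    """Return (mu_T, sigma_T, z, upper-tail p-value) for the statistic T = T+."""
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = round((t - mu) / sigma, 2)        # two decimals, as with a z table (assumption)
    p_upper = 1 - NormalDist().cdf(z)     # right-tailed test, H_A: m > m0
    return mu, sigma, z, p_upper


mu, sigma, z, p = signed_rank_normal_approx(t=40, n=10)
print(f"mu_T={mu:.2f} sigma_T={sigma:.4f} z={z:.2f} p={p:.4f}")
# mu_T=27.50 sigma_T=9.8107 z=1.27 p=0.1020
```

The approximate p-value (0.1020) is close to the exact value R reports (0.1162), and both exceed 0.05.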
20.2 Testing Two Population Medians (1)
The t-tests about population means from matched-pairs or independent samples assume that we are sampling from normal populations. The Wilcoxon signed-rank test is the nonparametric counterpart to the matched-pairs t-test. The Wilcoxon rank-sum test (Mann-Whitney) is used for independent samples. If the normality assumption is reasonable, then these tests are less powerful than their parametric counterparts.

20.2 Testing Two Population Medians (2)
In matched-pairs sampling, the parameter of interest is the median difference, $m_D$, where $D = X - Y$ for random variables $X$ and $Y$ that form a matched pair. The Wilcoxon signed-rank test for a matched-pairs sample is nearly identical to its use for a single sample. The only added step is that we first find the difference within each pair.

20.2 Testing Two Population Medians (3)
Example: The Growth and Value observations from the introductory case.
$D = \text{Growth} - \text{Value}$
$H_0: m_D = 0$; $H_A: m_D \ne 0$
20.2 Testing Two Population Medians (4)
Example, continued with R. The p-value is 0.2324. Do not reject the null hypothesis. At the 5% significance level, we cannot conclude that the median difference between the returns differs from 0.

20.2 Testing Two Population Medians (5)
The Wilcoxon rank-sum test is used when the underlying populations are nonnormal and the samples are independent. The parameter of interest is the difference between two population medians, $m_1 - m_2$. The test is based on rankings, but rather than ranking differences, it ranks the value of each observation within the combined data of both samples.

20.2 Testing Two Population Medians (6)
The test statistic is defined as $W = W_1$, where $W_1$ is the sum of the ranks of the values in sample 1.
Calculations to obtain the test statistic:
1. Pool the $n_1$ observations from sample 1 with the $n_2$ observations from sample 2. Arrange all the data in ascending order. Treat the independent samples as if they were one large group of size $n = n_1 + n_2$.
2. Rank the observations from smallest to largest (ties receive the average of the tied ranks).
3. Sum the ranks of the observations in each group to get $W_1$ and $W_2$.
$W_1 + W_2$ should be equal to $\frac{(n_1+n_2)(n_1+n_2+1)}{2}$.

20.2 Testing Two Population Medians (7)
If the two population medians are equal, then we would expect each group to produce as many low ranks as high ranks, so $W_1$ and $W_2$ should be about the same. We can analyze either $W_1$ or $W_2$; we use $W_1$.
There are two cases to consider.
1. $n_1 \le 10$ and $n_2 \le 10$: use special tables to find the p-value, or use statistical software such as R.
2. $n_1 \ge 10$ and $n_2 \ge 10$: the sampling distribution of $W$ can be approximated by the normal distribution with
$\mu_W = \frac{n_1(n_1+n_2+1)}{2}$ and $\sigma_W = \sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}}$.

20.2 Testing Two Population Medians (8)
Example: An undergraduate is trying to decide between business and journalism. She gathers salary data on 10 recent graduates who majored in business and 10 recent graduates who majored in journalism.
$H_0: m_1 - m_2 = 0$; $H_A: m_1 - m_2 \ne 0$

20.2 Testing Two Population Medians (9)
Example, continued.
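The pooling-and-ranking steps for the rank-sum statistic, together with the normal-approximation moments, might be sketched as follows. The function names are illustrative and the salary figures are hypothetical, not the data from the slides; only the sample sizes $n_1 = n_2 = 10$ in the last line come from the example.

```python
from math import sqrt


def pooled_ranks(values):
    """Rank pooled values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # average of positions i+1 .. j+1
        i = j + 1
    return ranks


def rank_sums(sample1, sample2):
    """Return (W1, W2): the rank sums of each sample within the pooled data."""
    ranks = pooled_ranks(list(sample1) + list(sample2))
    w1 = sum(ranks[: len(sample1)])
    w2 = sum(ranks[len(sample1):])
    return w1, w2


def rank_sum_moments(n1, n2):
    """Mean and standard deviation of W = W1 under the normal approximation."""
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return mu, sigma


# Hypothetical salaries in $1,000s, NOT the data from the slides.
business = [62, 58, 71, 66, 80]
journalism = [45, 58, 52, 49, 60]
w1, w2 = rank_sums(business, journalism)
n = len(business) + len(journalism)
print(w1, w2, n * (n + 1) / 2)   # 38.5 16.5 55.0
print(rank_sum_moments(10, 10))  # moments for n1 = n2 = 10, the example's sample sizes
```

The tied value 58 appears once in each sample and receives rank 4.5 in both, and the two rank sums add up to $(n_1+n_2)(n_1+n_2+1)/2 = 55$.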
20.2 Testing Two Population Medians (10)
Example, continued with R. The p-value reported by R approximately equals 0. Reject the null hypothesis. At the 5% significance level, we can conclude that the median business salary differs from the median journalism salary.

20.2 Testing Two Population Medians (11)
Example, continued. The normal approximation uses the rank sum of sample 1, $W = W_1 = 149$. (The slide lists $W = 94$; this matches $W_1 - n_1(n_1+1)/2 = 149 - 55$, the form of the statistic that R's wilcox.test reports.)
$\mu_W = \frac{n_1(n_1+n_2+1)}{2} = \frac{10(10+10+1)}{2} = 105$
$\sigma_W = \sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}} = \sqrt{\frac{(10 \times 10)(10+10+1)}{12}} = 13.2288$
$z = \frac{W - \mu_W}{\sigma_W} = \frac{149 - 105}{13.2288} = 3.33$
The p-value is $2 \times P(Z \ge 3.33) \approx 0.0009$.
The decision and conclusion are the same.

20.3 Testing Three or More Population Medians (1)
The ANOVA F-test assumes that each variable is normally distributed with the same variance. The Kruskal-Wallis test is a nonparametric alternative to the one-way ANOVA when the assumption of normality or equal variances cannot be validated. It is based on ranks and is an extension of the Wilcoxon rank-sum test. It is used for testing the equality of three or more population medians.
20.3 Testing Three or More Population Medians (2)
The competing hypotheses are below.
– $H_0: m_1 = m_2 = \cdots = m_k$
– $H_A$: Not all population medians are equal
The test statistic is $H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$.
– $R_i$ and $n_i$ are the rank sum and the size of the $i$th sample
– $k$ is the number of populations
– $n = \sum n_i$
If $n_i \ge 5$ for all $i$, then the distribution of $H$ can be approximated by a $\chi^2$ distribution with $df = k - 1$.

20.3 Testing Three or More Population Medians (3)
Calculations to obtain the test statistic:
1. Pool the observations from all $k$ samples and then rank the observations from 1 to $n$.
2. Calculate a rank sum $R_i$ for each of the $k$ samples.
If the medians are the same, we expect the rank sums to be close to one another. However, if some sums deviate substantially from others, then this is evidence that not all population medians are the same.

20.3 Testing Three or More Population Medians (4)
Example: monthly sales for store layouts.
$H_0: m_1 = m_2 = m_3 = m_4$
$H_A$: Not all population medians are equal

20.3 Testing Three or More Population Medians (5)
Example, continued.

20.3 Testing Three or More Population Medians (6)
Example, continued.
$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1) = \frac{12}{23(23+1)} \times (330.0417 + 238.05 + 1560.0357 + 1711.25) - 3(23+1) = 11.465$
$df = 4 - 1 = 3$
The p-value is $P(\chi^2_3 \ge 11.465) = 0.009$.
Reject the null hypothesis. At the 5% significance level, we can conclude that not all median sales across the different layouts are the same.

20.3 Testing Three or More Population Medians (7)
Example, continued with R.
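The value of $H$ and its p-value in the store-layout example can be reproduced from the four $R_i^2/n_i$ terms quoted on slide (6). The sketch below uses only Python's standard library. Because the standard library has no chi-square distribution, it uses the closed-form upper-tail probability that holds specifically for 3 degrees of freedom; the function names are illustrative.

```python
from math import erfc, exp, pi, sqrt


def kruskal_wallis_h(terms, n):
    """H = 12 / (n (n + 1)) * sum(R_i^2 / n_i) - 3 (n + 1).

    `terms` holds the already-computed values R_i^2 / n_i, one per sample.
    """
    return 12 / (n * (n + 1)) * sum(terms) - 3 * (n + 1)


def chi2_sf_df3(x):
    """Upper-tail probability P(chi-square with 3 df >= x); valid for df = 3 only."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


terms = [330.0417, 238.05, 1560.0357, 1711.25]  # R_i^2 / n_i from the slide
h = kruskal_wallis_h(terms, n=23)
p_value = chi2_sf_df3(h)
print(f"H={h:.3f} p={p_value:.4f}")
```

The script gives $H \approx 11.465$ and a p-value just under 0.01, consistent with the 0.009 reported on the slide and with rejecting the null hypothesis at the 5% level.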
20.4 The Spearman Rank Correlation Test (1)
The Pearson correlation coefficient measures the direction and strength of the linear relationship between two random variables. The Spearman rank correlation coefficient also measures the correlation between two random variables. It falls between −1 and +1, and it is interpreted in a similar way. The Spearman rank correlation coefficient is based on the ranked observations for each variable rather than the raw data. The Spearman rank correlation test serves as a nonparametric alternative when the normality assumption does not hold.

20.4 The Spearman Rank Correlation Test (2)
The sample Spearman rank correlation coefficient is
$r_S = 1 - \frac{6 \sum d_i^2}{n(n^2-1)}$,
where $d_i$ is the difference between the ranks of observations $x_i$ and $y_i$.
For a test of $H_0: \rho_S = 0$; $H_A: \rho_S \ne 0$, there are two scenarios.
– $n \le 10$: use special tables or R to find the p-value.
– $n \ge 10$: the sampling distribution of $r_S$ can be approximated by the normal distribution with zero mean and standard deviation $\sqrt{\frac{1}{n-1}}$. The test statistic is computed as $z = r_S \sqrt{n-1}$.

20.4 The Spearman Rank Correlation Test (3)
Calculations to obtain the sample Spearman rank correlation coefficient:
A. Rank the observations for $X$ from smallest to largest, then rank the observations for $Y$ from smallest to largest.
B. Calculate the difference $d_i$ between the ranks of each pair of observations.
C. Sum the squared differences, $\sum d_i^2$.

20.4 The Spearman Rank Correlation Test (4)
Example: returns for Growth and Value funds.

20.4 The Spearman Rank Correlation Test (5)
Example, continued.

20.4 The Spearman Rank Correlation Test (6)
Example, continued.
$r_S = 1 - \frac{6 \sum d_i^2}{n(n^2-1)} = 1 - \frac{6 \times 32}{10 \times (10^2-1)} = 0.8061$
$H_0: \rho_S = 0$; $H_A: \rho_S \ne 0$
$z = r_S \sqrt{n-1} = 0.8061 \sqrt{10-1} = 2.42$
The p-value is $2 \times P(Z \ge 2.42) = 0.0156$.
Reject the null hypothesis. We can conclude that the Spearman rank correlation coefficient between the Growth and the Value funds differs from zero.
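Steps A to C and the normal-approximation test can be sketched as below. The function names are illustrative, the small `x`/`y` series is hypothetical, and `sum_squared_rank_diffs` assumes no tied values; the final call uses the figures quoted on slide (6), $\sum d_i^2 = 32$ and $n = 10$.

```python
from math import sqrt
from statistics import NormalDist


def sum_squared_rank_diffs(x, y):
    """Steps A to C, assuming no tied values within x or within y."""
    rank_x = {v: r for r, v in enumerate(sorted(x), start=1)}
    rank_y = {v: r for r, v in enumerate(sorted(y), start=1)}
    return sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))


def spearman_from_rank_diffs(sum_d2, n):
    """r_S = 1 - 6 * sum(d_i^2) / (n (n^2 - 1))."""
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))


def spearman_z_test(r_s, n):
    """Two-tailed normal-approximation test of H0: rho_S = 0."""
    z = r_s * sqrt(n - 1)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical paired observations, NOT the fund returns from the slides.
x = [3, 1, 4, 2, 5]
y = [30, 10, 50, 20, 40]
print(sum_squared_rank_diffs(x, y))  # 2

# Figures quoted on the slide: sum of squared rank differences 32, n = 10.
r_s = spearman_from_rank_diffs(sum_d2=32, n=10)
z, p_value = spearman_z_test(r_s, n=10)
print(f"r_S={r_s:.4f} z={z:.2f} p={p_value:.4f}")
```

With the slide's inputs the script returns $r_S \approx 0.8061$ and $z \approx 2.42$, and a two-tailed p-value of about 0.016, in line with the worked example.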
20.4 The Spearman Rank Correlation Test (7)
Example, continued with R.

20.4 The Spearman Rank Correlation Test (8)

20.5 The Sign Test (1)
In some applications, a matched-pairs sample originates from a categorical variable with ordinal observations rather than from a numerical (interval- or ratio-scaled) variable. With ordinal-scaled observations, we are able to categorize and rank the observations with respect to a trait, but we cannot interpret the difference between the ranked observations because the actual numbers are arbitrary. If we have a matched-pairs sample of ordinal-scaled observations, we can use the sign test to determine whether there are significant differences between the populations.

20.5 The Sign Test (2)
When applying the sign test, we are only interested in whether the difference between two observations in a pair is different from, greater than, or less than zero.
The difference between each pairing is replaced by a (+) or (−) sign.
– Plus sign (+) if the difference is positive (that is, the first observation exceeds the second observation).
– Minus sign (−) if the difference between the pair is negative.
– If the difference between the pair is zero, we discard that particular outcome from the sample.
If significant differences do not exist between the two populations, then we expect just as many plus signs as minus signs. Equivalently, we expect plus signs 50% of the time and minus signs 50% of the time.

20.5 The Sign Test (3)
Let $p$ denote the population proportion of plus signs, and let $\bar{P} = X/n$ be the estimator of the population proportion of plus signs.
The test statistic is computed as $z = \frac{\bar{p} - 0.5}{0.5/\sqrt{n}}$, where $\bar{p} = x/n$ represents the sample proportion of plus signs.
The test is valid when $n \ge 10$.

20.5 The Sign Test (4)
Example: A large pizza chain claims that its reformulated recipe for pizza is a vast improvement over the old recipe. Suppose 20 customers are asked to sample the old recipe and then sample the new recipe. Each person is asked to rate the pizzas on a 5-point scale, where 1 = inedible and 5 = very tasty.
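The sign-test procedure (replace each paired difference by its sign, discard zeros, then standardize the proportion of plus signs) could be sketched as follows. The ratings are hypothetical, not the survey data from the slides, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def sign_counts(first, second):
    """Count plus and minus signs of first - second; zero differences are discarded."""
    plus = sum(1 for a, b in zip(first, second) if a > b)
    minus = sum(1 for a, b in zip(first, second) if a < b)
    return plus, minus


def sign_test_z(plus, n):
    """z = (p_bar - 0.5) / (0.5 / sqrt(n)), where p_bar = plus / n."""
    p_bar = plus / n
    return (p_bar - 0.5) / (0.5 / sqrt(n))


# Hypothetical 1-to-5 ratings from 12 tasters, NOT the data from the slides.
old = [3, 2, 4, 3, 1, 5, 2, 3, 4, 2, 3, 1]
new = [4, 4, 4, 5, 3, 4, 3, 5, 5, 3, 4, 2]
plus, minus = sign_counts(old, new)   # plus sign: old rating exceeds new rating
n = plus + minus                      # ties are not counted
z = sign_test_z(plus, n)
p_lower = NormalDist().cdf(z)         # left-tailed p-value for H_A: p < 0.5
print(plus, minus, n)                 # 1 10 11
print(round(z, 3), round(p_lower, 4))
```

One taster rated both recipes equally, so only 11 of the 12 pairs enter the test, which still satisfies the $n \ge 10$ condition.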
20.5 The Sign Test (5)
Example, continued.

20.5 The Sign Test (6)
Example, continued. Let $p$ denote the population proportion of people who prefer the old recipe.
$H_0: p \ge 0.5$; $H_A: p < 0.5$
The sample proportion of plus signs is $\bar{p} = \frac{4}{18} = 0.22$ (18 nonzero differences remain from the 20 pairs).
The test statistic is $z = \frac{\bar{p} - 0.5}{0.5/\sqrt{n}} = \frac{0.22 - 0.5}{0.5/\sqrt{18}} = -2.357$.
The p-value is $P(Z \le -2.357) = 0.0092$.
Reject the null hypothesis. At the 5% significance level, we can conclude that customers prefer the new recipe to the old one.

20.6 Test Based on Runs (1)
In many applications, we wish to determine whether some observations occur in a truly random fashion or whether some form of a nonrandom pattern exists. In other words, we want to test if the elements of the sequence are mutually independent. We use the Wald-Wolfowitz runs test (runs test) to examine whether the elements in a sequence appear in a random order. It can be applied to either categorical or numerical variables so long as we can separate the sample observations into two categories.
With numerical variables, define categories as above or below the median.

20.6 Test Based on Runs (2)
Example: A machine fills 16-ounce cereal boxes. A machine is unlikely to dispense exactly 16 ounces in each box. We expect the weight of each box to deviate from 16 ounces. A machine is operating properly if the deviations from 16 ounces occur in a random order. Sample 30 cereal boxes and denote those boxes that are overfilled with the letter O and those that are underfilled with the letter U.
OOOOUUUOOOOUOOOUUUUOOOOUUOOOOO

20.6 Test Based on Runs (3)
Example, continued. One possible way to test whether or not a machine is operating properly is to determine if the elements of a particular sequence of O's and U's occur randomly. If we observe a long series of consecutive O's (or U's), then the machine is likely overfilling (or underfilling) the cereal boxes.
For a runs test, the competing hypotheses are below.
– $H_0$: The elements occur randomly
– $H_A$: The elements do not occur randomly
20.6 Test Based on Runs (4)
A run is an uninterrupted sequence of one letter, symbol, or attribute (e.g., O or U).
OOOOUUUOOOOUOOOUUUUOOOOUUOOOOO
– Number of O runs: $R_O = 5$
– Number of U runs: $R_U = 4$
We are interested in the total number of runs: $R = R_O + R_U = 5 + 4 = 9$.
Are 9 runs in a sequence of 30 observations too few or too many?
The runs test is a two-tailed test: too many runs are deemed just as unlikely as too few runs.

20.6 Test Based on Runs (5)
Let $n_1$ and $n_2$ be the number of O's and U's in the sequence (observations in each category, not runs), and let $R$ be the number of runs.
If $n_1 \ge 10$ and $n_2 \ge 10$, the distribution of $R$ can be approximated by the normal distribution with
– $\mu_R = \frac{2 n_1 n_2}{n} + 1$
– $\sigma_R = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n)}{n^2 (n - 1)}}$
– $n = n_1 + n_2$
The test statistic for the Wald-Wolfowitz runs test is $z = \frac{R - \mu_R}{\sigma_R}$.

20.6 Test Based on Runs (6)
Example: A machine fills 16-ounce cereal boxes.
OOOOUUUOOOOUOOOUUUUOOOOUUOOOOO
$R = 9$, $n_1 = n_O = 20$, $n_2 = n_U = 10$, $n = 30$
$\mu_R = \frac{2 n_1 n_2}{n} + 1 = \frac{2(20)(10)}{30} + 1 = 14.33$
$\sigma_R = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n)}{n^2 (n - 1)}} = \sqrt{\frac{(2 \times 20 \times 10)(2 \times 20 \times 10 - 30)}{30^2 (30 - 1)}} = 2.3813$
$z = \frac{R - \mu_R}{\sigma_R} = \frac{9 - 14.3333}{2.3813} = -2.24$
The p-value is $2 \times P(Z \le -2.24) = 0.0251$.
Reject the null hypothesis. At the 5% significance level, the pattern is not random.
We can conclude that the machine does not properly fill the boxes.

20.6 Test Based on Runs (7)
Example: growth rate of GDP.
The median growth rate is 2.68%.
Analyze runs above and below the median.
$H_0$: The GDP growth rate is random
$H_A$: The GDP growth rate is not random

20.6 Test Based on Runs (8)
Example, continued with R.
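The transcript does not include the R code or output for this slide. As a stand-in, here is a minimal Python sketch of the normal-approximation runs test, checked against the cereal-box example rather than the GDP data, which is not reproduced here. The names `runs_test` and `above_below_median` are hypothetical. Dropping values exactly equal to the median is an assumption (one common convention), not something the slides specify.

```python
from itertools import groupby
from math import sqrt
from statistics import NormalDist, median


def runs_test(sequence):
    """Two-tailed Wald-Wolfowitz runs test via the normal approximation."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two categories")
    # n1 and n2 count observations in each category, not runs.
    n1 = sum(1 for s in sequence if s == symbols[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    runs = sum(1 for _ in groupby(sequence))  # one group per run
    mu = 2 * n1 * n2 / n + 1
    sigma = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1)))
    z = (runs - mu) / sigma
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-tailed p-value
    return {"R": runs, "n1": n1, "n2": n2,
            "mu": mu, "sigma": sigma, "z": z, "p_value": p_value}


def above_below_median(values):
    """Label values 'A' (above) or 'B' (below) the sample median.

    Assumption: values equal to the median are dropped.
    """
    m = median(values)
    return "".join("A" if v > m else "B" for v in values if v != m)


# Cereal-box example: 20 O's, 10 U's, 9 runs.
result = runs_test("OOOOUUUOOOOUOOOUUUUOOOOUUOOOOO")
print(round(result["mu"], 2), round(result["sigma"], 4), round(result["z"], 2))
print(round(result["p_value"], 3))
```

For a numerical series such as the GDP growth rates, one would first convert the data with `above_below_median` and then pass the resulting string to `runs_test`. The normal approximation is stated only for $n_1 \ge 10$ and $n_2 \ge 10$, so results for shorter sequences should be treated with caution.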
