Statistical Inference - Comparing Two Means
Monash University
Summary
This document discusses statistical inference, focusing on comparing two means using confidence intervals for independent samples. It explains the conditions for using t-procedures with two independent samples and works through an example comparing the resting pulse rates of exercisers and non-exercisers.
Full Transcript
Statistical Inference – Comparing Two Means
Confidence Interval for Two Independent Sample Problems

Two-Independent-Sample Problems

Draw a random sample of size n1 from a Normal population with unknown mean µ1 and unknown standard deviation σ1, and draw an independent random sample of size n2 from another Normal population with unknown mean µ2 and unknown standard deviation σ2.

1. We have two independent random samples, from two distinct populations or treatment groups. Unlike the matched pairs designs studied earlier, there is no matching of the individuals in the two samples. The two samples are assumed to be independent, that is, one sample has no influence on the other, and they can be of different sizes. Matching violates independence. We measure the same variable for both samples.
2. Both populations are Normally distributed. The means and standard deviations of the populations are unknown. In practice, it is enough that the distributions have similar shapes and that the data have no strong outliers.

Here is how we describe the two populations:

Population   Variable   Mean   Standard deviation
1            x1         µ1     σ1
2            x2         µ2     σ2

And how we describe the two samples:

Population   Sample size   Sample mean   Sample standard deviation
1            n1            x̄1            s1
2            n2            x̄2            s2

Paired and Independent Samples: Difference

Paired samples: e.g. repeated measures of vitamin C contents of 7 tomatoes. Each tomato (1, 2, 3, ..., 7) has a Before value, an After value and a difference d = B - A. One sample of differences! Thus, one-sample statistical inference can be utilised.

Independent samples: e.g. compare average heights in males and females, with one set of Male values and a separate set of Female values. Two distinct groups!

* Independent (unpaired) samples - Comparing the responses to two treatments or comparing the characteristics of two populations. We have a separate sample from each treatment or each population.
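The contrast between paired and independent samples can be sketched in a few lines of Python. The numbers below are hypothetical and chosen only for illustration (the slides give no data for the tomato or height examples), and the helper name paired_differences is likewise an assumption of this sketch.

```python
from statistics import mean, stdev


def paired_differences(before, after):
    """Return d = before - after for each matched individual."""
    if len(before) != len(after):
        raise ValueError("paired data need one before and one after value per individual")
    return [b - a for b, a in zip(before, after)]


# Hypothetical vitamin C readings for 7 tomatoes (illustrative values only).
before = [22.0, 19.5, 24.1, 20.3, 21.7, 23.4, 18.9]
after = [20.5, 18.0, 23.0, 19.9, 20.1, 22.8, 17.5]

# Paired design: the two columns collapse into ONE sample of differences,
# so one-sample procedures apply to d.
d = paired_differences(before, after)
print("paired: one sample of", len(d), "differences, mean", round(mean(d), 2))

# Hypothetical heights in cm: two separate groups, and the sizes may differ.
males = [178.0, 172.5, 181.2, 169.8, 175.4]
females = [164.3, 160.1, 167.9, 158.6, 162.2, 165.0]

# Independent design: each group keeps its own mean and standard deviation.
print("independent: means", round(mean(males), 2), round(mean(females), 2))
print("independent: sds", round(stdev(males), 2), round(stdev(females), 2))
```

The paired case ends with a single list of differences, while the independent case keeps two summaries that cannot be matched up row by row.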
Statistics and Conditions for Independent Samples

Conditions for use of t-procedures for two independent samples:

Except in the case of small samples, the assumption that each sample is an independent RANDOM SAMPLE from the population of interest is more important than the assumption that the two population distributions are Normal.

Small sample sizes (n1 + n2 < 15): Each data set must be close to Normal (symmetric, single peak, no outliers). If a data set is skewed or if outliers are present, do not use t.
Medium sample sizes (n1 + n2 ≥ 15): The t procedures are OK except in the presence of outliers or strong skewness in a data set.
Large samples (roughly n1 + n2 ≥ 40): The t procedures can be used even for clearly skewed distributions.

Confidence Interval for Independent Samples

1) The full df approximation gives more accurate results (less error implied) than the conservative approach of taking the lesser of n1 – 1 and n2 – 1. Leave this for software to do.
2) Use the conservative approach when calculating by hand: df = smaller of (n1 – 1) or (n2 – 1).

Example - Exercise and Pulse Rates Part 1 (CI)

A study was performed to compare the mean resting pulse rate of adult subjects who regularly exercise to the mean resting pulse rate of those who do not regularly exercise. Do these two populations differ in their mean resting pulse rates?

Summary statistics (sample data) from the comparative experiment were:

                 n    Sample Mean   Sample St Dev
Non-exercisers   31   75            9.0
Exercisers       29   66            8.6

Find a 95% confidence interval for the difference in population means: define direction as non-exercisers (1) minus exercisers (2).

Full degrees of freedom: this is the calculation that software would perform (the Excel function given on the slide is =TDIST(|t|, df, tails)). Here the full df = 58. This df is NOT in Table C, so we round down to the next lowest tabled value, df = 50. Multiplier t* = 2.009.
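The slides quote a full df of 58 without showing the formula. A minimal sketch, assuming that "full df" means the Welch-Satterthwaite approximation commonly used by statistical software (the slides do not name it), gives a value that rounds to 58 for these data. The function names welch_df and conservative_df are hypothetical.

```python
def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (assumed 'full df')."""
    v1 = sd1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def conservative_df(n1, n2):
    """Hand-calculation rule from the slides: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


# Summary statistics from the example: non-exercisers (1), exercisers (2).
full = welch_df(9.0, 31, 8.6, 29)
print("full df (approx.):", round(full, 2))
print("conservative df:", conservative_df(31, 29))
```

The approximation is usually not a whole number, which is one reason it is left to software and why a printed table needs rounding down to a listed df.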
Conservative degrees of freedom: the samples are large (n1 + n2 = 60), so we can use the t-procedure. Conservative df = n2 – 1 = 29 – 1 = 28. The corresponding multiplier t* = 2.048.

Example - Exercise and Pulse Rates Part 2 (CI)

A 95% confidence interval for the difference in population means, with direction defined as non-exercisers (1) minus exercisers (2), using the multiplier t* = 2.009 (df = 50):

(75 – 66) ± 2.009 × sqrt(9.0²/31 + 8.6²/29) = 9 ± 4.565, that is, 4.435 bpm to 13.565 bpm.

Interpretation in context: We are 95% confident that the difference in mean resting pulse rates (non-exercisers minus exercisers) is between 4.435 bpm and 13.565 bpm.

Can a difference be claimed? Since 0 is not in the confidence interval, we can claim there is a difference in mean resting pulse rates between non-exercisers and exercisers.
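The interval above can be checked from the summary statistics alone. The sketch below assumes the usual two-sample form, (difference in sample means) ± t* × sqrt(s1²/n1 + s2²/n2), and takes the multiplier t* from the table values quoted in the example instead of computing it. The function name two_sample_ci is hypothetical.

```python
import math


def two_sample_ci(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """Confidence interval for mu1 - mu2 from summary statistics."""
    diff = mean1 - mean2
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    margin = t_star * se
    return diff - margin, diff + margin


# Non-exercisers (1) minus exercisers (2), multiplier for df = 50.
lo, hi = two_sample_ci(75, 9.0, 31, 66, 8.6, 29, t_star=2.009)
print(f"df = 50, t* = 2.009: ({lo:.3f}, {hi:.3f})")

# Same data with the conservative multiplier for df = 28.
lo_c, hi_c = two_sample_ci(75, 9.0, 31, 66, 8.6, 29, t_star=2.048)
print(f"df = 28, t* = 2.048: ({lo_c:.3f}, {hi_c:.3f})")
```

With t* = 2.009 this reproduces the 4.435 to 13.565 bpm interval. The conservative multiplier t* = 2.048 gives a slightly wider interval, roughly 4.346 to 13.654 bpm, and it still excludes 0, so the conclusion does not change.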