Document Details

Uploaded by GentlestPearTree

UAEU

Tags

sampling distributions statistics data analysis probability

Summary

This document is a unit on sampling distributions in statistics. It includes the basic concepts, examples of how to calculate a statistic, and associated exercises. The document is geared towards university-level undergraduate students.

Full Transcript

Unit 6: Sampling Distributions
CS 40003: Data Analytics

In this Unit…
• Basic concept of sampling distribution
• Usage of sampling distributions
• Sampling distribution of the mean
• Central limit theorem
• Application of the central limit theorem

Basic Concepts
• A population is the collection of all items or things under consideration (people or objects).
• A sample is a portion of the population selected for analysis.
• A parameter is a summary measure that describes a characteristic of the population.
  Examples: the population mean $\mu$ and standard deviation $\sigma$ are parameters.
• A statistic is a summary measure computed from a sample.
  Examples: the sample mean $\bar{x}$ and standard deviation $s$ are statistics.

Example
According to a census in 2007, 5.21% of a certain population were suffering from a certain kind of depression. A year later, 5000 individuals were selected from this population, and 6% of them were suffering from depression.
(a) What is the population of interest?
(b) What is the sample?
(c) What is the parameter?
(d) What is the statistic?
(e) Does the value 5.21% refer to the parameter or the statistic?
(f) Is the value 6% a parameter or a statistic?

Statistical Inference
As a task of statistical inference, we usually follow these steps:
• Data collection: collect a sample from the population.
• Statistics: compute a statistic from the sample.
• Statistical inference: from the statistic, make statements concerning the values of population parameters, for example about the population mean from the sample mean.

Why Sampling Distributions?
• The basic thrust of inferential statistics is drawing conclusions about the values of population parameters.
• A statistic is computed from the sample and varies from sample to sample.
• A statistic is a random variable and has a probability distribution called the sampling distribution.
• The probability distribution changes when the population parameter changes; that is, the behavior of the sample statistic reflects the truth about the population.

Review
Population mean: $\mu = \frac{\sum x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
where:
$\mu$ = population mean
$\bar{x}$ = sample mean
$x_i$ = values in the population or sample
$N$ = population size
$n$ = sample size

Developing a Sampling Distribution
• Assume there is a population …
• Population size N = 4 (individuals A, B, C, D)
• Random variable, x, is the age of individuals
• Values of x: 18, 20, 22, 24 (years)

Summary Measures for the Population Distribution:
$\mu = \frac{\sum x_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = 2.236$
[Figure: P(x) against x = 18, 20, 22, 24 (A, B, C, D); uniform distribution]

Now consider all possible samples of size n = 2. There are 16 possible samples (sampling with replacement).

16 samples
1st Obs    2nd Observation
           18       20       22       24
18         18,18    18,20    18,22    18,24
20         20,18    20,20    20,22    20,24
22         22,18    22,20    22,22    22,24
24         24,18    24,20    24,22    24,24

16 sample means
1st Obs    2nd Observation
           18    20    22    24
18         18    19    20    21
20         19    20    21    22
22         20    21    22    23
24         21    22    23    24

Sampling Distribution of All Sample Means
[Figure: P($\bar{x}$) against $\bar{x}$ = 18, 19, 20, 21, 22, 23, 24 for the 16 sample means; no longer uniform]

Summary Measures of this Sampling Distribution (the sums run over the 16 sample means):
$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{16} = \frac{18 + 19 + 20 + 21 + \cdots + 24}{16} = 21$
$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{16}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$

Comparing the Population with its Sampling Distribution
Population (N = 4): $\mu = 21$, $\sigma = 2.236$
Sample means distribution (n = 2): $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 1.58$
[Figure: P(x) against x = 18, 20, 22, 24 (A, B, C, D) beside P($\bar{x}$) against $\bar{x}$ = 18, 19, ..., 24]

Sampling Distribution of the Sample Mean
• The mean of the sampling distribution of the sample mean is equal to the mean of the population.
That is, $\mu_{\bar{X}} = \mu$.
• The standard deviation of the sampling distribution of the sample mean is $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.

Shape of the Sampling Distribution
• The shape of the sampling distribution of the sample mean depends on which of the following two cases applies:
  • The population from which samples are drawn has a normal distribution.
  • That population does not have a normal distribution.
• Check the following experiment: Demonstration

Sampling Distribution of the Sample Mean
• Case 1: Sampling from a normal population
  • The sampling distribution of the sample mean is normal whatever the value of n.
• Case 2: Sampling from a non-normal population (central limit theorem, CLT)
  • For a large sample size, the sampling distribution of the sample mean is approximately normal, irrespective of the shape of the population distribution.

Effect of Sample Size
• The larger the sample size, the more nearly normal the distribution of all possible sample means.
• Also, as the sample size increases, the spread of the sampling distribution decreases.

How Large is Large Enough?
A good rule of thumb is that n is sufficiently large provided that n ≥ 30.
• For fairly symmetric distributions, n > 15 suffices.
• It is understood that the approximation improves as the sample size increases.
• For normal population distributions, the sampling distribution of the mean is always normally distributed.

Sampling Distribution of $\bar{X}$
$\bar{X}$ is distributed as $N\left(\mu, \frac{\sigma^2}{n}\right)$

Example
The scores of students on the ACT college entrance examination in a recent year had the normal distribution with mean $\mu = 18.6$ and standard deviation $\sigma = 5.9$. Take an SRS (simple random sample) of 50 students who took the test. What is the probability that the mean score of these 50 students is 21 or higher?
$\mu_{\bar{x}} = 18.6$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5.9}{\sqrt{50}} = 0.83$
$P(\bar{X} \ge 21) = P\left(Z \ge \frac{21 - 18.6}{0.83}\right) = P(Z \ge 2.89) = 0.0019$

Exercises
1. The length of time a student takes to respond to a source is normally distributed with a mean of 2.40 seconds and a standard deviation of 0.15 seconds.
   a) What is the probability that a selected student will take more than 2.50 seconds?
   b) What is the probability that the mean time of 9 randomly selected students is more than 2.50 seconds?
   c) What is the probability that the mean time of 36 randomly selected students is more than 2.50 seconds?
   d) If the population of times is not normally distributed, which, if any, of the questions above can you answer? Justify your answer.
2. The amount of time spent by adults playing sports per day is normally distributed with a mean of 4 hours and a standard deviation of 1.25 hours.
   a) Find the probability that a randomly selected adult plays sports for more than 5 hours per day.
   b) Find the probability that if four adults are randomly selected, their average number of hours spent playing sports is more than 5 hours per day.
   c) Find the probability that if four adults are randomly selected, all four play sports for more than 5 hours per day.
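
The exercises above follow the same pattern as the ACT example in this unit: the sample mean has standard deviation σ/√n, and the answer is a normal tail probability. A minimal Python sketch of that calculation, using the ACT numbers, might look like the following. The helper name `prob_sample_mean_exceeds` is hypothetical (it is not part of the unit), and the standard library's `statistics.NormalDist` stands in for the normal table used on the slides.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """Return P(X-bar >= threshold), assuming X-bar ~ N(mu, sigma^2 / n)."""
    standard_error = sigma / sqrt(n)  # sigma_xbar = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return 1.0 - sampling_dist.cdf(threshold)


# ACT example from this unit: mu = 18.6, sigma = 5.9, n = 50, threshold = 21.
p_act = prob_sample_mean_exceeds(21, mu=18.6, sigma=5.9, n=50)
print(f"P(X-bar >= 21) is about {p_act:.4f}")
```

Because the code does not round the standard error to 0.83 or the z-value to 2.89, its result is roughly 0.0020 rather than the table value 0.0019. Calling the same helper with `n=1` gives the probability for a single observation.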

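
The small-population example developed in this unit (ages 18, 20, 22, 24 with samples of size n = 2) can also be checked by brute force. The sketch below is an illustration, not part of the original slides: it lists all 16 ordered samples drawn with replacement, computes their means, and compares the mean and standard deviation of those sample means with μ and σ/√n. All variable names are chosen here for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

population = [18, 20, 22, 24]  # ages of individuals A, B, C, D
n = 2                          # sample size

# Population mean and standard deviation (divide by N, not N - 1).
mu = sum(population) / len(population)
sigma = sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# All ordered samples of size n drawn with replacement: 4**2 = 16 of them.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Mean and standard deviation of the 16 sample means.
mu_xbar = sum(sample_means) / len(sample_means)
sigma_xbar = sqrt(
    sum((m - mu_xbar) ** 2 for m in sample_means) / len(sample_means)
)

# Sampling distribution: probability of each distinct sample mean.
counts = Counter(sample_means)
distribution = {m: counts[m] / len(sample_means) for m in sorted(counts)}

print("population:   mean", mu, "sd", round(sigma, 3))
print("sample means: mean", mu_xbar, "sd", round(sigma_xbar, 2))
print("sigma / sqrt(n) =", round(sigma / sqrt(n), 2))
print(distribution)
```

The printed distribution peaks at a sample mean of 21 and tapers toward 18 and 24, which is the "no longer uniform" shape noted in the unit.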