Central Limit Theorem PDF

Summary

This document explains the central limit theorem, a concept in statistics. It describes how the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the underlying population distribution. The implications of sample size on the accuracy of estimations are also discussed.

Full Transcript

Business Statistics for Entrepreneurs, Ananth Krishnamurthy, Module 4: Central Limit Theorem
What happens when the population from which we are selecting the random sample does not have a normal distribution? First, realize that the expectation of x bar will still be mu, the mean of the population's distribution, and the standard error of x bar will be sigma divided by root n. The real question is: what is the distribution? We know it is some distribution with the expectation and the standard error that we just mentioned, but what distribution? Here is where the central limit theorem gives us an important result. The central limit theorem is helpful in determining the shape of the sampling distribution of x bar.
The statement of the central limit theorem as it applies to the sampling distribution of x bar says the following: when you select random samples of size n from the population, the sampling distribution of the sample mean x bar can be approximated by the normal distribution if the sample size becomes large.
It is normal regardless of the population distribution?
No, it can be approximated by the normal if the sample size is large enough.
Okay, it can be approximated by a normal.
Yes. Let's see a picture of what is happening. This figure shows how the central limit theorem works for four different populations. Each column refers to one of the populations. The top panel shows the distribution of x in the population. In Basavaraju's case, remember that the distribution of x is the distribution of the scores in the population, which has mean mu and standard deviation sigma. In the top panel, the first picture corresponds to the normal. The second is uniform. The third is skewed, like the exponential. And the fourth one is some arbitrary distribution; let's call it the purple distribution. The bottom panels show the shape of the sampling distribution of x bar, the sample average, which we know has mean mu and standard error sigma divided by root n.
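The two facts quoted above, that x bar has expectation mu and standard error sigma divided by root n even for a non-normal population, can be checked with a short simulation. This is only a sketch under stated assumptions: the exponential population with mean 1, the sample size of 36, the number of repetitions, and the helper name `simulate_sample_means` are illustrative choices, not part of the lecture.

```python
import math
import random
import statistics

def simulate_sample_means(draw, n, reps=20000, seed=0):
    """Return `reps` sample means, each computed from `n` draws of `draw(rng)`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Assumed population (not from the lecture): exponential with rate 1,
# which is skewed and has mean mu = 1 and standard deviation sigma = 1.
mu, sigma, n = 1.0, 1.0, 36
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n)

observed_mean = statistics.fmean(means)   # should be close to mu
observed_se = statistics.pstdev(means)    # should be close to sigma / sqrt(n)
theoretical_se = sigma / math.sqrt(n)

print(f"mean of x-bar:       {observed_mean:.3f} (theory: {mu:.3f})")
print(f"std. error of x-bar: {observed_se:.3f} (theory: {theoretical_se:.3f})")
```

The simulated values should land close to the theoretical ones, although the population itself is far from normal.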
When the sample size n is equal to 2, the shape of the distribution of x bar for the non-normal populations is neither the shape of the population nor the shape of the normal. But when n increases beyond 30, we see that the shape of the sampling distribution of x bar begins to look very similar to the shape of a normal distribution.
© All Rights Reserved. This document has been authored by Ananth Krishnamurthy and is permitted for use only within the course "Business Statistics for Entrepreneurs" delivered in the online course format by IIM Bangalore. No part of this document, including any logo, data, illustrations, pictures, scripts, may be reproduced, or stored in a retrieval system or transmitted in any form or by any means – electronic, mechanical, photocopying, recording or otherwise – without the prior permission of the author.
Is it 30 always? Why n equal to 30? What is so special?
From a practical standpoint, we want to know how big the sample size should be before the central limit theorem approximation is good enough. Several researchers have investigated this question by studying the sampling distribution of x bar for a variety of populations and a variety of sample sizes. The general practice is to assume that, for most applications, the sampling distribution of x bar can be approximated by a normal distribution whenever the sample size is 30 or more. Note that n equal to 30 or more is an empirically derived rule of thumb that works in most cases. While it is theoretically proved that the distribution of x bar converges to the normal as n goes to infinity, it has been empirically observed that even at n equal to 30, the distribution of x bar starts to look like the normal in most cases. Of course, if you want, you can always find population distributions, especially ones with high skewness, for which the approximation needs larger samples.
For such populations, even with sample sizes of 150 or more, the sampling distribution may not look like the normal. But this does not happen for most population distributions.
But how come the population distribution becomes normal for large samples?
No, Tejas, I never said the population distribution becomes normal for large samples.
Isn't that what the central limit theorem says?
No, let me repeat what it says. When you select random samples of size n from the population, the sampling distribution of the sample mean x bar can be approximated by the normal distribution if the sample size becomes large.
The central limit theorem is talking about the distribution of x bar and not of the population?
Yes, and that is an important point. We are talking about the distribution of x bar and not the distribution of x in the population. Clearly, just because you sample more than 30 from a population, the distribution of the population cannot change. However, the shape of the distribution of x bar, the random variable, starts to look more like the normal.
Okay, I think I get it. It will take some time to absorb this.
Yes, this requires some thinking through about what the central limit theorem is saying and what it is not saying. But let's see what the central limit theorem, or CLT as it is often called for short, means for Basavaraju's survey.
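The distinction made here, that the population stays the same while the distribution of x bar changes shape, can be illustrated numerically. The sketch below is an assumption-laden example rather than anything from the lecture: it uses an exponential population (skewness 2) and measures the skewness of simulated sample means for n equal to 2, 30 and 150. A normal distribution has skewness 0, and for this particular population the skewness of x bar is known to be 2 divided by root n, so the values should shrink toward 0 as n grows. The function names and the repetition count are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score of the values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_mean_skewness(n, reps=10000, seed=1):
    """Skewness of `reps` simulated sample means of size n.

    The population is always the same exponential distribution (rate 1);
    only the sample size n changes between calls.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)

skew_by_n = {n: sample_mean_skewness(n) for n in (2, 30, 150)}
for n, skew in skew_by_n.items():
    print(f"n = {n:3d}: skewness of x-bar = {skew:.2f}")
```

With a strongly skewed population like this one, some skewness typically remains at n equal to 30, which is why the rule of thumb is described as working in most cases rather than in all of them.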
First, Basavaraju realizes that he cannot expect the sample mean x bar that he has found, 57.31, to be exactly equal to the population mean mu. The reason we are interested in the sampling distribution of x bar is so that it can be used to provide some probability information about the difference between the sample mean x bar and the population mean mu. We have learned so far that the sampling distribution of x bar is approximately normal, and Basavaraju has a sample average x bar equal to 57.31 and a sample standard deviation equal to 6.38 for a sample size of 36.
Now, let's say that Basavaraju believes that the sampling process and the sample mean estimate will be trustworthy enough for his own decision making if the sample average x bar is within plus or minus 2 of the unknown population mean mu. However, it is not possible to guarantee that the sample mean x bar will always be within plus or minus 2 of the population mean mu, because x bar is a random variable. So instead, we will determine the probability that the sample mean x bar computed from the survey of n equal to 36 customers will be within plus or minus 2 of the population mean.
Because n is equal to 36 and it is larger than 30, we can apply the central limit theorem to say that the sampling distribution of x bar is approximately normal. Further, we know that the expectation of x bar is mu, and we also know that sigma x bar, the standard error, is sigma divided by root n, where n is 36. Now, if we go with Basavaraju's estimate of sigma from last year's survey, sigma equal to 6, what is the value of sigma x bar, Tejas?
Sigma x bar is 6 divided by 6, or 1.
Yes. Then we want to determine the probability that the normal random variable x bar minus mu lies within plus or minus 2.
To calculate this, we transform this question into a probability question about the standard normal random variable by dividing everywhere by the standard error of x bar, sigma divided by root n, which coincidentally is 1 in this case. So, what do we want?
It seems we just want the probability of a standard normal taking values between minus 2 and 2.
Yes, and that probability is approximately 0.95. Thus, there is a 95 percent chance that Basavaraju's x bar of 57.31 is within plus or minus 2 of the population mean. Alternatively, there is only a 5 percent chance that the difference between x bar and mu will be more than 2 units.
That is interesting. With just one sample of 36 responses, he is able to come up with a pretty good estimate of the population mean mu: it is within plus or minus 2 of his 57.31, 95 percent of the time. Imagine what would have been the case if he had got 100 responses instead of 36.
Why imagine? Let's compute what would have been the case if n was indeed 100. Suppose that Basavaraju had obtained responses from 100 customers instead of the 36 responses that he got. Intuitively, it would seem that with more data provided by the larger sample size, the sample mean based on n equal to 100 should provide a better estimate of the population mean mu than the sample mean based on n equal to 36. The question is, how much better?
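The n equal to 36 calculation discussed above can be reproduced in a few lines. This is a minimal sketch assuming, as the lecture does, that sigma is 6 and that x bar is approximately normal; the function name `prob_within` is a hypothetical helper, and `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist

def prob_within(margin, sigma, n):
    """P(|x-bar - mu| <= margin), treating x-bar as normal via the CLT."""
    standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
    z = margin / standard_error             # margin in standard-error units
    std_normal = NormalDist()               # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

# Values from the lecture: sigma = 6 (last year's estimate), n = 36, margin = 2.
p36 = prob_within(margin=2.0, sigma=6.0, n=36)
print(f"P(|x-bar - mu| <= 2) with n = 36: {p36:.4f}")  # about 0.9545
```

The exact standard normal probability between minus 2 and 2 is about 0.9545, which the lecture rounds to 0.95.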
What is the relationship between the sample size n and the sampling distribution of x bar? Note that the expectation of x bar, E of x bar, is equal to mu regardless of the sample size. It does not change with the sample size n.
This means x bar would always come from a distribution centered at mu, no matter how large or small the value of n is.
Yes. However, the standard error of the mean, sigma x bar, is given by sigma divided by root n, and it has the square root of the sample size in the denominator. So, whenever the sample size n is increased, the standard error of the mean sigma x bar will decrease. With n equal to 36, the standard error of the mean was 6 over the square root of 36, equal to 1, if we used Basavaraju's estimate of sigma from last year, which was 6. However, with the increase in sample size to n equal to 100, the standard error of the mean will decrease to 6 divided by the square root of 100, which is 0.6. Both the sampling distributions of x bar, with n equal to 36 and n equal to 100, are shown here.
They are both normal, centered at mu.
Yes, but because the sampling distribution with n equal to 100 has a smaller standard error, the values of x bar will have less variation and will tend to be closer to the population mean than the values of x bar when n is equal to 36. So now let's repeat our calculation to find the probability that our sample mean x bar lies within plus or minus 2 of the population mean when n is equal to 100. So Tejas, what is the probability that we need?
We want the probability of a normal random variable x bar minus mu taking values between plus or minus 2.
Yes, and if we translate it into the corresponding standard normal, what do we get?
We want the probability of a standard normal random variable taking values between minus 2 divided by 0.6 and plus 2 divided by 0.6.
Yes, and that comes out to be the probability that Z lies between minus 3.33 and plus 3.33. This probability is greater than 0.999. So, there is a more than 99.9 percent chance that if Basavaraju had conducted the survey and got 100 responses, other things remaining the same, his sample mean would have been within plus or minus 2 of the population mean. And there is less than a 0.1 percent chance that it lies outside the plus or minus 2 range.
It is really interesting that as the sample size increases, the standard error of the mean decreases. It is intuitive, but it is good to see the effect of n, or rather the square root of n, in the denominator of the standard error.
Yes, and as a result, the larger the sample size, the higher the probability that the sample mean you obtain from a single survey will be within a specified distance of the population mean. Let's now do a few examples to get a better feel of what we just covered.
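The comparison between n equal to 36 and n equal to 100 can be laid out side by side. The sketch below assumes sigma equal to 6 and a margin of plus or minus 2, as in the lecture, and again treats x bar as normal; the helper names `standard_error` and `prob_within` are hypothetical.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def prob_within(margin, sigma, n):
    """P(|x-bar - mu| <= margin) under the normal approximation."""
    z = margin / standard_error(sigma, n)
    std_normal = NormalDist()
    return std_normal.cdf(z) - std_normal.cdf(-z)

SIGMA, MARGIN = 6.0, 2.0   # values used in the lecture

# Map each sample size to (standard error, probability of being within the margin).
results = {
    n: (standard_error(SIGMA, n), prob_within(MARGIN, SIGMA, n))
    for n in (36, 100)
}

for n, (se, p) in results.items():
    print(f"n = {n:3d}: standard error = {se:.2f}, P(within +/-{MARGIN:g}) = {p:.4f}")
```

Because the standard error shrinks with the square root of n, nearly tripling the sample size from 36 to 100 reduces the standard error only from 1 to 0.6, yet that is enough to push the probability from about 0.95 to above 0.999.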
