Simple Random Sampling PDF
Document Details
IIT Kanpur
Shalabh
Summary
This document provides a detailed explanation of simple random sampling methods utilized in statistical analysis. The document covers various types of sampling, such as SRSWOR and SRSWR, and the procedures involved in selecting a random sample. Notations and formulas for estimating population mean and variance are included.
Full Transcript
Chapter 2
Simple Random Sampling

Simple random sampling (SRS) is a method of selecting a sample of n sampling units out of a population of N sampling units such that every sampling unit has an equal chance of being chosen.

The samples can be drawn in two possible ways. The sampling units are chosen without replacement when the units, once chosen, are not placed back in the population. The sampling units are chosen with replacement when the selected units are placed back in the population before the next draw.

1. Simple random sampling without replacement (SRSWOR): SRSWOR is a method of selecting n units out of the N units one by one such that at any stage of selection, each of the remaining units has the same chance of being selected. Unconditionally, the probability that a given unit is selected at any particular draw is 1/N.

2. Simple random sampling with replacement (SRSWR): SRSWR is a method of selecting n units out of the N units one by one such that at each stage of selection, each unit has an equal chance of being selected, i.e., 1/N.

Procedure of selection of a random sample:
A random sample is selected in the following steps:
1. Identify the N units in the population with the numbers 1 to N.
2. Choose an arbitrary starting point in the random number table and start reading numbers.
3. Choose the sampling unit whose serial number corresponds to the random number drawn from the table of random numbers.
4. In the case of SRSWR, all the random numbers are accepted, even if they are repeated. In the case of SRSWOR, if any random number is repeated, it is ignored and more numbers are drawn.

Sampling Theory | Chapter 2 | Simple Random Sampling | Shalabh, IIT Kanpur

Such a process can be implemented in a program using the discrete uniform distribution. Any number between 1 and N can be generated from this distribution, and the corresponding unit is selected into the sample by associating an index with each sampling unit.
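The selection procedure can be sketched in Python with the standard `random` module. This is only an illustration, not part of the original notes: the function names, the seed, and the values N = 20, n = 5 are assumptions chosen for the example. The third function mimics step 4 of the procedure by discarding repeated random numbers.

```python
import random

def srswor(N, n, rng):
    """Draw n distinct unit labels from 1..N (without replacement)."""
    return rng.sample(range(1, N + 1), n)

def srswr(N, n, rng):
    """Draw n unit labels from 1..N, repeats allowed (with replacement)."""
    return [rng.randint(1, N) for _ in range(n)]

def srswor_by_rejection(N, n, rng):
    """Mimic step 4: read random numbers one by one and ignore repeats.

    Requires n <= N, otherwise the loop cannot terminate.
    """
    chosen = []
    while len(chosen) < n:
        r = rng.randint(1, N)  # discrete uniform on {1, ..., N}
        if r not in chosen:
            chosen.append(r)
    return chosen

rng = random.Random(2024)  # fixed seed, used only for reproducibility
N, n = 20, 5               # hypothetical population and sample sizes
wor = srswor(N, n, rng)
wr = srswr(N, n, rng)
rej = srswor_by_rejection(N, n, rng)
print("SRSWOR:", wor)
print("SRSWR: ", wr)
print("SRSWOR via rejection:", rej)
```

Each returned label is the serial number of a sampling unit, so the sample itself is obtained by looking up the units with those labels.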
Many statistical software packages, such as R and SAS, have built-in functions for drawing a sample using SRSWOR or SRSWR.

Notations:
The following notations will be used in further notes:
N: number of sampling units in the population (population size)
n: number of sampling units in the sample (sample size)
Y: the characteristic under consideration
$Y_i$: value of the characteristic for the $i$th unit of the population
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$: sample mean
$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$: population mean

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right)$$

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right)$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)$$

Probability of drawing a sample:

1. SRSWOR
If n units are selected by SRSWOR, the total number of possible samples is $\binom{N}{n}$. So, the probability of selecting any one of these samples is $1/\binom{N}{n}$.

Note that a unit can be selected at any one of the n draws. Let $u_i$ be the $i$th unit selected in the sample. This unit can be selected in the sample either at the first draw, second draw, ..., or $n$th draw. Let $P_j(i)$ denote the probability of selection of $u_i$ at the $j$th draw, $j = 1, 2, \ldots, n$. Then the probability that $u_i$ is included in the sample is

$$\sum_{j=1}^{n} P_j(i) = P_1(i) + P_2(i) + \cdots + P_n(i) = \frac{1}{N} + \frac{1}{N} + \cdots + \frac{1}{N} \ (n \text{ times}) = \frac{n}{N}.$$

Now if $u_1, u_2, \ldots, u_n$ are the n units selected in the sample, then the probability of their selection is

$$P(u_1, u_2, \ldots, u_n) = P(u_1)\,P(u_2) \cdots P(u_n),$$

where $P(u_k)$ is read as the probability that the $k$th draw yields one of the units of the sample that has not yet been drawn. Note that when the second unit is to be selected, there are $(n-1)$ units left to be selected in the sample from the population of $(N-1)$ units. Similarly, when the third unit is to be selected, there are $(n-2)$ units left to be selected in the sample from the population of $(N-2)$ units, and so on. If $P(u_1) = \frac{n}{N}$, then

$$P(u_2) = \frac{n-1}{N-1}, \ \ldots, \ P(u_n) = \frac{1}{N-n+1}.$$

Thus

$$P(u_1, u_2, \ldots, u_n) = \frac{n}{N} \cdot \frac{n-1}{N-1} \cdot \frac{n-2}{N-2} \cdots \frac{1}{N-n+1} = \frac{1}{\binom{N}{n}}.$$

Alternative approach:
The probability of drawing a sample in SRSWOR can alternatively be found as follows. Let $u_{i(k)}$ denote the $i$th unit drawn at the $k$th draw. Note that the $i$th unit can be any unit out of the N units. Then $s_o = (u_{i(1)}, u_{i(2)}, \ldots, u_{i(n)})$ is an ordered sample in which the order in which the units are drawn is also considered, i.e., $u_{i(1)}$ is drawn at the first draw, $u_{i(2)}$ at the second draw, and so on. The probability of selection of such an ordered sample is

$$P(s_o) = P(u_{i(1)})\,P(u_{i(2)} \mid u_{i(1)})\,P(u_{i(3)} \mid u_{i(1)} u_{i(2)}) \cdots P(u_{i(n)} \mid u_{i(1)} u_{i(2)} \cdots u_{i(n-1)}).$$

Here $P(u_{i(k)} \mid u_{i(1)} u_{i(2)} \cdots u_{i(k-1)})$ is the probability of drawing $u_{i(k)}$ at the $k$th draw given that $u_{i(1)}, u_{i(2)}, \ldots, u_{i(k-1)}$ have already been drawn in the first $(k-1)$ draws. Such a probability is obtained as

$$P(u_{i(k)} \mid u_{i(1)} u_{i(2)} \cdots u_{i(k-1)}) = \frac{1}{N-k+1}.$$

So

$$P(s_o) = \prod_{k=1}^{n} \frac{1}{N-k+1} = \frac{(N-n)!}{N!}.$$

The number of orders in which a given sample of size n can be drawn is $n!$.
The probability of drawing a sample in a given order is $\frac{(N-n)!}{N!}$.
So the probability of drawing a sample in which the order of the units is irrelevant is

$$n!\,\frac{(N-n)!}{N!} = \frac{1}{\binom{N}{n}}.$$

2. SRSWR
When n units are selected with SRSWR, the total number of possible samples is $N^n$. The probability of drawing a sample is $\frac{1}{N^n}$.

Alternatively, let $u_i$ be the $i$th unit selected in the sample. This unit can be selected in the sample either at the first draw, second draw, ..., or $n$th draw. At any stage, there are always N units in the population in the case of SRSWR, so the probability of selection of $u_i$ at any stage is $1/N$ for all $i = 1, 2, \ldots, n$. Then, the probability of selection of the n units $u_1, u_2, \ldots, u_n$ in the sample is

$$P(u_1, u_2, \ldots, u_n) = P(u_1)\,P(u_2) \cdots P(u_n) = \frac{1}{N} \cdot \frac{1}{N} \cdots \frac{1}{N} = \frac{1}{N^n}.$$

Probability of drawing a unit

1. SRSWOR
Let $A_\ell$ denote the event that a particular unit $u_j$ is not selected at the $\ell$th draw, and let $\bar{A}_\ell$ denote its complement. The probability of selecting, say, the $j$th unit at the $k$th draw is

$$
\begin{aligned}
P(\text{selection of } u_j \text{ at the } k\text{th draw})
&= P(A_1 \cap A_2 \cap \cdots \cap A_{k-1} \cap \bar{A}_k) \\
&= P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 A_2) \cdots P(A_{k-1} \mid A_1 A_2 \cdots A_{k-2})\,P(\bar{A}_k \mid A_1 A_2 \cdots A_{k-1}) \\
&= \left(1 - \frac{1}{N}\right)\left(1 - \frac{1}{N-1}\right)\left(1 - \frac{1}{N-2}\right) \cdots \left(1 - \frac{1}{N-k+2}\right)\frac{1}{N-k+1} \\
&= \frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdots \frac{N-k+1}{N-k+2} \cdot \frac{1}{N-k+1} \\
&= \frac{1}{N}.
\end{aligned}
$$

2. SRSWR

$$P(\text{selection of } u_j \text{ at the } k\text{th draw}) = \frac{1}{N}.$$

Estimation of population mean and population variance
One of the main objectives after selecting a sample is to learn about the tendency of the data to cluster around a central value and about the scatter of the data around that central value. Among the various measures of central tendency and dispersion, the popular choices are the arithmetic mean and the variance. So, the population mean and population variability are generally measured by the arithmetic mean (or weighted arithmetic mean) and the variance, respectively. There are various estimators of the population mean and the population variance. Among them, the sample arithmetic mean and the sample variance are more popular than the others. One reason for using these estimators is that they possess good statistical properties. Moreover, they are also obtained through well-established statistical estimation procedures like maximum likelihood estimation, least squares estimation, method of moments, etc., under several standard statistical distributions. One may also consider other measures like the median, mode, geometric mean, and harmonic mean for measuring the central tendency, and the mean deviation, absolute deviation, Pitman nearness, etc., for measuring the dispersion. Numerical procedures like bootstrapping can be used to study the properties of such estimators.
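Before turning to the estimators, the SRSWOR selection probabilities derived earlier in this chapter can be checked by brute force. The sketch below enumerates every ordered sample for a small hypothetical case (N = 6, n = 3 are assumptions for illustration) and uses exact fractions, so the comparisons with 1/N, n/N and $1/\binom{N}{n}$ are exact rather than approximate.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

N, n = 6, 3  # hypothetical small population and sample sizes

# Every ordered SRSWOR sample of labels 1..N; each has probability (N-n)!/N!.
ordered = list(permutations(range(1, N + 1), n))
p_ordered = Fraction(factorial(N - n), factorial(N))

def prob_unit_at_draw(j, k):
    """P(unit j is selected at the k-th draw), summed over ordered samples."""
    return sum(1 for s in ordered if s[k - 1] == j) * p_ordered

def prob_unit_in_sample(j):
    """P(unit j is included in the sample at some draw)."""
    return sum(1 for s in ordered if j in s) * p_ordered

# Probability of an unordered sample: n! orderings of the same n units.
p_unordered = factorial(n) * p_ordered

print(prob_unit_at_draw(4, 2), Fraction(1, N))
print(prob_unit_in_sample(4), Fraction(n, N))
print(p_unordered, Fraction(1, comb(N, n)))
```

Enumeration is only feasible for small N, but it makes the counting arguments concrete: each pair of printed values agrees.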
1. Estimation of population mean
Let us consider the sample arithmetic mean $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ as an estimator of the population mean $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$ and verify that $\bar{y}$ is an unbiased estimator of $\bar{Y}$ under the two cases.

SRSWOR
Let $t_i = \sum_{j=1}^{n} y_j$ denote the total of the observations in the $i$th possible sample, $i = 1, 2, \ldots, \binom{N}{n}$. Then

$$
\begin{aligned}
E(\bar{y}) &= \frac{1}{n} E\left(\sum_{j=1}^{n} y_j\right) \\
&= \frac{1}{n} E(t_i) \\
&= \frac{1}{n} \cdot \frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} t_i \\
&= \frac{1}{n} \cdot \frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} \left(\sum_{j=1}^{n} y_j\right).
\end{aligned}
$$

When n units are sampled from N units without replacement, each unit of the population can occur with other units selected out of the remaining $(N-1)$ units in the population, and each unit occurs in $\binom{N-1}{n-1}$ of the $\binom{N}{n}$ possible samples. So

$$\sum_{i=1}^{\binom{N}{n}} \left(\sum_{j=1}^{n} y_j\right) = \binom{N-1}{n-1} \sum_{i=1}^{N} Y_i.$$

Now

$$E(\bar{y}) = \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{n\,N!} \sum_{i=1}^{N} Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}.$$

Thus $\bar{y}$ is an unbiased estimator of $\bar{Y}$.

Alternatively, the following approach can be adopted to show the unbiasedness property. Let $P_j(i) = \frac{1}{N}$ denote the probability of selection of the $i$th unit at the $j$th stage. Then

$$
\begin{aligned}
E(\bar{y}) &= \frac{1}{n}\sum_{j=1}^{n} E(y_j) \\
&= \frac{1}{n}\sum_{j=1}^{n}\left[\sum_{i=1}^{N} Y_i P_j(i)\right] \\
&= \frac{1}{n}\sum_{j=1}^{n}\left[\sum_{i=1}^{N} Y_i \cdot \frac{1}{N}\right] \\
&= \frac{1}{n}\sum_{j=1}^{n} \bar{Y} \\
&= \bar{Y}.
\end{aligned}
$$

SRSWR

$$
\begin{aligned}
E(\bar{y}) &= \frac{1}{n} E\left(\sum_{i=1}^{n} y_i\right) \\
&= \frac{1}{n}\sum_{i=1}^{n} E(y_i) \\
&= \frac{1}{n}\sum_{i=1}^{n} (Y_1 P_1 + Y_2 P_2 + \cdots + Y_N P_N) \\
&= \frac{1}{n}\sum_{i=1}^{n} \left(Y_1 \frac{1}{N} + Y_2 \frac{1}{N} + \cdots + Y_N \frac{1}{N}\right) \\
&= \frac{1}{n}\sum_{i=1}^{n} \bar{Y} \\
&= \bar{Y},
\end{aligned}
$$

where $P_i = \frac{1}{N}$ for all $i = 1, 2, \ldots, N$ is the probability of selection of a unit. Thus $\bar{y}$ is an unbiased estimator of the population mean under SRSWR also.

Variance of the estimate
Assume that each observation has some variance $\sigma^2$.
Then

$$
\begin{aligned}
V(\bar{y}) &= E(\bar{y} - \bar{Y})^2 \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{Y})\right]^2 \\
&= E\left[\frac{1}{n^2}\sum_{i=1}^{n}(y_i - \bar{Y})^2 + \frac{1}{n^2}\sum_{i \ne j}^{n}\sum^{n}(y_i - \bar{Y})(y_j - \bar{Y})\right] \\
&= \frac{1}{n^2}\sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \frac{1}{n^2}\sum_{i \ne j}^{n}\sum^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 + \frac{K}{n^2} \\
&= \frac{N-1}{Nn} S^2 + \frac{K}{n^2},
\end{aligned}
$$

where $K = \sum_{i \ne j}^{n}\sum^{n} E(y_i - \bar{Y})(y_j - \bar{Y})$, assuming that each observation has variance $\sigma^2 = \frac{N-1}{N} S^2$. Now we find K under the setups of SRSWR and SRSWOR.

SRSWOR

$$K = \sum_{i \ne j}^{n}\sum^{n} E(y_i - \bar{Y})(y_j - \bar{Y}).$$

Consider

$$E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{1}{N(N-1)}\sum_{k \ne \ell}^{N}\sum^{N}(Y_k - \bar{Y})(Y_\ell - \bar{Y}).$$

Since

$$\left[\sum_{k=1}^{N}(Y_k - \bar{Y})\right]^2 = \sum_{k=1}^{N}(Y_k - \bar{Y})^2 + \sum_{k \ne \ell}^{N}\sum^{N}(Y_k - \bar{Y})(Y_\ell - \bar{Y}),$$

i.e.,

$$0 = (N-1)S^2 + \sum_{k \ne \ell}^{N}\sum^{N}(Y_k - \bar{Y})(Y_\ell - \bar{Y}),$$

we have

$$\frac{1}{N(N-1)}\sum_{k \ne \ell}^{N}\sum^{N}(Y_k - \bar{Y})(Y_\ell - \bar{Y}) = \frac{1}{N(N-1)}\left[-(N-1)S^2\right] = -\frac{S^2}{N}.$$

Thus $K = -n(n-1)\frac{S^2}{N}$, and so substituting the value of K, the variance of $\bar{y}$ under SRSWOR is

$$V(\bar{y}_{WOR}) = \frac{N-1}{Nn}S^2 - \frac{1}{n^2}\,n(n-1)\frac{S^2}{N} = \frac{N-n}{Nn}S^2.$$

SRSWR

$$
\begin{aligned}
K &= \sum_{i \ne j}^{n}\sum^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) \\
&= \sum_{i \ne j}^{n}\sum^{n} E(y_i - \bar{Y})\,E(y_j - \bar{Y}) \\
&= 0
\end{aligned}
$$

because the $i$th and $j$th draws $(i \ne j)$ are independent. Thus, the variance of $\bar{y}$ under SRSWR is

$$V(\bar{y}_{WR}) = \frac{N-1}{Nn}S^2 = \frac{\sigma^2}{n}.$$

It is to be noted that if N is infinite (large enough), then

$$V(\bar{y}) = \frac{S^2}{n}$$

is the case for both SRSWOR and SRSWR. So the factor $\frac{N-n}{N}$ is responsible for changing the variance of $\bar{y}$ when the sample is drawn from a finite population compared to an infinite population. This is why $\frac{N-n}{N}$ is called the finite population correction (fpc). It may be noted that $\frac{N-n}{N} = 1 - \frac{n}{N}$, so $\frac{N-n}{N}$ is close to 1 if the ratio of the sample size to the population size, $\frac{n}{N}$, is very small or negligible. The term $\frac{n}{N}$ is called the sampling fraction. In practice, the fpc can be ignored whenever $\frac{n}{N} < 5\%$ and, for many purposes, even if it is as high as 10%. Ignoring the fpc will result in the overestimation of the variance of $\bar{y}$.
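The unbiasedness of $\bar{y}$ and the two variance formulas can be verified exactly on a toy population by listing every possible sample. The following is a minimal sketch under stated assumptions: the five population values and n = 2 are made up for illustration, and the helper names are not from the notes. Under SRSWOR all $\binom{N}{n}$ unordered samples are equally likely, and under SRSWR all $N^n$ ordered samples are equally likely, so plain averages over the enumerated samples give the exact expectation and variance.

```python
from fractions import Fraction
from itertools import combinations, product

def population_stats(pop):
    """Return N, population mean, S^2 (divisor N-1) and sigma^2 (divisor N)."""
    N = len(pop)
    Ybar = Fraction(sum(pop), N)
    ss = sum((y - Ybar) ** 2 for y in pop)
    return N, Ybar, ss / (N - 1), ss / N

def mean_and_var_of_sample_mean(samples):
    """Exact mean and variance of ybar over equally likely samples."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / len(means)
    return m, v

pop = [2, 5, 7, 11, 13]  # hypothetical population values
n = 2                    # hypothetical sample size
N, Ybar, S2, sigma2 = population_stats(pop)

E_wor, V_wor = mean_and_var_of_sample_mean(list(combinations(pop, n)))
E_wr, V_wr = mean_and_var_of_sample_mean(list(product(pop, repeat=n)))

print("E(ybar):", E_wor, E_wr, "population mean:", Ybar)
print("V_WOR:", V_wor, "formula:", Fraction(N - n, N * n) * S2)
print("V_WR: ", V_wr, "formula:", Fraction(N - 1, N * n) * S2)
print("fpc (N-n)/N:", Fraction(N - n, N))
```

Because `Fraction` arithmetic is exact, the enumerated values and the closed-form expressions match term for term instead of only up to rounding.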
Efficiency of $\bar{y}$ under SRSWOR over SRSWR

$$V(\bar{y}_{WOR}) = \frac{N-n}{Nn}S^2$$

$$
\begin{aligned}
V(\bar{y}_{WR}) &= \frac{N-1}{Nn}S^2 \\
&= \frac{N-n}{Nn}S^2 + \frac{n-1}{Nn}S^2 \\
&= V(\bar{y}_{WOR}) + \text{a positive quantity}.
\end{aligned}
$$

Thus $V(\bar{y}_{WR}) > V(\bar{y}_{WOR})$ for $n > 1$, and so SRSWOR is more efficient than SRSWR.

Estimation of variance from a sample
Since the expressions for the variance of the sample mean involve $S^2$, which is based on population values, these expressions cannot be used in real-life applications. To estimate the variance of $\bar{y}$ on the basis of a sample, an estimator of $S^2$ (or equivalently $\sigma^2$) is needed. Consider $s^2$ as an estimator of $S^2$ (or $\sigma^2$); we investigate its bias in the cases of SRSWOR and SRSWR.

Consider

$$
\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 \\
&= \frac{1}{n-1}\sum_{i=1}^{n}\left[(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\right]^2 \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n}(y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2\right],
\end{aligned}
$$

so that

$$
\begin{aligned}
E(s^2) &= \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i - \bar{Y})^2 - nE(\bar{y} - \bar{Y})^2\right] \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n} Var(y_i) - n\,Var(\bar{y})\right] \\
&= \frac{1}{n-1}\left[n\sigma^2 - n\,Var(\bar{y})\right].
\end{aligned}
$$

In the case of SRSWOR,

$$V(\bar{y}_{WOR}) = \frac{N-n}{Nn}S^2$$

and so

$$
\begin{aligned}
E(s^2) &= \frac{n}{n-1}\left[\sigma^2 - \frac{N-n}{Nn}S^2\right] \\
&= \frac{n}{n-1}\left[\frac{N-1}{N}S^2 - \frac{N-n}{Nn}S^2\right] \\
&= S^2.
\end{aligned}
$$

In the case of SRSWR,

$$V(\bar{y}_{WR}) = \frac{N-1}{Nn}S^2$$

and so

$$
\begin{aligned}
E(s^2) &= \frac{n}{n-1}\left[\sigma^2 - \frac{N-1}{Nn}S^2\right] \\
&= \frac{n}{n-1}\left[\frac{N-1}{N}S^2 - \frac{N-1}{Nn}S^2\right] \\
&= \frac{N-1}{N}S^2 \\
&= \sigma^2.
\end{aligned}
$$

Hence

$$
E(s^2) =
\begin{cases}
S^2 & \text{in SRSWOR} \\
\sigma^2 & \text{in SRSWR.}
\end{cases}
$$

An unbiased estimate of $Var(\bar{y})$ is

$$\widehat{V}(\bar{y}_{WOR}) = \frac{N-n}{Nn}s^2$$

in the case of SRSWOR and

$$\widehat{V}(\bar{y}_{WR}) = \frac{N-1}{Nn} \cdot \frac{N}{N-1}s^2 = \frac{s^2}{n}$$

in the case of SRSWR.

Standard errors
The standard error of $\bar{y}$ is defined as $\sqrt{Var(\bar{y})}$. To estimate the standard error, one simple option is to consider the square root of the estimate of the variance of the sample mean.
Under SRSWOR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\frac{N-n}{Nn}}\,s$.
Under SRSWR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\frac{N-1}{Nn}}\,s$. (Taking the square root of the unbiased variance estimate $s^2/n$ instead gives $s/\sqrt{n}$; the two are nearly the same when N is large.)

It is to be noted that this estimator does not possess the same properties as $\widehat{Var}(\bar{y})$. The reason is that if $\hat{\theta}$ is an unbiased estimator of $\theta$, then $\sqrt{\hat{\theta}}$ is not necessarily an unbiased estimator of $\sqrt{\theta}$. In fact, $\hat{\sigma}(\bar{y})$ is a negatively biased estimator under SRSWOR.

The approximate expressions for the large N case are as follows:
(Reference: Sampling Theory of Surveys with Applications, P.V. Sukhatme, B.V. Sukhatme, S. Sukhatme, C. Asok, Iowa State University Press and Indian Society of Agricultural Statistics, 1984, India)

Consider s as an estimator of S. Let $s^2 = S^2 + \varepsilon$ with $E(\varepsilon) = 0$ and $E(\varepsilon^2) = Var(s^2)$. Write

$$
\begin{aligned}
s &= (S^2 + \varepsilon)^{1/2} \\
&= S\left(1 + \frac{\varepsilon}{S^2}\right)^{1/2} \\
&= S\left(1 + \frac{\varepsilon}{2S^2} - \frac{\varepsilon^2}{8S^4} + \cdots\right),
\end{aligned}
$$

assuming $\varepsilon$ is small compared to $S^2$; as n becomes large, the probability of such an event approaches one. Neglecting the powers of $\varepsilon$ higher than two and taking expectation, we have

$$E(s) = S\left[1 - \frac{Var(s^2)}{8S^4}\right],$$

where

$$Var(s^2) = \frac{2S^4}{n-1}\left[1 + \frac{n-1}{2n}(\beta_2 - 3)\right] \quad \text{for large } N,$$

$$\mu_j = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^j,$$

$$\beta_2 = \frac{\mu_4}{S^4}: \text{ coefficient of kurtosis.}$$

Thus

$$E(s) = S\left[1 - \frac{1}{4(n-1)} - \frac{\beta_2 - 3}{8n}\right]$$

and

$$
\begin{aligned}
Var(s) &= E(s^2) - [E(s)]^2 \\
&= S^2 - S^2\left[1 - \frac{Var(s^2)}{8S^4}\right]^2 \\
&\approx \frac{Var(s^2)}{4S^2} \\
&= \frac{S^2}{2(n-1)}\left[1 + \frac{n-1}{2n}(\beta_2 - 3)\right].
\end{aligned}
$$

Note that for a normal distribution, $\beta_2 = 3$ and we obtain

$$Var(s) = \frac{S^2}{2(n-1)}.$$

Both $Var(s)$ and $Var(s^2)$ are inflated due to nonnormality to the same extent, by the inflation factor

$$\left[1 + \frac{n-1}{2n}(\beta_2 - 3)\right],$$

and this does not depend on the coefficient of skewness. This is an important result to be kept in mind while determining the sample size, where it is assumed that $S^2$ is known. If the inflation factor is ignored and the population is non-normal, then the reliability of $s^2$ may be misleading.
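The results $E(s^2) = S^2$ under SRSWOR, $E(s^2) = \sigma^2$ under SRSWR, and the negative bias of s as an estimator of S can be illustrated by exact enumeration. This is a sketch on a made-up population (the five values and n = 3 are assumptions); it checks the exact finite-population results only, not the large-N approximations for $E(s)$ and $Var(s)$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import sqrt

def sample_variance(sample):
    """s^2 with divisor n - 1, computed exactly."""
    n = len(sample)
    ybar = Fraction(sum(sample), n)
    return sum((y - ybar) ** 2 for y in sample) / (n - 1)

pop = [2, 5, 7, 11, 13]  # hypothetical population values
n = 3                    # hypothetical sample size
N = len(pop)
Ybar = Fraction(sum(pop), N)
S2 = sum((y - Ybar) ** 2 for y in pop) / (N - 1)
sigma2 = S2 * (N - 1) / N

# All equally likely samples under each scheme.
s2_wor = [sample_variance(s) for s in combinations(pop, n)]
s2_wr = [sample_variance(s) for s in product(pop, repeat=n)]

E_s2_wor = sum(s2_wor) / len(s2_wor)   # exact expectation under SRSWOR
E_s2_wr = sum(s2_wr) / len(s2_wr)      # exact expectation under SRSWR

# s itself needs a square root, so this part uses floating point.
E_s_wor = sum(sqrt(float(v)) for v in s2_wor) / len(s2_wor)
S = sqrt(float(S2))

print("E(s^2) WOR:", E_s2_wor, " S^2:", S2)
print("E(s^2) WR: ", E_s2_wr, " sigma^2:", sigma2)
print("E(s) WOR:", E_s_wor, " S:", S)
```

The last line shows $E(s)$ falling below S, which is the negative bias mentioned in the text.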
Alternative approach:
The results for the unbiasedness property and the variance of the sample mean can also be proved in an alternative way as follows:

(i) SRSWOR
With the $i$th unit of the population, we associate a random variable $a_i$ defined as follows:

$$
a_i =
\begin{cases}
1, & \text{if the } i\text{th unit occurs in the sample} \\
0, & \text{if the } i\text{th unit does not occur in the sample}
\end{cases}
\qquad (i = 1, 2, \ldots, N).
$$

Then,

$$E(a_i) = 1 \times \text{Probability that the } i\text{th unit is included in the sample} = \frac{n}{N}, \quad i = 1, 2, \ldots, N,$$

$$E(a_i^2) = 1 \times \text{Probability that the } i\text{th unit is included in the sample} = \frac{n}{N}, \quad i = 1, 2, \ldots, N,$$

$$E(a_i a_j) = 1 \times \text{Probability that the } i\text{th and } j\text{th units are included in the sample} = \frac{n(n-1)}{N(N-1)}, \quad i \ne j = 1, 2, \ldots, N.$$

From these results, we can obtain

$$Var(a_i) = E(a_i^2) - \left(E(a_i)\right)^2 = \frac{n(N-n)}{N^2}, \quad i = 1, 2, \ldots, N,$$

$$Cov(a_i, a_j) = E(a_i a_j) - E(a_i)E(a_j) = -\frac{n(N-n)}{N^2(N-1)}, \quad i \ne j = 1, 2, \ldots, N.$$

We can rewrite the sample mean as

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i Y_i.$$

Then

$$E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{N} E(a_i)\,Y_i = \bar{Y}$$

and

$$Var(\bar{y}) = \frac{1}{n^2} Var\left(\sum_{i=1}^{N} a_i Y_i\right) = \frac{1}{n^2}\left[\sum_{i=1}^{N} Var(a_i)\,Y_i^2 + \sum_{i \ne j}^{N}\sum^{N} Cov(a_i, a_j)\,Y_i Y_j\right].$$

Substituting the values of $Var(a_i)$ and $Cov(a_i, a_j)$ in the expression for $Var(\bar{y})$ and simplifying, we get

$$Var(\bar{y}) = \frac{N-n}{Nn}S^2.$$

To show that $E(s^2) = S^2$, consider

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{N} a_i Y_i^2 - n\bar{y}^2\right].$$

Hence, taking expectation, we get

$$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{N} E(a_i)\,Y_i^2 - n\left\{Var(\bar{y}) + \bar{Y}^2\right\}\right].$$

Substituting the values of $E(a_i)$ and $Var(\bar{y})$ in this expression and simplifying it, we get $E(s^2) = S^2$.

(ii) SRSWR
Let the random variable $a_i$ associated with the $i$th unit of the population denote the number of times the $i$th unit occurs in the sample, $i = 1, 2, \ldots, N$. So $a_i$ assumes the values 0, 1, 2, ..., n. The joint distribution of $a_1, a_2, \ldots, a_N$ is the multinomial distribution given by

$$P(a_1, a_2, \ldots, a_N) = \frac{n!}{\prod_{i=1}^{N} a_i!} \cdot \frac{1}{N^n},$$

where $\sum_{i=1}^{N} a_i = n$. For this multinomial distribution, we have

$$E(a_i) = \frac{n}{N},$$

$$Var(a_i) = \frac{n(N-1)}{N^2}, \quad i = 1, 2, \ldots, N,$$

$$Cov(a_i, a_j) = -\frac{n}{N^2}, \quad i \ne j = 1, 2, \ldots, N.$$

We rewrite the sample mean as

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i Y_i.$$

Hence, taking the expectation of $\bar{y}$ and substituting the value $E(a_i) = n/N$, we obtain $E(\bar{y}) = \bar{Y}$.

Further,

$$Var(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{N} Var(a_i)\,Y_i^2 + \sum_{i \ne j}^{N}\sum^{N} Cov(a_i, a_j)\,Y_i Y_j\right].$$

Substituting the values $Var(a_i) = n(N-1)/N^2$ and $Cov(a_i, a_j) = -n/N^2$ and simplifying, we get

$$Var(\bar{y}) = \frac{N-1}{Nn}S^2.$$

To prove that $E(s^2) = \frac{N-1}{N}S^2 = \sigma^2$ in SRSWR, consider

$$(n-1)s^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 = \sum_{i=1}^{N} a_i Y_i^2 - n\bar{y}^2,$$

$$
\begin{aligned}
(n-1)E(s^2) &= \sum_{i=1}^{N} E(a_i)\,Y_i^2 - n\left\{Var(\bar{y}) + \bar{Y}^2\right\} \\
&= \frac{n}{N}\sum_{i=1}^{N} Y_i^2 - n \cdot \frac{N-1}{nN}S^2 - n\bar{Y}^2 \\
&= \frac{(n-1)(N-1)}{N}S^2,
\end{aligned}
$$

$$E(s^2) = \frac{N-1}{N}S^2 = \sigma^2.$$

Estimator of population total:
Sometimes, it is also of interest to estimate the population total, e.g., total household income, total expenditures, etc. Let $Y_T$ denote the population total,

$$Y_T = \sum_{i=1}^{N} Y_i = N\bar{Y},$$

which can be estimated by

$$\hat{Y}_T = N\hat{\bar{Y}} = N\bar{y}.$$

Obviously

$$E(\hat{Y}_T) = N E(\bar{y}) = N\bar{Y} = Y_T,$$

$$
Var(\hat{Y}_T) = N^2\,Var(\bar{y}) =
\begin{cases}
N^2\,\dfrac{N-n}{Nn}S^2 = \dfrac{N(N-n)}{n}S^2 & \text{for SRSWOR} \\[2ex]
N^2\,\dfrac{N-1}{Nn}S^2 = \dfrac{N(N-1)}{n}S^2 & \text{for SRSWR,}
\end{cases}
$$

and the estimates of the variance of $\hat{Y}_T$ are

$$
\widehat{Var}(\hat{Y}_T) =
\begin{cases}
\dfrac{N(N-n)}{n}s^2 & \text{for SRSWOR} \\[2ex]
\dfrac{N^2}{n}s^2 & \text{for SRSWR.}
\end{cases}
$$

Confidence limits for the population mean
Now we construct the $100(1-\alpha)\%$ confidence interval for the population mean. Assume that the population is normally distributed $N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$. Then $\dfrac{\bar{y} - \bar{Y}}{\sqrt{Var(\bar{y})}}$ follows $N(0, 1)$ when $\sigma^2$ is known. If $\sigma^2$ is unknown and is estimated from the sample, then $\dfrac{\bar{y} - \bar{Y}}{\sqrt{\widehat{Var}(\bar{y})}}$ follows a t-distribution with $(n-1)$ degrees of freedom.
When $\sigma^2$ is known, the $100(1-\alpha)\%$ confidence interval is given by

$$P\left[-Z_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{Var(\bar{y})}} \le Z_{\alpha/2}\right] = 1 - \alpha$$

or

$$P\left[\bar{y} - Z_{\alpha/2}\sqrt{Var(\bar{y})} \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\sqrt{Var(\bar{y})}\right] = 1 - \alpha,$$

and the confidence limits are

$$\left(\bar{y} - Z_{\alpha/2}\sqrt{Var(\bar{y})}, \ \ \bar{y} + Z_{\alpha/2}\sqrt{Var(\bar{y})}\right),$$

where $Z_{\alpha/2}$ denotes the upper $100(\alpha/2)\%$ point of the $N(0, 1)$ distribution. Similarly, when $\sigma^2$ is unknown, the $100(1-\alpha)\%$ confidence interval is

$$P\left[-t_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{\widehat{Var}(\bar{y})}} \le t_{\alpha/2}\right] = 1 - \alpha$$

or

$$P\left[\bar{y} - t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})} \le \bar{Y} \le \bar{y} + t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})}\right] = 1 - \alpha,$$

and the confidence limits are

$$\left(\bar{y} - t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})}, \ \ \bar{y} + t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})}\right),$$

where $t_{\alpha/2}$ denotes the upper $100(\alpha/2)\%$ point of the t-distribution with $(n-1)$ degrees of freedom.

Determination of sample size
The sample size must be fixed before the survey starts and goes into operation. One point to remember is that when the sample size increases, the variance of the estimators decreases but the cost of the survey increases, and vice versa. So, there has to be a balance between the two aspects, cost and variability. The sample size can be determined based on prescribed values of the standard error of the sample mean, the error of estimation, the width of the confidence interval, the coefficient of variation of the sample mean, the relative error of the sample mean, or the total cost, among several others.

An important requirement for determining the sample size under these criteria is that the population standard deviation S should be known. The reason for this will become clear when we derive the sample size in the next section. A question arises: how can information about S be obtained beforehand? One possible solution is to conduct a pilot survey, collect a preliminary sample of small size, estimate S, and use the estimate as the known value of S. Alternatively, such information can also be collected from past data, past experience, the long association of the experimenter with the experiment, prior information, etc.
The bootstrap method can also be used to obtain the value of S.

Now, we find the sample size under different criteria, assuming that the samples have been drawn using SRSWOR. The case for SRSWR can be derived similarly.

1. Pre-specified variance
The sample size is to be determined such that the variance of $\bar{y}$ does not exceed a given value, say V. In this case, find n such that

$$Var(\bar{y}) \le V$$

$$\text{or} \quad \frac{N-n}{Nn}S^2 \le V$$

$$\text{or} \quad \frac{1}{n} - \frac{1}{N} \le \frac{V}{S^2}$$

$$\text{or} \quad \frac{1}{n} - \frac{1}{N} \le \frac{1}{n_e}$$

$$\text{or} \quad n \ge \frac{n_e}{1 + \dfrac{n_e}{N}},$$

where $n_e = \dfrac{S^2}{V}$.

It may be noted here that $n_e$ can be known only when $S^2$ is known. This is what compels us to assume that S is known. The same reasoning will also be seen in the other cases.

The smallest sample size needed in this case is

$$n_{smallest} = \frac{n_e}{1 + \dfrac{n_e}{N}}.$$

If N is large, then the required n is $n \ge n_e$ and $n_{smallest} = n_e$.

2. Pre-specified estimation error
It may be possible to have some prior knowledge of the population mean $\bar{Y}$. It may be required that the sample mean $\bar{y}$ should not differ from it by more than a specified amount of absolute estimation error e, which is a small quantity. Such a requirement can be satisfied by associating a probability $(1-\alpha)$ with it and can be expressed as

$$P\left[\,|\bar{y} - \bar{Y}| \le e\,\right] = 1 - \alpha.$$

Since $\bar{y}$ follows $N\left(\bar{Y}, \dfrac{N-n}{Nn}S^2\right)$, assuming the normal distribution for the population, we can write

$$P\left[\frac{|\bar{y} - \bar{Y}|}{\sqrt{Var(\bar{y})}} \le \frac{e}{\sqrt{Var(\bar{y})}}\right] = 1 - \alpha,$$

which implies that

$$\frac{e}{\sqrt{Var(\bar{y})}} = Z_{\alpha/2}$$

$$\text{or} \quad Z_{\alpha/2}^2\,Var(\bar{y}) = e^2$$

$$\text{or} \quad Z_{\alpha/2}^2\,\frac{N-n}{Nn}S^2 = e^2$$

$$\text{or} \quad n = \frac{\left(\dfrac{Z_{\alpha/2}\,S}{e}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{Z_{\alpha/2}\,S}{e}\right)^2},$$

which is the required sample size. If N is large, then

$$n = \left(\frac{Z_{\alpha/2}\,S}{e}\right)^2.$$

3. Pre-specified width of the confidence interval
If the requirement is that the width of the confidence interval for the population mean with confidence coefficient $(1-\alpha)$ should not exceed a pre-specified amount W, then the sample size n is determined such that

$$2Z_{\alpha/2}\sqrt{Var(\bar{y})} \le W,$$

assuming $\sigma^2$ is known and the population is normally distributed.
This can be expressed as

$$2Z_{\alpha/2}\sqrt{\frac{N-n}{Nn}}\,S \le W$$

$$\text{or} \quad 4Z_{\alpha/2}^2\left(\frac{1}{n} - \frac{1}{N}\right)S^2 \le W^2$$

$$\text{or} \quad \frac{1}{n} \le \frac{1}{N} + \frac{W^2}{4Z_{\alpha/2}^2 S^2}$$

$$\text{or} \quad n \ge \frac{\dfrac{4Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4Z_{\alpha/2}^2 S^2}{NW^2}}.$$

The minimum sample size required is

$$n_{smallest} = \frac{\dfrac{4Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4Z_{\alpha/2}^2 S^2}{NW^2}}.$$

If N is large, then

$$n \ge \frac{4Z_{\alpha/2}^2 S^2}{W^2}$$

and the minimum sample size needed is

$$n_{smallest} = \frac{4Z_{\alpha/2}^2 S^2}{W^2}.$$

4. Pre-specified coefficient of variation
The coefficient of variation (CV) is defined as the ratio of the standard error (or standard deviation) to the mean. Knowledge of the coefficient of variation has played an important role in sampling theory, as this information has helped in deriving efficient estimators.

If it is desired that the coefficient of variation of $\bar{y}$ should not exceed a given or pre-specified value, say $C_0$, then the required sample size n is to be determined such that

$$CV(\bar{y}) \le C_0$$

$$\text{or} \quad \frac{\sqrt{Var(\bar{y})}}{\bar{Y}} \le C_0$$

$$\text{or} \quad \frac{\dfrac{N-n}{Nn}S^2}{\bar{Y}^2} \le C_0^2$$

$$\text{or} \quad \frac{1}{n} - \frac{1}{N} \le \frac{C_0^2}{C^2}$$

$$\text{or} \quad n \ge \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{NC_0^2}}$$

is the required sample size, where $C = \dfrac{S}{\bar{Y}}$ is the population coefficient of variation.

The smallest sample size needed in this case is

$$n_{smallest} = \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{NC_0^2}}.$$

If N is large, then

$$n \ge \frac{C^2}{C_0^2}$$

and $n_{smallest} = \dfrac{C^2}{C_0^2}$.

5. Pre-specified relative error
When $\bar{y}$ is used for estimating the population mean $\bar{Y}$, the relative estimation error is defined as $\dfrac{\bar{y} - \bar{Y}}{\bar{Y}}$. If it is required that such relative estimation error should not exceed a pre-specified value R with probability $(1-\alpha)$, then such a requirement can be expressed as

$$P\left[\frac{|\bar{y} - \bar{Y}|}{\sqrt{Var(\bar{y})}} \le \frac{R\bar{Y}}{\sqrt{Var(\bar{y})}}\right] = 1 - \alpha.$$

Assuming the population to be normally distributed, $\bar{y}$ follows $N\left(\bar{Y}, \dfrac{N-n}{Nn}S^2\right)$. So it can be written that

$$\frac{R\bar{Y}}{\sqrt{Var(\bar{y})}} = Z_{\alpha/2}$$

$$\text{or} \quad Z_{\alpha/2}^2\,\frac{N-n}{Nn}S^2 = R^2\bar{Y}^2$$

$$\text{or} \quad \frac{1}{n} - \frac{1}{N} = \frac{R^2}{C^2 Z_{\alpha/2}^2}$$

$$\text{or} \quad n = \frac{\left(\dfrac{Z_{\alpha/2}\,C}{R}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{Z_{\alpha/2}\,C}{R}\right)^2},$$

where $C = \dfrac{S}{\bar{Y}}$ is the population coefficient of variation and should be known. If N is large, then

$$n = \left(\frac{Z_{\alpha/2}\,C}{R}\right)^2.$$

6. Pre-specified cost
Let an amount of money C be designated for the sample survey to collect n observations, let $C_0$ be the overhead cost, and let $C_1$ be the cost of collecting one unit in the sample. (Here C and $C_0$ denote costs, not the coefficients of variation of the previous cases.) Then the total cost C can be expressed as

$$C = C_0 + nC_1$$

$$\text{or} \quad n = \frac{C - C_0}{C_1},$$

which is the required sample size.
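The six sample-size criteria can be collected into a few small functions. The sketch below is illustrative only: the function names and the numerical inputs are assumptions, $Z_{\alpha/2}$ is obtained from `statistics.NormalDist` in the Python standard library, and the result is rounded up to the next integer (rounded down for the cost criterion so the budget is not exceeded), which is a practical choice that the notes do not state. Every variance-based criterion has the same shape, $n = n_0 / (1 + n_0/N)$, where $n_0$ is the large-N sample size.

```python
from math import ceil, floor
from statistics import NormalDist

def _finite_population(n0, N):
    """Apply n = n0 / (1 + n0/N), where n0 is the large-N sample size."""
    return n0 / (1 + n0 / N)

def z_upper(alpha):
    """Upper 100(alpha/2)% point of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def n_for_variance(N, S2, V):
    """Criterion 1: Var(ybar) <= V, with n_e = S^2 / V."""
    return ceil(_finite_population(S2 / V, N))

def n_for_error(N, S, e, alpha=0.05):
    """Criterion 2: P(|ybar - Ybar| <= e) = 1 - alpha."""
    return ceil(_finite_population((z_upper(alpha) * S / e) ** 2, N))

def n_for_width(N, S, W, alpha=0.05):
    """Criterion 3: confidence interval width at most W."""
    return ceil(_finite_population(4 * z_upper(alpha) ** 2 * S ** 2 / W ** 2, N))

def n_for_cv(N, C, C0):
    """Criterion 4: CV(ybar) <= C0, with C = S / Ybar."""
    return ceil(_finite_population(C ** 2 / C0 ** 2, N))

def n_for_relative_error(N, C, R, alpha=0.05):
    """Criterion 5: relative error at most R with probability 1 - alpha."""
    return ceil(_finite_population((z_upper(alpha) * C / R) ** 2, N))

def n_for_cost(total_cost, overhead, unit_cost):
    """Criterion 6: largest n with overhead + n * unit_cost <= total_cost."""
    return floor((total_cost - overhead) / unit_cost)

# Hypothetical inputs: N = 1000 units, S = 10 (so S^2 = 100).
print(n_for_variance(1000, 100, 2))
print(n_for_error(1000, 10, 2))
print(n_for_width(1000, 10, 4))
print(n_for_cv(1000, 0.5, 0.05))
print(n_for_relative_error(1000, 0.5, 0.1))
print(n_for_cost(10000, 1000, 45))
```

Note that a confidence interval of width W = 2e imposes the same requirement as an estimation error of e, so criteria 2 and 3 return the same n for matching inputs.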