Ratio and Regression Estimation PDF

Summary

This document explains ratio and regression estimation in survey sampling, where auxiliary variables correlated with the variable of interest are used to improve the precision of estimators of a population mean or total. It covers ratio estimation under simple random sampling without replacement (SRSWOR), including estimation of a population total when the population total of the auxiliary variable is known.

Full Transcript

STAC53 Ratio and Regression Estimation

References: Sampling Design and Analysis, S.L. Lohr (Chap 4)

Ratio and Regression Estimation

Sometimes in survey sampling, information on one (or more) covariates $x$ that contain useful information about the variable of interest $y$ is available prior to sampling. For example, $x$ = acreage of agricultural fields and $y$ = bushels of grain in yield. These covariates are often called auxiliary variables or subsidiary variables. Ratio and regression estimation use auxiliary variables that are correlated with the variable of interest to improve the precision of estimators of the mean and total of a population.

Ratio estimation under SRSWOR

Since the ratio estimator requires population-level values of the auxiliary variable (its total or mean), ratio estimation is commonly used when the auxiliary variable is easily measured on the whole population, while the response variable is harder to measure and is obtained from only an SRSWOR of the population.

Example

To estimate the total sugar content of a truckload of oranges, an investigator selected an SRSWOR of ten oranges and recorded the weight and the sugar content of each. The total weight of all the oranges, obtained by first weighing the truck loaded and then unloaded, was found to be 1500 pounds. Estimate the total sugar content of all the oranges in this truckload.

Solution

Let's recall the formulas and the summary statistics. Note that the number of oranges in the population (i.e. $N$) is not given here, but we can use the appropriate formula to estimate $t_y$:

$\hat{B} = \frac{0.02280}{0.4210} = 0.0541567696$

$\hat{t}_{yr} = \hat{B}\,t_x = 0.0541567696 \times 1500 = 81.24$ pounds

Note: the bias of the ratio estimator can be expressed in terms of $\rho$, the population correlation between $x$ and $y$. This implies that the ratio estimator ($\hat{B}$) is biased, but the bias becomes smaller as the sample size $n$ increases.

Now we know how to estimate $B$, $\bar{y}_U$ and $t_y$ using ratio estimators.
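As a compact recap, the ratio estimators used in the orange example can be written out as below. This is a sketch in the standard notation of Lohr (Chap. 4), not a copy of the slide formulas. Reading 0.02280 and 0.4210 as the sample means $\bar{y}$ (sugar content) and $\bar{x}$ (weight) of the ten oranges is an assumption, since the data table is not reproduced in this transcript.

```latex
% Ratio estimators under SRSWOR (standard textbook notation; assumed to match the slides)
\hat{B} = \frac{\bar{y}}{\bar{x}}, \qquad
\hat{\bar{y}}_r = \hat{B}\,\bar{x}_U, \qquad
\hat{t}_{yr} = \hat{B}\,t_x

% Orange example: t_x = 1500 pounds is known, N is not needed
% (0.02280 and 0.4210 are assumed to be the sample means of y and x)
\hat{B} = \frac{0.02280}{0.4210} \approx 0.05416, \qquad
\hat{t}_{yr} = \hat{B}\,t_x \approx 0.05416 \times 1500 \approx 81.24 \text{ pounds}
```

Because $\hat{t}_{yr}$ depends only on the sample ratio and the known total $t_x$, the unknown number of oranges $N$ never enters the calculation, whereas the simple expansion estimator $N\bar{y}$ would require it.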
What are the variances of those estimators, i.e. $V(\hat{B})$, $V(\hat{\bar{y}}_r)$ and $V(\hat{t}_{yr})$?

Result: Under simple random sampling without replacement (for large sample size $n$), $\hat{\bar{y}}_r$ is approximately unbiased for $\bar{y}_U$. The variances of $\hat{B}$, $\hat{\bar{y}}_r$ and $\hat{t}_{yr}$ are likewise large-sample approximations.

Result: Estimates of these variances can also be expressed in terms of $f = n/N$, $CV_x = s_x/\bar{x}$, $CV_y = s_y/\bar{y}$ and $\hat{\rho}$, the correlation between $x$ and $y$.

For sufficiently large samples, approximate confidence intervals can be constructed using the standard errors (i.e. the square roots of these variances) as $\hat{B} \pm z\,SE(\hat{B})$, $\hat{\bar{y}}_r \pm z\,SE(\hat{\bar{y}}_r)$ and $\hat{t}_{yr} \pm z\,SE(\hat{t}_{yr})$.

When is the ratio estimator $\hat{\bar{y}}_r$ better than the simple sample mean $\bar{y}$?

Example

The data file apisrs contains an SRSWOR of 200 schools from the API population. We will use this data set to estimate the population total of api.stu (the number of students who took the API test) using enroll (the number of students enrolled in each school) as an auxiliary variable. Ratio estimation of the population total of $y$ requires the population total of the auxiliary variable $x$. In the API data set (apipop), some schools have missing values for the variable enroll. We will remove these schools and consider the remaining schools as our population.

# R code for ratio estimation
# Note this data set has missing values
# This code removes the missing values and treats the
# individuals with no missing values as the population.
apipop
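For reference alongside the variance results and confidence intervals above, the variance estimates are commonly written as follows. The slide formulas are not preserved in this transcript, so these are the standard forms from Lohr (Chap. 4) and should be read as an assumption about what the slides showed. The residuals $e_i$ and their variance $s_e^2$ are notation introduced here for illustration.

```latex
% Residuals from the fitted ratio line and their sample variance
e_i = y_i - \hat{B}x_i, \qquad
s_e^2 = \frac{1}{n-1}\sum_{i \in S} e_i^2

% Estimated variances (large-sample approximations)
\hat{V}(\hat{B}) = \left(1 - \frac{n}{N}\right)\frac{s_e^2}{n\,\bar{x}^2}, \qquad
\hat{V}(\hat{\bar{y}}_r) = \left(1 - \frac{n}{N}\right)\left(\frac{\bar{x}_U}{\bar{x}}\right)^2\frac{s_e^2}{n}, \qquad
\hat{V}(\hat{t}_{yr}) = \left(1 - \frac{n}{N}\right)\left(\frac{t_x}{\bar{x}}\right)^2\frac{s_e^2}{n}

% Equivalent coefficient-of-variation form, with f = n/N
\hat{V}(\hat{B}) = \frac{1-f}{n}\,\hat{B}^2\left(CV_x^2 + CV_y^2 - 2\hat{\rho}\,CV_x\,CV_y\right)
```

The coefficient-of-variation form follows from expanding $s_e^2 = s_y^2 - 2\hat{B}\hat{\rho}\,s_x s_y + \hat{B}^2 s_x^2$ and dividing by $\bar{x}^2$ with $\hat{B} = \bar{y}/\bar{x}$. The other two estimates are $\hat{V}(\hat{B})$ scaled by $\bar{x}_U^2$ and $t_x^2$ respectively.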
