Questions and Answers
Why is low variance in an estimator, Var($\hat{\theta}$), considered desirable?
- It suggests that the estimator is complex and requires more data.
- It implies less uncertainty about the parameter $\theta$. (correct)
- It indicates that the estimator is biased.
- It guarantees that the estimator will always be equal to the true parameter $\theta$.
What is a key consideration when applying a statistical result or theorem that requires independent and identically distributed (iid) data?
- The population size must be infinite to guarantee iid conditions.
- The sample size must be large to ensure iid conditions.
- The data must be checked to confirm whether the iid requirement is actually met. (correct)
- It is always safe to assume iid conditions in any statistical analysis.
Under what condition can the infinite population case and the finite population case when sampling with replacement (WR) be considered together?
- When the sampling is done without replacement.
- When the variance of the sample is zero.
- When the population size is extremely large.
- Because iid (independent and identically distributed) sampling is possible in both cases. (correct)
Why does sampling without replacement (WOR) in a finite population require special consideration?
Why is focusing on finite populations and sampling without replacement (WOR) important in certain statistical analyses?
What does calculating the variance, Var($\hat{\theta}$), primarily enable in statistical estimation?
What distinguishes a statistic from an estimator?
In the context of confidence intervals, why does $P(\hat{L} \leq \theta \leq \hat{U}) = 0.95$ make sense, but $P(5 \leq \theta \leq 10) = 0.95$ does not (assuming $\hat{L} = 5$ and $\hat{U} = 10$ in a particular instance)?
What condition must be met for random variables $X$ and $Y$ to be considered independent?
If random variables $X$ and $Y$ are independent, what is the implication for their expected values?
What is the correct interpretation of covariance equaling zero, $Cov(X, Y) = 0$, for two random variables $X$ and $Y$?
How does the multilinearity property apply to the covariance of random variables $X_1, ..., X_n$ and $Y_1, ..., Y_{n'}$?
If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent, what is the distribution of $X + Y$?
What is the significance of denoting parameters with Greek letters in statistics?
In simple random sampling (SRS), what does it mean for every sample of size $n$ to have the same chance of being chosen?
In the context of statistics, what do hats and bars above variables usually indicate?
Consider a scenario where $Y_1, Y_2, ...$ are elements picked from a population. Why are these elements identically distributed but not independent when sampling without replacement?
Why is it necessary to quantify the uncertainty in an estimator when performing statistical estimation?
When estimating the variance of an estimator, why is it important to use an unbiased estimate?
In the context of simple random sampling without replacement, why is $s^2$ not an unbiased estimator of $\sigma^2$?
What is the purpose of multiplying the estimated standard error by $z_{\alpha/2}$ when calculating confidence intervals?
Under what condition is it acceptable to use the normal distribution instead of the t-distribution when constructing confidence intervals?
In inference for population totals, if $\hat{\tau} = N\bar{Y}_n$ is used as an estimator, what property does this estimator have?
What is the first step in determining the appropriate sample size for a survey?
What approach is recommended when the exact value of the population size, $N$, is unknown and very large?
In surveys, how can population proportions be handled, and what makes them particularly useful?
What is the implication if two random variables $X$ and $Y$ are independent?
When is it essential to focus on finite populations and sampling without replacement (WOR)?
Flashcards
Simple Random Sampling (SRS)
A basic sampling method where every sample of size n has an equal chance of being selected.
Estimator
A statistic (a value calculated from sample data) used to estimate a population parameter.
Uncertainty of the estimator
A measure of the estimator's variability, often its variance or standard error.
Sampling Fraction
The ratio f := n/N of the sample size to the population size.
Confidence Interval
An interval estimate [L̂, Û] constructed so that P(L̂ ≤ θ ≤ Û) equals a chosen confidence level.
Margin of Error
The half-width of a confidence interval: the estimated standard error multiplied by zα/2.
95% Confidence Interval
A confidence interval with α = 0.05, e.g. [Ȳn − 1.96·SE, Ȳn + 1.96·SE].
Population Total
τ = Σj=1 to N uj = Nμ, the sum of all measurements in a finite population.
Inference for Population Totals
Estimating τ with τ̂ = NȲn; the mean's variance and confidence interval are scaled by N.
Choosing Sample Size
Selecting the smallest n that keeps the margin of error under a target B, e.g. n = Ns²/(ND + s²) with D = B²/1.96².
population proportion (p)
The fraction of the population in a given category; a special case of a population mean with 0/1 measurements.
Study Notes
- Sampling Theory Lecture 3 by Ha-Young Shin, Soongsil University Spring 2025
Review of Last Week's Material
- Covered basic statistical concepts related to simple random sampling (SRS), including definition, sample mean, and variance of sample mean.
- Statistics is about making inferences on a population, often by estimating a parameter.
- The goal is to have both a point and interval estimate to quantify the uncertainty of the point estimate of a parameter.
- Requires both an estimator θ̂ and a measure of the uncertainty of the estimator, usually the variance Var(θ̂) or standard error √Var(θ̂).
- The main point of calculating Var(θ̂) is to do interval estimation, and an unbiased estimator of Var(θ̂) is preferred.
- Low variance Var(θ̂) is good, implying less uncertainty about θ.
- Pay attention to whether a result/theorem requires independent and identically distributed (iid) data. For example, s² is an unbiased estimator of σ² when the data are iid, but not necessarily when they aren't.
- The infinite population case and the finite population case when sampling with replacement (WR) can be considered together, as iid is possible in both cases.
- This is not possible in finite population sampling without replacement (WOR), which requires special consideration.
- The course focuses on finite populations and sampling WOR.
- WOR sampling gives lower variance.
- Population total τ and population mean μ are of interest, which only makes sense with a finite population (WOR or WR).
- Office is located in 508 Baird Hall.
- Last lecture inaccurately stated that a statistic and an estimator are the same thing: a statistic is a function of the data/sample, while an estimator is a statistic that estimates a parameter.
- All estimators are statistics, but not all statistics are estimators; sample:population = estimator:parameter.
- In the context of confidence intervals, P(L̂ ≤ θ ≤ Û) = 0.95 makes sense, but P(5 ≤ θ ≤ 10) = 0.95 does not, because L̂ and Û are random variables while 5 and 10 are constants.
- Random variables are functions: L̂ = 5 means L̂(ω) = 5 for a specific input ω.
- Two random variables X and Y are called independent if P(X ≤ x and Y ≤ y) = P(X ≤ x)P(Y ≤ y) for all x, y ∈ R.
- If X and Y are independent, E(XY) = E(X)E(Y).
- If X and Y are independent, and f and g are functions, then f(X) and g(Y) are independent.
- The covariance of two random variables X and Y with means μX and μY is Cov(X, Y) := E[(X − μX)(Y − μY)] = E(XY) − μXμY.
- Independence implies covariance = 0, but not the other way around; however, if X and Y are normal, they are independent if and only if Cov(X, Y) = 0.
- Covariance satisfies symmetry: Cov(X, Y) = Cov(Y, X).
- Multilinearity of covariance, used in the proof of unbiasedness of the sample variance: Cov(Σi=1 to n Xi, Σj=1 to n' Yj) = Σi=1 to n Σj=1 to n' Cov(Xi, Yj).
- Cov(Y1 + Y2, Y1 + Y2) = Var(Y1) + Var(Y2) + 2Cov(Y1, Y2).
- If Cov(Y1, Y2) = 0, then Var(Y1 + Y2) = Var(Y1) + Var(Y2).
- If Y ~ N(μY, σY²), then for constants a, b ∈ R, aY + b ~ N(aμY + b, a²σY²).
- If X ~ N(μX, σX²) and Y ~ N(μY, σY²) are independent, then X + Y ~ N(μX + μY, σX² + σY²).
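The following is a minimal numerical sketch (not from the lecture) checking the covariance and normal-sum properties above with simulated data; all names and numbers are illustrative.

```python
# Sketch: simulate two independent normals and verify the properties above.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=1_000_000)    # X ~ N(2, 9)
y = rng.normal(-1.0, 2.0, size=1_000_000)   # Y ~ N(-1, 4), independent of X

# Independence implies Cov(X, Y) ≈ 0 and Var(X + Y) ≈ Var(X) + Var(Y).
print(np.cov(x, y)[0, 1])                   # ≈ 0
print(np.var(x + y), 9 + 4)                 # both ≈ 13

# X + Y should be approximately N(2 + (-1), 9 + 4) = N(1, 13).
print(np.mean(x + y), np.var(x + y))        # ≈ 1 and ≈ 13
```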
Simple Random Sampling (cont'd)
- Population size is denoted as N, and sample size is n, with elements of the population as u1, ..., uN.
- Population mean, variance, and total are denoted by μ, σ², and τ, respectively.
- The population mean is calculated as: μ = (Σj=1 to N uj) / N.
- Population variance is calculated as: σ² = (Σj=1 to N (uj - μ)²) / N.
- The population total is calculated as: τ = Σj=1 to N uj = Nμ.
- Greek letters typically denote parameters in statistics.
- The sample is Y1, ..., Yn, sampled without replacement (WOR).
- Simple Random Sampling (SRS): Every sample of size n has the same chance of being chosen.
- The probability of any given sample being chosen is 1/(N choose n), i.e., one over the number of possible samples of size n.
- Estimate the mean in SRS by calculating μ̂ := Ȳn = (1/n) * Σi=1 to n Yi.
- Hats and bars usually indicate statistics.
- Note that Y₁, Y₂, ... are identically distributed but not independent.
- Ȳn is an unbiased estimator of μ.
- Need to quantify the uncertainty in the estimator (standard error/interval estimation).
- The variance is Var(Ȳn) = ((N - n) / (N - 1)) * (σ² / n).
- σ² is unknown, so Var(Ȳn) needs to be estimated.
- In an iid case, s² is an unbiased estimator of σ².
- IID is impossible in a finite population case when sampling without replacement.
- s² is not an unbiased estimator of σ² in this case.
- One can prove that E(s²) = (N/(N - 1))σ².
- Hence (N - 1)s²/N is an unbiased estimator of σ², and V̂ar(Ȳn) := (1 - n/N) * (s²/n) is an unbiased estimator of the variance Var(Ȳn).
- Compare with sampling with replacement (WR), where V̂ar(Ȳn) = s²/n.
- The quantity n/N is the sampling fraction, denoted as f := n/N.
- For confidence intervals/margins of error, assume that the sample mean is approximately normally distributed and multiply the estimated standard error by zα/2 to get the estimated margin of error for a 100(1 - α)% confidence interval.
- Note that since Var(Yi) = σ² is usually unknown, strictly the t-distribution should be used rather than the normal distribution; with large enough n, the difference can be ignored.
- The estimated margin of error for α = 0.05 is B = 1.96 * √V̂ar(Ȳn) = 1.96 * √((1 - f)(s²/n)).
- The 95% confidence interval is [Ȳn - 1.96√((1 - f)(s²/n)), Ȳn + 1.96√((1 - f)(s²/n))].
- Example: to estimate the mean hours spent per day on smartphones by 2534 students, sample 200 students, obtaining a sample mean of 4.4 hours and a sample variance of 1.2.
- So N = 2534, n = 200, Ȳn = 4.4, and s² = 1.2.
- V̂ar(Ȳn) = (1 - n/N)(s²/n) = (1 - 200/2534)(1.2/200) ≈ 0.005526.
- B = 1.96 * √V̂ar(Ȳn) ≈ 0.1457, and the 95% confidence interval is [4.4 - 0.1457, 4.4 + 0.1457] ≈ [4.25, 4.55].
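A minimal sketch of the mean-interval computation above, using the numbers from the example; the helper name srs_mean_ci is mine, not from the lecture.

```python
# Sketch: 95% CI for a population mean under SRS without replacement.
import math

def srs_mean_ci(y_bar, s2, n, N, z=1.96):
    """Return estimated variance, margin of error, and CI for the mean."""
    f = n / N                              # sampling fraction
    var_hat = (1 - f) * s2 / n             # estimated variance of the sample mean
    B = z * math.sqrt(var_hat)             # estimated margin of error
    return var_hat, B, (y_bar - B, y_bar + B)

print(srs_mean_ci(y_bar=4.4, s2=1.2, n=200, N=2534))
# ≈ (0.005526, 0.1457, (4.25, 4.55))
```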
Inference for Population Totals
- The population total is τ = Nμ.
- Thus inference for τ is easy if we know how to do inference for μ.
- The estimator is τ̂ := NȲn = (N/n) * Σi=1 to n Yi.
- Its expected value is E(τ̂) = N·E(Ȳn) = Nμ = τ, so it is unbiased.
- Its variance is Var(τ̂) = N²Var(Ȳn) = N²((N - n)/(N - 1))(σ²/n).
- The estimated variance is V̂ar(τ̂) = N²(1 - f)(s²/n).
- The estimated margin of error at α = 0.05 is 1.96N * √((1 - f)(s²/n)).
- The 95% confidence interval is [NȲn - 1.96N√((1 - f)(s²/n)), NȲn + 1.96N√((1 - f)(s²/n))].
- In the smartphone example, the point estimate is τ̂ = NȲn = 2534 * 4.4 = 11149.6.
- V̂ar(τ̂) = N²(1 - n/N)(s²/n) = 2534²(1 - 200/2534)(1.2/200) ≈ 35486.14.
- B = 1.96 * √V̂ar(τ̂) ≈ 369.2, and the 95% confidence interval is [11149.6 - 369.2, 11149.6 + 369.2] = [10780.4, 11518.8].
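A similar sketch for the population-total interval; the function name srs_total_ci is illustrative only, not from the lecture.

```python
# Sketch: 95% CI for the population total tau = N*mu under SRS without replacement.
import math

def srs_total_ci(y_bar, s2, n, N, z=1.96):
    """Return point estimate, margin of error, and CI for the total."""
    tau_hat = N * y_bar                          # point estimate of tau
    var_hat = N**2 * (1 - n / N) * s2 / n        # estimated Var(tau_hat) = N^2 * Vhat(Ybar)
    B = z * math.sqrt(var_hat)                   # estimated margin of error
    return tau_hat, B, (tau_hat - B, tau_hat + B)

print(srs_total_ci(y_bar=4.4, s2=1.2, n=200, N=2534))
# ≈ (11149.6, 369.2, (10780.4, 11518.8))
```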
Choosing Sample Size
- Choose a sample size before the survey.
- We want surveys to be as small as possible for time/cost efficiency while still achieving a certain level of precision in the estimate.
- This is done by keeping the margin of error under a certain level.
- B and α are usually determined before designing the survey.
- So if the estimator θ̂n (which depends on n) follows a normal distribution and the margin of error is to be B for some α, solve for n:
- zα/2 * √Var(θ̂n) = B.
- For Ȳn (finite population, sampling WOR) with α = 0.05:
- zα/2 * √((N - n)σ² / ((N - 1)n)) = B
- n = Nσ² / ((N - 1)D + σ²), where D = B²/1.96².
- If σ² is unknown, use prior data/estimates of variance if available.
- Since (N - 1)s²/N is an unbiased estimator of σ², use n = Ns² / (ND + s²), where D = B²/1.96².
- Example: to achieve a margin of error of 0.1 for the mean in the smartphone example:
- With N = 2534 and s² = 1.2, the required sample size is roughly n = (2534 * 1.2) / (2534 * 0.1²/1.96² + 1.2) ≈ 390 students, i.e., 190 more than are already sampled.
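A sketch of the sample-size formula applied to this example; the helper name required_sample_size is mine, not from the lecture.

```python
# Sketch: n = N s^2 / (N D + s^2), with D = B^2 / 1.96^2.
def required_sample_size(N, s2, B, z=1.96):
    D = B**2 / z**2
    return N * s2 / (N * D + s2)   # round up to the next integer in practice

print(required_sample_size(N=2534, s2=1.2, B=0.1))   # ≈ 390
```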
When N is Unknown
- N may effectively be unknown when it is very large, e.g., the population of an entire country.
- In that case, do inference for the mean treating the population size as infinite.
- That is, do inference as if the population were infinite, sampling were done with replacement, and the data were iid.
- In inference for population proportions, the parameter we want is the proportion p of the population in some category, as in election polling.
- The quantity that we measure from each element surveyed is binary: 1 if element belongs to the category and 0 if not.
- Thus, if the population elements have measurements u1, ..., uN, each uj = 0 or 1.
- p = (1/N) * Σj=1 to N uj.
- p is a special case of a population mean, so inference for p can be done in a similar way to estimating a mean.
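A sketch showing that proportion inference reuses the mean formulas on 0/1 data; the binary responses below are simulated, not real survey data.

```python
# Sketch: a sample proportion is the sample mean of 0/1 measurements, so the
# same SRS margin-of-error formula applies.
import math
import numpy as np

rng = np.random.default_rng(1)
N, n = 2534, 200
sample = rng.integers(0, 2, size=n)            # hypothetical binary survey responses
p_hat = sample.mean()                          # sample proportion = sample mean
s2 = sample.var(ddof=1)                        # sample variance of the 0/1 data
B = 1.96 * math.sqrt((1 - n / N) * s2 / n)     # same margin of error as for a mean
print(p_hat, (p_hat - B, p_hat + B))
```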