Sampling Theory: SRS Review

Questions and Answers

Why is low variance of an estimator, Var($\hat{\theta}$), considered desirable?

  • It suggests that the estimator is complex and requires more data.
  • It implies less uncertainty about the parameter $\theta$. (correct)
  • It indicates that the estimator is biased.
  • It guarantees that the estimator will always be equal to the true parameter $\theta$.

What is a key consideration when applying a statistical result or theorem that requires independent and identically distributed (iid) data?

  • The population size must be infinite to guarantee iid conditions.
  • The sample size must be large to ensure iid conditions.
  • The data must be checked to confirm whether the iid requirement is actually met. (correct)
  • It is always safe to assume iid conditions in any statistical analysis.

Under what condition can the infinite population case and the finite population case when sampling with replacement (WR) be considered together?

  • When the sampling is done without replacement.
  • When the variance of the sample is zero.
  • When the population size is extremely large.
  • When iid (independent and identically distributed) data are possible in both cases. (correct)

Why does sampling without replacement (WOR) in a finite population require special consideration?

  • Because iid (independent and identically distributed) is not possible. (D)

Why is focusing on finite populations and sampling without replacement (WOR) important in certain statistical analyses?

  • It is more realistic for many practical scenarios. (B)

What does calculating the variance, Var($\hat{\theta}$), primarily enable in statistical estimation?

  • To conduct interval estimation. (A)

What distinguishes a statistic from an estimator?

  • A statistic is any function of the data/sample, while an estimator is a statistic that estimates a parameter. (D)

In the context of confidence intervals, why does $P(\hat{L} \leq \theta \leq \hat{U}) = 0.95$ make sense, but $P(5 \leq \theta \leq 10) = 0.95$ does not (assuming $ \hat{L} = 5$ and $\hat{U} = 10$ in a particular instance)?

  • Because $\hat{L}$ and $\hat{U}$ are random variables, while 5 and 10 are constants. (B)

What condition must be met for random variables $X$ and $Y$ to be considered independent?

  • $P(X \le x \text{ and } Y \le y) = P(X \le x) \cdot P(Y \le y)$ for all $x, y \in \mathbb{R}$ (C)

If random variables $X$ and $Y$ are independent, what is the implication for their expected values?

  • $E(XY) = E(X) \cdot E(Y)$ (B)

What is the correct interpretation of covariance equaling zero, $Cov(X, Y) = 0$, for two random variables $X$ and $Y$?

  • X and Y are uncorrelated; independence can only be inferred under specific conditions such as joint normality. (D)

How does the multilinearity property apply to the covariance of random variables $X_1, ..., X_n$ and $Y_1, ..., Y_{n'}$?

  • $Cov(\sum_{i=1}^{n} X_i, \sum_{j=1}^{n'} Y_j) = \sum_{i=1}^{n} \sum_{j=1}^{n'} Cov(X_i, Y_j)$ (D)

If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent, what is the distribution of $X + Y$?

  • $N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$ (A)

What is the significance of denoting parameters with Greek letters in statistics?

  • Greek letters usually denote parameters in statistics. (A)

In simple random sampling (SRS), what does it mean for every sample of size $n$ to have the same chance of being chosen?

  • Every possible combination of $n$ elements from the population has an equal probability of being the selected sample. (D)

In the context of statistics, what do hats and bars above variables usually indicate?

  • Statistics. (C)

Consider a scenario where $Y_1, Y_2, ...$ are elements picked from a population. Why are these elements identically distributed but not independent when sampling without replacement?

  • Because the outcome of picking one element affects the probabilities of subsequent picks. (D)

Why is it necessary to quantify the uncertainty in an estimator when performing statistical estimation?

  • To enable standard error and interval estimation. (A)

When estimating the variance of an estimator, why is it important to use an unbiased estimate?

  • To avoid systematic over- or under-estimation of the true variance. (B)

In the context of simple random sampling without replacement, why is $s^2$ not an unbiased estimator of $\sigma^2$?

  • Because observations are not independent. (A)

What is the purpose of multiplying the estimated standard error by $z_{\alpha/2}$ when calculating confidence intervals?

  • To obtain the estimated margin of error for a $100(1 - \alpha)\%$ confidence interval. (A)

Under what condition is it acceptable to use the normal distribution instead of the t-distribution when constructing confidence intervals?

  • When the sample size is large enough. (A)

In inference for population totals, if $\hat{\tau} = N\bar{Y}_n$ is used as an estimator, what property does this estimator have?

  • It is an unbiased estimator. (C)

What is the first step in determining the appropriate sample size for a survey?

  • Decide on the desired level of precision and acceptable margin of error. (B)

What approach is recommended when the exact value of the population size, $N$, is unknown and very large?

  • Treat the population size as infinite when doing inference for the mean. (A)

In surveys, how can population proportions be handled, and what makes them particularly useful?

  • They are treated as a special case of a population mean and allow for simplified inference. (B)

What is the implication if two random variables $X$ and $Y$ are independent?

  • Knowing the value of $X$ does not give information about the value of $Y$. (C)

When is it essential to focus on finite populations and sampling without replacement (WOR)?

  • When it is more representative for real-world scenarios. (C)

Flashcards

Simple Random Sampling (SRS)

A basic sampling method where every sample of size n has an equal chance of being selected.

Estimator

A statistic (a function of the sample data) used to estimate a population parameter.

Uncertainty of the estimator

A measure of the estimator's variability, usually its variance or standard error.

Sampling Fraction

Denoted as f, it's the ratio of sample size (n) to population size (N).

Confidence Interval

An interval that estimates a population parameter with a certain level of confidence.

Margin of Error

The half-width of a confidence interval: the critical value (1.96 at the 95% level) times the estimated standard error of the estimator.

95% Confidence Interval

An interval computed from the sample (the sample mean plus or minus 1.96 estimated standard errors) whose random endpoints cover the true population mean with probability 0.95.

Population Total

The sum of the measured variable over all N elements of the population, τ = Nμ.

Inference for Population Totals

Estimating population total using sample data

Choosing Sample Size

Determining the right sample size for desired precision.

Population Proportion (p)

The proportion of the population that belongs to a specific category.

Study Notes

  • Sampling Theory Lecture 3 by Ha-Young Shin, Soongsil University Spring 2025

Review of Last Week's Material

  • Covered basic statistical concepts related to simple random sampling (SRS), including definition, sample mean, and variance of sample mean.
  • Statistics is about making inferences on a population, often by estimating a parameter.
  • The goal is to have both a point estimate and an interval estimate of a parameter; the interval quantifies the uncertainty of the point estimate.
  • This requires both an estimator $\hat{\theta}$ and a measure of its uncertainty, usually the variance, $Var(\hat{\theta})$, or the standard error, $\sqrt{Var(\hat{\theta})}$.
  • The main point of calculating $Var(\hat{\theta})$ is to do interval estimation, and an unbiased estimator of $Var(\hat{\theta})$ is preferred.
  • Low variance $Var(\hat{\theta})$ is good: it implies less uncertainty about $\theta$.
  • Pay attention to whether a result/theorem requires independent and identically distributed (iid) data. For example, s² is an unbiased estimator of σ² when the data are iid, but not necessarily when they are not.
  • The infinite population case and the finite population case when sampling with replacement (WR) can be considered together, as iid is possible in both cases.
  • This is not possible in finite population sampling without replacement (WOR), which requires special consideration.
  • The course focuses on finite populations and sampling WOR.
  • WOR sampling gives a lower variance of the sample mean than WR sampling with the same sample size.
  • The population total τ and the population mean μ are of interest; the total only makes sense with a finite population (WOR or WR).
  • Office is located in 508 Baird Hall.
  • Last lecture inaccurately stated that statistic and estimator were the same thing: a statistic is a function of the data/sample, while an estimator is a statistic estimating a parameter.
  • All estimators are statistics, but not all statistics are estimators; an estimator relates to a parameter as a sample relates to a population.
  • In the context of confidence intervals, $P(\hat{L} \leq \theta \leq \hat{U}) = 0.95$ makes sense, but $P(5 \leq \theta \leq 10) = 0.95$ does not, because $\hat{L}$ and $\hat{U}$ are random variables while 5 and 10 are constants.
  • Random variables are functions. $\hat{L} = 5$ means $\hat{L}(\omega) = 5$ for a specific input $\omega$.
  • Two random variables X and Y are called independent if $P(X \le x \text{ and } Y \le y) = P(X \le x) P(Y \le y)$ for all $x, y \in \mathbb{R}$.
  • If X and Y are independent, E(XY) = E(X)E(Y).
  • If X and Y are independent, and f and g are functions, then f(X) and g(Y) are independent.
  • The covariance of two random variables X and Y with means $\mu_X$ and $\mu_Y$ is $Cov(X, Y) := E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X \mu_Y$.
  • Independence implies that the covariance is 0, but not the other way around; however, if X and Y are jointly normal, they are independent if and only if $Cov(X, Y) = 0$.
  • Covariance satisfies symmetry: Cov(X, Y) = Cov(Y, X).
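  • Multilinearity of covariance, used in the proof of unbiasedness of the sample variance: $Cov(\sum_{i=1}^{n} X_i, \sum_{j=1}^{n'} Y_j) = \sum_{i=1}^{n} \sum_{j=1}^{n'} Cov(X_i, Y_j)$.
  • In particular, $Var(Y_1 + Y_2) = Cov(Y_1 + Y_2, Y_1 + Y_2) = Var(Y_1) + Var(Y_2) + 2Cov(Y_1, Y_2)$.
  • If $Cov(Y_1, Y_2) = 0$, then $Var(Y_1 + Y_2) = Var(Y_1) + Var(Y_2)$.
  • If $Y \sim N(\mu_Y, \sigma_Y^2)$, then for constants $a, b \in \mathbb{R}$, $aY + b \sim N(a\mu_Y + b, a^2 \sigma_Y^2)$.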
  • If X ~ N(μX, σX²) and Y ~ N(μY, σY²) are independent, then X + Y ~ N(μX + μY, σX² + σY²).
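
The remarks above that s² is unbiased for σ² with iid data but not necessarily otherwise, and that WOR sampling gives lower variance, can be checked exactly by enumerating every ordered sample from a tiny population. The following is a minimal sketch in Python; the population values (1, 2, 4, 9) and the sample size n = 2 are made up for illustration and are not from the lecture.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical toy population (illustrative values, not from the lecture).
population = [Fraction(v) for v in (1, 2, 4, 9)]
N, n = len(population), 2

mu = sum(population) / N
sigma2 = sum((u - mu) ** 2 for u in population) / N  # population variance, divisor N


def summarize(samples):
    """Exact E(ybar), Var(ybar) and E(s^2) over equally likely ordered samples."""
    ybars = [sum(s) / n for s in samples]
    s2s = [sum((y - yb) ** 2 for y in s) / (n - 1) for s, yb in zip(samples, ybars)]
    e_ybar = sum(ybars) / len(ybars)
    var_ybar = sum((yb - e_ybar) ** 2 for yb in ybars) / len(ybars)
    e_s2 = sum(s2s) / len(s2s)
    return e_ybar, var_ybar, e_s2


# With replacement: all N**n ordered draws are equally likely (the iid case).
wr = summarize(list(product(population, repeat=n)))
# Without replacement: all ordered samples of distinct elements are equally likely.
wor = summarize(list(permutations(population, n)))

print("sigma^2            =", sigma2)
print("WR : E(s^2)        =", wr[2], "  Var(ybar) =", wr[1])
print("WOR: E(s^2)        =", wor[2], "  Var(ybar) =", wor[1])
print("N/(N-1) * sigma^2  =", Fraction(N, N - 1) * sigma2)
```

Because the arithmetic uses exact fractions, the comparison is not affected by rounding: under WR the expectation of s² should equal σ², under WOR it should equal N/(N - 1) times σ², and the WOR variance of the sample mean should be smaller than the WR one by the factor (N - n)/(N - 1).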

Simple Random Sampling (cont'd)

  • Population size is denoted as N, and sample size is n, with elements of the population as u1, ..., uN.
  • Population mean, variance, and total are denoted by μ, σ², and τ, respectively.
  • The population mean is $\mu = \frac{1}{N} \sum_{j=1}^{N} u_j$.
  • The population variance is $\sigma^2 = \frac{1}{N} \sum_{j=1}^{N} (u_j - \mu)^2$.
  • The population total is $\tau = \sum_{j=1}^{N} u_j = N\mu$.
  • Greek letters typically denote parameters in statistics.
  • The sample is Y1, ..., Yn, sampled without replacement (WOR).
  • Simple Random Sampling (SRS): Every sample of size n has the same chance of being chosen.
  • The probability of any given sample being chosen is $1/\binom{N}{n}$, which depends only on the population size N and the sample size n.
  • Estimate the mean in SRS by the sample mean: $\hat{\mu} := \bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i$.
  • Hats and bars usually indicate statistics.
  • Note that $Y_1, Y_2, \ldots$ are identically distributed but not independent.
  • $\bar{Y}_n$ is unbiased for μ.
  • Need to quantify the uncertainty in the estimator (standard error/interval estimation).
  • The variance is $Var(\bar{Y}_n) = \frac{N - n}{N - 1} \cdot \frac{\sigma^2}{n}$.
  • σ² is unknown, so $Var(\bar{Y}_n)$ needs to be estimated.
  • In the iid case, s² is an unbiased estimator of σ².
  • IID is impossible in the finite population case when sampling without replacement.
  • s² is not an unbiased estimator of σ² in this case.
  • Instead, it can be proved that $E(s^2) = \frac{N}{N - 1} \sigma^2$.
  • Hence $\frac{N - 1}{N} s^2$ is an unbiased estimator of σ², and $\widehat{Var}(\bar{Y}_n) = (1 - \frac{n}{N}) \frac{s^2}{n}$ is an unbiased estimator of the variance $Var(\bar{Y}_n)$.
  • Compare with sampling with replacement (WR): $\widehat{Var}(\bar{Y}_n) = \frac{s^2}{n}$.
  • The quantity n/N is the sampling fraction, denoted as f := n/N.
  • To calculate confidence intervals and margins of error, assume that sample means are approximately normally distributed and multiply the estimated standard error by $z_{\alpha/2}$ to get the estimated margin of error for a $100(1 - \alpha)\%$ confidence interval.
  • Note that since $Var(Y_i) = \sigma^2$ is usually unknown, the t-distribution should be used rather than the normal distribution; with large enough n, the difference can be ignored.
  • The estimated margin of error for α = 0.05 is $B = 1.96 \sqrt{\widehat{Var}(\bar{Y}_n)} = 1.96 \sqrt{(1 - f) \frac{s^2}{n}}$.
  • The 95% confidence interval is $[\bar{Y}_n - 1.96 \sqrt{(1 - f) \frac{s^2}{n}},\ \bar{Y}_n + 1.96 \sqrt{(1 - f) \frac{s^2}{n}}]$.
  • Example: To estimate the mean hours per day spent on smartphones among 2534 students, a sample of 200 students gives a sample mean of 4.4 hours and a sample variance of 1.2.
  • Given: N = 2534, n = 200, $\bar{Y}_n = 4.4$, and s² = 1.2.
  • $\widehat{Var}(\bar{Y}_n) = (1 - n/N)(s^2/n) = (1 - 200/2534)(1.2/200) = 0.005526$.
  • $B = 1.96 \sqrt{\widehat{Var}(\bar{Y}_n)} = 0.1457$, and the 95% confidence interval is [4.4 - 0.1457, 4.4 + 0.1457] = [4.25, 4.55].
  • Inference for the population total: recall that τ = Nμ.
  • Thus inference for τ is easy once we know how to do inference for μ.
  • The estimator is $\hat{\tau} := N\bar{Y}_n = \frac{N}{n} \sum_{i=1}^{n} Y_i$.
  • Its expected value is $E(\hat{\tau}) = N E(\bar{Y}_n) = N\mu = \tau$, so it is unbiased.
  • Its variance is $Var(\hat{\tau}) = N^2 Var(\bar{Y}_n) = N^2 \frac{N - n}{N - 1} \cdot \frac{\sigma^2}{n}$.
  • The estimate of the variance is $\widehat{Var}(\hat{\tau}) = N^2 (1 - f) \frac{s^2}{n}$.
  • The estimated margin of error at α = 0.05 is $1.96 N \sqrt{(1 - f) \frac{s^2}{n}}$.
  • The 95% confidence interval is $[N\bar{Y}_n - 1.96 N \sqrt{(1 - f) \frac{s^2}{n}},\ N\bar{Y}_n + 1.96 N \sqrt{(1 - f) \frac{s^2}{n}}]$.
  • Example (smartphone data): the point estimate is $\hat{\tau} = N\bar{Y}_n = 2534 \times 4.4 = 11149.6$.
  • $\widehat{Var}(\hat{\tau}) = N^2 (1 - n/N)(s^2/n) = 2534^2 (1 - 200/2534)(1.2/200) = 35486.136$.
  • $B = 1.96 \sqrt{\widehat{Var}(\hat{\tau})} = 369.22$, and the 95% confidence interval is [11149.6 - 369.2, 11149.6 + 369.2] = [10780.4, 11518.8].
  • Choosing the sample size: the sample size must be chosen before the survey.
  • We want surveys to be as small as possible for time/cost efficiency while still achieving a certain level of precision in their estimates.
  • This is done by keeping the margin of error under a certain level B.
  • B and α are usually determined before designing the survey.
  • So if the estimator $\hat{\theta}_n$ (which depends on n) is approximately normally distributed, set its margin of error equal to B for the chosen α and solve for n:
  • $z_{\alpha/2} \sqrt{Var(\hat{\theta}_n)} = B$.
  • For $\bar{Y}_n$ (finite population, sampling WOR) with α = 0.05, so that $z_{\alpha/2} = 1.96$, this becomes:
  • $1.96 \sqrt{\frac{N - n}{N - 1} \cdot \frac{\sigma^2}{n}} = B$.
  • Solving for n gives $n = \frac{N \sigma^2}{(N - 1) D + \sigma^2}$, where $D = B^2 / 1.96^2$.
  • If σ² is unknown, use prior data/estimates of the variance if available.
  • Since $\frac{N - 1}{N} s^2$ is an unbiased estimator of σ², substituting it for σ² gives $n = \frac{N s^2}{N D + s^2}$, where $D = B^2 / 1.96^2$.
  • Example: To achieve a margin of error of 0.1 for the mean in the smartphone example:
  • With N = 2534 and s² = 1.2, the required sample size is $n = \frac{2534 \times 1.2}{2534 \times 0.1^2 / 1.96^2 + 1.2} \approx 390.04$, i.e. roughly 390 students in the sample, or 190 more than are already sampled.
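
The smartphone example above (mean, total, and required sample size) can be reproduced with a few lines of Python. This is only a sketch under the notes' assumptions (SRS without replacement, normal approximation, critical value 1.96); the function names are hypothetical helpers introduced here for illustration.

```python
import math

Z_95 = 1.96  # normal critical value for alpha = 0.05, as used in the notes


def srs_mean_ci(N, n, ybar, s2, z=Z_95):
    """Estimated variance, margin of error and CI for the mean under SRS WOR."""
    f = n / N                    # sampling fraction
    var_hat = (1 - f) * s2 / n   # unbiased estimate of Var(ybar)
    B = z * math.sqrt(var_hat)   # estimated margin of error
    return var_hat, B, (ybar - B, ybar + B)


def srs_total_ci(N, n, ybar, s2, z=Z_95):
    """Same quantities for the total, using tau_hat = N * ybar."""
    var_hat, B, _ = srs_mean_ci(N, n, ybar, s2, z)
    tau_hat = N * ybar
    B_tau = N * B
    return tau_hat, N**2 * var_hat, B_tau, (tau_hat - B_tau, tau_hat + B_tau)


def sample_size_for_mean(N, s2, B, z=Z_95):
    """Unrounded n = N s^2 / (N D + s^2), with D = B^2 / z^2."""
    D = B**2 / z**2
    return N * s2 / (N * D + s2)


# Figures from the smartphone example in the notes.
N, n, ybar, s2 = 2534, 200, 4.4, 1.2

var_mean, B_mean, ci_mean = srs_mean_ci(N, n, ybar, s2)
tau_hat, var_tau, B_tau, ci_tau = srs_total_ci(N, n, ybar, s2)
n_needed = sample_size_for_mean(N, s2, B=0.1)

print(f"mean : Var_hat = {var_mean:.6f}, B = {B_mean:.4f}, "
      f"CI = [{ci_mean[0]:.2f}, {ci_mean[1]:.2f}]")
print(f"total: tau_hat = {tau_hat:.1f}, Var_hat = {var_tau:.3f}, B = {B_tau:.2f}, "
      f"CI = [{ci_tau[0]:.1f}, {ci_tau[1]:.1f}]")
print(f"sample size for B = 0.1: {n_needed:.2f}")
```

The printed values should agree with the figures in the notes up to rounding. The sample-size function returns the unrounded value, which here is slightly above 390; rounding up rather than to the nearest integer would give 391, whereas the notes report roughly 390.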

When N is Unknown

  • N is often unknown when it is very large, such as the population of an entire country.
  • In that case, do inference for the mean as if the population were infinite, that is, as if sampling were done with replacement and the data were iid.
  • Inference for population proportions: the parameter of interest is the proportion p of the population in some category, as in election polling.
  • The quantity measured from each surveyed element is binary: 1 if the element belongs to the category and 0 if not.
  • Thus, if the population elements have measurements u1, ..., uN, each uj is 0 or 1.
  • $p = \frac{1}{N} \sum_{j=1}^{N} u_j$.
  • p is a special case of a population mean, so inference for p can be done in a similar way to estimating a mean.
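
Because p is the mean of 0/1 measurements, the mean-based formulas carry over directly. The sketch below illustrates this in Python under stated assumptions: the 500 poll responses are invented, N is treated as unknown and very large (so the iid estimate s²/n of the variance is used), and 1.96 is the normal critical value for α = 0.05.

```python
import math

# Hypothetical poll responses (invented data): 1 = in the category, 0 = not.
responses = [1] * 230 + [0] * 270
n = len(responses)

# The estimate of p is just the sample mean of the 0/1 values.
p_hat = sum(responses) / n

# Usual sample variance with divisor n - 1.
s2 = sum((y - p_hat) ** 2 for y in responses) / (n - 1)

# For 0/1 data the sample variance reduces to n/(n-1) * p_hat * (1 - p_hat).
s2_shortcut = n / (n - 1) * p_hat * (1 - p_hat)

# N unknown and very large: proceed as if the data were iid, Var_hat = s^2 / n.
B = 1.96 * math.sqrt(s2 / n)
ci = (p_hat - B, p_hat + B)

print(f"p_hat = {p_hat:.3f}, s^2 = {s2:.6f}, shortcut = {s2_shortcut:.6f}")
print(f"margin of error = {B:.4f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

If N were known, the same code would apply with the variance estimate multiplied by (1 - n/N), exactly as for a general mean.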
