# Statistical Inference

## Point Estimation

### Definition
A point estimate of a population parameter $\theta$ is a single number $\hat{\theta}$, based on sample data, that represents a "reasonable" value for $\theta$.

### Definition
A point estimator is a statistic $\hat{\Theta}$ whose value is a point estimate.

## Unbiased Estimators

### Definition
A point estimator $\hat{\Theta}$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\Theta}) = \theta$ for every possible value of $\theta$. If $\hat{\Theta}$ is not unbiased, the difference $E(\hat{\Theta}) - \theta$ is called the bias of $\hat{\Theta}$.

### Example
* The sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$ because $E(\bar{X}) = \mu$.
* The sample variance $S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$ is an unbiased estimator of the population variance $\sigma^2$ because $E(S^2) = \sigma^2$.

### Remarks
* Unbiasedness does not imply that a particular estimate will be close to $\theta$, only that the estimator is correct on average over many samples.
* It is often possible to adjust a biased estimator to make it unbiased. For example, $S'^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}$ is a biased estimator of $\sigma^2$, since $E(S'^2) = \frac{n-1}{n}\sigma^2$; however, $S^2 = \frac{n}{n-1}S'^2$ is unbiased.
* For many parameters there are several unbiased estimators. How do we choose between them?

## Minimum Variance Estimators

### Definition
If $\hat{\Theta}_1$ and $\hat{\Theta}_2$ are both unbiased estimators of $\theta$, then $\hat{\Theta}_1$ is said to be more efficient than $\hat{\Theta}_2$ if $V(\hat{\Theta}_1) < V(\hat{\Theta}_2)$. The relative efficiency of $\hat{\Theta}_1$ to $\hat{\Theta}_2$ is $V(\hat{\Theta}_2) / V(\hat{\Theta}_1)$.
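These facts can be checked numerically. The following sketch (assuming NumPy; the parameter values, sample size, and replication count are illustrative choices, not from the notes) estimates $E(S^2)$, $E(S'^2)$, and the relative efficiency of $\bar{X}$ to $X_1$ by simulation:

```python
import numpy as np

rng = np.random.default_rng(42)          # illustrative seed
mu, sigma2, n, reps = 10.0, 4.0, 5, 200_000

# reps independent samples of size n from N(mu, sigma2)
samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

# Unbiased (n-1 denominator) vs biased (n denominator) variance estimators
s2 = samples.var(axis=1, ddof=1)         # S^2
s2_biased = samples.var(axis=1, ddof=0)  # S'^2

print(s2.mean())         # close to sigma^2 = 4.0
print(s2_biased.mean())  # close to (n-1)/n * sigma^2 = 3.2

# Relative efficiency of X-bar to X_1 as estimators of mu:
# V(X_1) / V(X-bar) should be close to n = 5
v_x1 = samples[:, 0].var()
v_xbar = samples.mean(axis=1).var()
print(v_x1 / v_xbar)
```

With enough replications, the average of $S^2$ sits near $\sigma^2$ while the average of $S'^2$ sits near $\frac{n-1}{n}\sigma^2$, and the variance ratio approaches $n$, in line with the definitions above.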
### Definition
If $\hat{\Theta}^*$ is an unbiased estimator of $\theta$ and $V(\hat{\Theta}^*) \le V(\hat{\Theta})$ for every unbiased estimator $\hat{\Theta}$ of $\theta$, then $\hat{\Theta}^*$ is called the minimum variance unbiased estimator (MVUE) of $\theta$.

### Example
If $X_1, \ldots, X_n$ is a random sample from a population with mean $\mu$ and variance $\sigma^2$, then both $\hat{\Theta}_1 = X_1$ and $\hat{\Theta}_2 = \bar{X}$ are unbiased estimators of $\mu$. However, $V(\hat{\Theta}_1) = \sigma^2 > \sigma^2 / n = V(\hat{\Theta}_2)$, so $\bar{X}$ is a more efficient estimator of $\mu$ than $X_1$. In fact, $\bar{X}$ is the MVUE of $\mu$.

## Method of Moments

### Method of Moments
Let $X_1, \ldots, X_n$ be a random sample from a probability distribution with probability density function (pdf) or probability mass function (pmf) $f(x; \theta_1, \ldots, \theta_k)$, where $\theta_1, \ldots, \theta_k$ are parameters whose values are unknown. The method of moments estimators $\hat{\theta}_1, \ldots, \hat{\theta}_k$ are found by equating the first $k$ sample moments to the corresponding $k$ population moments and solving the resulting system of equations for $\hat{\theta}_1, \ldots, \hat{\theta}_k$.

### Method of Moments: Procedure
1. Compute the first $k$ population moments, which will be functions of the unknown parameters $\theta_1, \ldots, \theta_k$. That is, let $\mu'_j = E(X^j)$ for $j = 1, \ldots, k$.
2. Compute the first $k$ sample moments. That is, let $m'_j = \frac{1}{n} \sum_{i=1}^{n} X_i^j$ for $j = 1, \ldots, k$.
3. Equate the population moments to the sample moments and solve for $\theta_1, \ldots, \theta_k$. That is, solve the system
$$\mu'_1 = m'_1, \quad \mu'_2 = m'_2, \quad \ldots, \quad \mu'_k = m'_k$$
for $\theta_1, \ldots, \theta_k$. The solutions $\hat{\theta}_1, \ldots, \hat{\theta}_k$ are the method of moments estimators of $\theta_1, \ldots, \theta_k$.

### Remarks
* The method of moments estimators are usually not unbiased.
* The method of moments estimators are usually not the MVUEs.
* The method of moments estimators are usually consistent; that is, they converge to the true values of the parameters as the sample size increases.
* The method of moments estimators can be used as starting values for more sophisticated estimation methods, such as maximum likelihood estimation.

## Maximum Likelihood Estimation

### Definition
Let $X_1, \ldots, X_n$ be a random sample from a population with pdf or pmf $f(x; \theta)$, where $\theta$ is a parameter whose value is unknown. The likelihood function is defined as
$$L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \theta).$$

### Definition
Let $\hat{\theta}$ be the value of $\theta$ that maximizes $L(\theta; x_1, \ldots, x_n)$. Then $\hat{\theta}$ is called the maximum likelihood estimate of $\theta$, and the corresponding statistic $\hat{\Theta}$ is called the maximum likelihood estimator (MLE) of $\theta$.

### Remarks
* Since $L(\theta; x_1, \ldots, x_n)$ and $\ln L(\theta; x_1, \ldots, x_n)$ are maximized at the same value of $\theta$, it is often easier to maximize the log-likelihood function $\ln L(\theta; x_1, \ldots, x_n)$.
* If $\frac{d}{d\theta} \ln L(\theta; x_1, \ldots, x_n) = 0$ has a unique solution, then that solution is the MLE of $\theta$.
* If it has multiple solutions, then the MLE of $\theta$ is the solution that maximizes $L(\theta; x_1, \ldots, x_n)$.
* If it has no solution, then the MLE of $\theta$ is the value of $\theta$ that maximizes $L(\theta; x_1, \ldots, x_n)$ on the boundary of the parameter space.
* The MLEs are usually not unbiased.
* The MLEs are usually consistent; that is, they converge to the true values of the parameters as the sample size increases.
* The MLEs are usually asymptotically normal; that is, as the sample size increases, the distribution of the MLEs approaches a normal distribution.
* The MLEs are usually asymptotically efficient; that is, as the sample size increases, the variance of the MLEs approaches the Cramér-Rao lower bound.
* If $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$ for any function $g$. This is called the invariance property of MLEs.
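As an illustrative sketch (the distribution, seed, and sample size are assumptions, not from the notes), both procedures have closed forms for a random sample from a Uniform$(0, \theta)$ distribution: equating $E(X) = \theta/2$ to $\bar{x}$ gives the method of moments estimator $\hat{\theta} = 2\bar{x}$, while the likelihood $L(\theta) = \theta^{-n}$ for $\theta \ge \max_i x_i$ is decreasing in $\theta$, so the MLE $\hat{\theta} = \max_i x_i$ lies on the boundary of the parameter space, as in the no-solution case among the remarks above.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed
theta = 5.0                      # true parameter of Uniform(0, theta)
x = rng.uniform(0.0, theta, size=1000)

# Method of moments: E(X) = theta/2, so equate theta/2 = x-bar.
theta_mom = 2.0 * x.mean()

# Maximum likelihood: L(theta) = theta^(-n) for theta >= max(x) is
# decreasing in theta, so the MLE sits on the boundary of the
# parameter space: theta-hat = max(x).
theta_mle = x.max()

print(theta_mom, theta_mle)  # both should be near the true theta = 5.0
```

Note that the MLE here is always at most the true $\theta$ (it is biased downward), illustrating the earlier remark that MLEs are usually not unbiased, even though both estimators are consistent.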