STAT 144: Introductory Statistical Theory I - Chapter 5 - Introduction to Statistical Inference
These notes introduce statistical inference and random sampling, with definitions and worked examples. They cover parametric and nonparametric families, point and interval estimation, and tests of hypothesis.
Chapter 5: Introduction to Statistical Inference (1st Semester, A.Y. 2024-2025)

Topic 1: Random Sampling from a Probability Distribution

DEFINITION 1
Two random variables X and Y are said to be identically distributed if they have the same distribution function.

Remark: If X and Y are identically distributed, it does not follow that $P[X = Y] = 1$.

DEFINITION 2
Let X be a random variable with distribution function F and let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (iid) random variables with common distribution F. Then the collection $X_1, X_2, \ldots, X_n$ is known as a random sample of size n from the distribution F.

If $X_1, \ldots, X_n$ is a random sample from F, their joint distribution is given by
$$g(x_1, \ldots, x_n) = \prod_{j=1}^{n} f(x_j)$$
if F has density function f; and by
$$P[X_1 = x_1, \ldots, X_n = x_n] = \prod_{j=1}^{n} P[X_j = x_j]$$
when $X_1, \ldots, X_n$ are of the discrete type with common probability function $P[X_j = x]$ for all j.

Remarks:
1. Sometimes the term population is used to describe the universe from which the sample is drawn; the population may be conceptual. Often F is referred to as the probability distribution function.
2. In sampling from a probability distribution, randomness is inherent in the phenomenon under study. The sample is obtained by independent replications. In sampling from a finite population, randomness is a consequence of the sampling design.
3. In sampling from a finite population, the term population is meaningful in that it refers to some measurable or observable characteristics of a group of individuals or units.

DEFINITION 3
Let $X_1, \ldots, X_n$ be a random sample of size n from a population and let $T(x_1, \ldots, x_n)$ be a real-valued or vector-valued function whose domain includes the sample space of $(X_1, \ldots, X_n)$. Then the random variable or random vector $T = T(X_1, \ldots, X_n)$ is called a statistic. The probability distribution of a statistic Y is called the sampling distribution of Y.

Remarks:
1. A statistic is an alternative name given to a random variable or random vector when we have a sample of observations. In practice, $X_1, \ldots, X_n$ could be a random sample, i.e., $X_1, \ldots, X_n$ could be independent and identically distributed random variables.
2. Sample statistics are simply numerical characteristics of the sample, just as parameters are numerical characteristics of the population. However, sample statistics are random variables and vary from sample to sample, whereas parameters are fixed constants.

Illustration: Let $X_1, \ldots, X_n$ be a random sample from a distribution function F. Then the sample mean and sample variance, defined as
$$\bar{X} = \sum_{i=1}^{n} \frac{X_i}{n} \quad \text{and} \quad S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1},$$
respectively, are sample statistics.

Topic 2: Parametric and Nonparametric Families

DEFINITION 4
A parametric family of density functions is a collection of density functions that is indexed by a quantity called a parameter. The parameter, $\theta$, specifies the form of the distribution function. The set of all possible values of the parameter is called a parameter set (or parameter space), denoted by $\Theta$.

Illustration: Let $f(x; \theta) = \theta e^{-\theta x} I_{(0,\infty)}(x)$, where $\theta > 0$; then for each $\theta > 0$, $f(\cdot\,; \theta)$ is a probability density function, and $\theta$ is a parameter with parameter space $\Theta = (0, \infty)$. The collection $\mathcal{F} = \{f(\cdot\,; \theta) : \theta > 0\}$ is a parametric family of density functions.

DEFINITION 4
A family of distribution functions which is not a parametric family is called a nonparametric family; that is, the family of underlying distributions for X cannot be completely specified nor indexed by a finite number of numerical parameters.

An alternative name for "nonparametric" is "distribution-free". It should be noted, however, that "nonparametric" does not mean that there is no parameter, and "distribution-free" does not mean that no distribution is involved.

Illustration: Let $\mathcal{F}$ be the family of all distribution functions on the real line that have a finite mean.
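To tie Topics 1 and 2 together in code, the following is a minimal sketch (assuming NumPy is available) that draws a random sample from the exponential parametric family $f(x;\theta) = \theta e^{-\theta x}$ of the illustration above and computes the sample mean and sample variance from Topic 1. The values of $\theta$, n, and the seed are arbitrary demonstration choices, not values from the notes.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Draw a random sample of size n from the parametric family
# f(x; theta) = theta * exp(-theta * x), x > 0, i.e., Exponential
# with rate theta (NumPy parameterizes by scale = 1/theta).
theta, n = 2.0, 50
sample = rng.exponential(scale=1.0 / theta, size=n)

# Sample statistics: functions of the sample only (no unknown
# parameter appears), so they are computable from the data and
# vary from sample to sample.
x_bar = sample.sum() / n
s2 = ((sample - x_bar) ** 2).sum() / (n - 1)
s2_alt = ((sample ** 2).sum() - n * x_bar ** 2) / (n - 1)  # equivalent form

print(f"sample mean     = {x_bar:.4f}  (population mean 1/theta = {1 / theta:.4f})")
print(f"sample variance = {s2:.4f}  (alternate form gives {s2_alt:.4f})")
```

Rerunning with a different seed gives different values of $\bar{X}$ and $S^2$ while $\theta$ stays fixed, which is exactly the statistic-versus-parameter distinction in Remark 2.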
Topic 3: Point and Interval Estimation

In this section, we discuss the two important problems in statistical inference: estimation, which is concerned with finding a value or a range of values for a parameter of a distribution F based on sample data, and test of hypothesis, which deals with determining whether the sample data support the underlying model assumptions or not.

DEFINITION 5
A point estimator is any function $W(X_1, \ldots, X_n)$ of a sample; that is, any statistic is a point estimator.

Remarks:
1. The statistic W must be observable (that is, computable from the sample). Hence, it cannot be a function of any unknown parameter.
2. An estimator is the statistic used to estimate the parameter, and a numerical value of the estimator is called an estimate. For convenience in notation, no distinction will be made.
3. Estimates that are constants (not based on the observations) should not be admissible under any reasonable criteria.
4. Even though all statistics that take values in $\Theta$ are possible candidates for estimates of $\theta$, we have to develop criteria to indicate which estimates are "good" and which may be rejected.
5. Let $\theta$ be the unknown parameter to be estimated based on the sample $X_1, \ldots, X_n$ of size n. Estimators will be denoted by $\hat{\theta}$ (with or without subscripts). We use a Greek letter with a "hat" to represent estimators of parameters which are represented by the Greek letter without the "hat".

DEFINITION 6
The bias of a point estimator W of a parameter $\theta$ is the difference between the expected value of W and $\theta$; that is,
$$\text{Bias}_\theta W = E_\theta W - \theta.$$
An estimator whose bias is identically (in $\theta$) equal to 0 is called unbiased and satisfies $E_\theta W = \theta$ for all $\theta$.

When we subscript E by $\theta$, it means that the expected value is to be computed under the density or probability function when $\theta$ is the true value of the parameter.

Remarks:
1. Bias is a systematic error (in some direction). Unbiasedness of W says that W is correct on the average, i.e., the mean of W is $\theta$.
2. If $\hat{\theta}$ is unbiased for $\theta$ and g is a real-valued function on $\Theta$, then $g(\hat{\theta})$ is not unbiased for $g(\theta)$, in general.
3. To find an unbiased estimate for a parameter, one begins with the computation of the first few moment(s) to see if the parameter is linearly related to any moment(s). If so, then an unbiased estimate is easily obtained.
4. The closeness of the estimator to the parameter is referred to as accuracy and is measured by bias, while the closeness of the estimates from different samples to each other is referred to as precision and is measured by variance.
5. If the estimator W is unbiased for $\theta$, then a good measure of the precision is $V_\theta W = E_\theta W^2 - E_\theta^2 W$. Otherwise, a good measure is the mean squared error, or MSE.

DEFINITION 7
The mean squared error (MSE) of an estimator W of a parameter $\theta$ is the function of $\theta$ defined by
$$\text{MSE}_\theta(W) = E_\theta (W - \theta)^2.$$
Computationally, it is defined as $\text{MSE}_\theta(W) = V_\theta(W) + \text{Bias}_\theta^2(W)$. Smaller mean squared error implies greater precision. We generally choose an estimator with a smaller or smallest mean squared error.

EXAMPLE 1
Estimate $\theta$ in the following probability density function using a single observation:
$$f(x; \theta) = \frac{2(\theta - x)}{\theta^2}\, I_{(0,\theta)}(x).$$
Take $\hat{\theta} = X$, the single observation, and evaluate the properties of such an estimate.

Let W = X. Now,
$$E_\theta W = \int_0^\theta x \cdot \frac{2(\theta - x)}{\theta^2}\,dx = \frac{2}{\theta^2}\left[\theta \int_0^\theta x\,dx - \int_0^\theta x^2\,dx\right] = \frac{2}{\theta^2}\left[\frac{\theta^3}{2} - \frac{\theta^3}{3}\right] = \frac{2(3\theta^3 - 2\theta^3)}{6\theta^2} = \frac{\theta}{3};$$
and
$$\text{Bias}_\theta W = E_\theta W - \theta = \frac{\theta}{3} - \theta = -\frac{2\theta}{3}.$$
Thus, W is a biased estimator.

Computing for $V_\theta W$:
$$E_\theta W^2 = \int_0^\theta x^2 \cdot \frac{2(\theta - x)}{\theta^2}\,dx = \frac{2}{\theta^2}\left[\theta \int_0^\theta x^2\,dx - \int_0^\theta x^3\,dx\right] = \frac{2}{\theta^2}\left[\frac{\theta^4}{3} - \frac{\theta^4}{4}\right] = \frac{2(4\theta^4 - 3\theta^4)}{12\theta^2} = \frac{\theta^2}{6};$$
and hence
$$V_\theta W = E_\theta W^2 - E_\theta^2 W = \frac{\theta^2}{6} - \left(\frac{\theta}{3}\right)^2 = \frac{\theta^2}{18}.$$

Computing for the MSE, we will have
$$\text{MSE}_\theta(W) = V_\theta(W) + \text{Bias}_\theta^2(W) = \frac{\theta^2}{18} + \left(-\frac{2\theta}{3}\right)^2 = \frac{\theta^2}{2}.$$
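These closed-form answers can be checked by simulation. Below is a minimal Monte Carlo sketch (assuming NumPy; $\theta = 3$, the seed, and the replication count are arbitrary demonstration values). It samples from $f(x;\theta)$ by inverting the CDF $F(x) = 1 - (1 - x/\theta)^2$ and compares the empirical mean, variance, and MSE of W = X against the values derived above.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
theta, reps = 3.0, 200_000

# f(x; theta) = 2(theta - x)/theta^2 on (0, theta) has CDF
# F(x) = 1 - (1 - x/theta)^2, so inverse-CDF sampling gives
# X = theta * (1 - sqrt(1 - U)); since 1 - U is also uniform,
# X = theta * (1 - sqrt(U)) has the same distribution.
u = rng.uniform(size=reps)
w = theta * (1.0 - np.sqrt(u))  # W = X, one observation per replication

print(f"E[W]   = {w.mean():.4f}  (theory theta/3    = {theta / 3:.4f})")
print(f"V[W]   = {w.var():.4f}  (theory theta^2/18 = {theta ** 2 / 18:.4f})")
print(f"MSE[W] = {((w - theta) ** 2).mean():.4f}  (theory theta^2/2  = {theta ** 2 / 2:.4f})")
```

The empirical MSE is far larger than the variance, reflecting the large squared bias of W.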
DEFINITION 8
Let $X_1, \ldots, X_n$ be a random sample from a distribution function $F_\theta$ and let $\theta$ be a numerical parameter. Let $L = L(X_1, \ldots, X_n)$ and $U = U(X_1, \ldots, X_n)$ be two statistics such that $P_\theta[L \le \theta \le U] \ge 1 - \alpha$ for all $\theta$. Then the random interval $[L, U]$ is known as a minimum level $1-\alpha$ confidence interval for $\theta$. The quantity $\inf_\theta P_\theta[L \le \theta \le U]$ is called the confidence level.

Remarks:
1. If $P_{F}[\theta \le U(X_1, \ldots, X_n)] \ge 1 - \alpha$, we call U a minimum level $1-\alpha$ upper confidence bound, and if $P_{F}[L(X_1, \ldots, X_n) \le \theta] \ge 1 - \alpha$, we call L a minimum level $1-\alpha$ lower confidence bound.
2. In many problems an interval $[L, U]$ is preferable to a point estimate for $\theta$ due to the confidence level attached to an interval estimate.
3. The length of a confidence interval is taken to be a measure of its precision: the narrower the interval for a given confidence coefficient, the more precise the estimator. The length is defined as the difference between the upper and the lower limits, that is, length = U − L.
4. We choose, if possible, a confidence interval that has the least length among all minimum level $1-\alpha$ confidence intervals for $\theta$. However, this is usually difficult to determine. Instead, we concentrate on all confidence intervals based on some statistic T that have minimum level and choose one which has the least length. Such an interval, if it exists, is called tight.
5. If the distribution of T is symmetric about $\theta$, the length of the confidence interval based on T is minimized by choosing an equal tails confidence interval. That is, we choose c in $P[|T - \theta| \le c] = 1 - \alpha$ by taking $P[T - \theta > c] = \alpha/2$ and $P[T - \theta < -c] = \alpha/2$.
6. We will often choose equal tails confidence intervals for convenience even though the distribution of T may not be symmetric.

EXAMPLE 2
Suppose X has the density function
$$f(x; \theta) = \frac{1}{\theta}\, I_{(0,\theta)}(x).$$
Find a $(1-\alpha)$ level confidence interval for $\theta$ based on a single observation.

Note that $P[a \le X \le b] = 1 - \alpha$. Now, choose a and b so that each tail carries probability $\alpha/2$:
$$P[0 \le X \le a] = \frac{\alpha}{2}: \quad \int_0^a \frac{1}{\theta}\,dx = \frac{a}{\theta} = \frac{\alpha}{2} \;\Rightarrow\; a = \frac{\alpha\theta}{2};$$
$$P[b \le X \le \theta] = \frac{\alpha}{2}: \quad \int_b^\theta \frac{1}{\theta}\,dx = \frac{\theta - b}{\theta} = \frac{\alpha}{2} \;\Rightarrow\; b = \theta\left(1 - \frac{\alpha}{2}\right).$$
Hence,
$$P[a \le X \le b] = 1 - \alpha$$
$$P\left[\frac{\alpha\theta}{2} \le X \le \theta\left(1 - \frac{\alpha}{2}\right)\right] = 1 - \alpha$$
$$P\left[\frac{X}{1 - \alpha/2} \le \theta \le \frac{2X}{\alpha}\right] = 1 - \alpha,$$
so that $\left(\dfrac{X}{1 - \alpha/2},\; \dfrac{2X}{\alpha}\right)$ is a $1-\alpha$ level confidence interval estimate for $\theta$.
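The defining property of this interval is its coverage probability. Here is a minimal coverage check (assuming NumPy; $\theta = 5$, $\alpha = 0.05$, the seed, and the replication count are arbitrary demonstration values): it repeatedly draws one observation from Uniform(0, $\theta$), forms the interval from Example 2, and records how often the interval contains the true $\theta$.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
theta, alpha, reps = 5.0, 0.05, 100_000

# One observation X ~ Uniform(0, theta) per replication.
x = rng.uniform(0.0, theta, size=reps)

# Equal tails interval from Example 2: (X/(1 - alpha/2), 2X/alpha).
lower = x / (1.0 - alpha / 2.0)
upper = 2.0 * x / alpha

coverage = np.mean((lower <= theta) & (theta <= upper))
print(f"empirical coverage = {coverage:.4f}  (nominal 1 - alpha = {1 - alpha:.2f})")
```

The empirical coverage should be close to 0.95, since the derivation allocates exactly $\alpha/2$ probability to each tail.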
Topic 4: Test of Hypothesis

Illustration: A coin is tossed six times. It is of interest to know the probability of obtaining a "head" in a single toss. Such a parameter is denoted as P, where 0 < P < 1. We test

H0: P = 1/2 against H1: P = 3/4.

The random variable of interest is T, defined as the total number of "heads" in 6 tosses of the coin. The probability mass function of T is given by
$$P[T = t] = \binom{6}{t} P^t (1 - P)^{6-t}, \quad t = 0, 1, \ldots, 6.$$
The probability distribution of T under each hypothesis is given by:

T = t    H0: P = 1/2        H1: P = 3/4
  0      1/64  = 0.0156     1/4096    = 0.0002
  1      6/64  = 0.0938     18/4096   = 0.0044
  2      15/64 = 0.2344     135/4096  = 0.0330
  3      20/64 = 0.3125     540/4096  = 0.1318
  4      15/64 = 0.2344     1215/4096 = 0.2966
  5      6/64  = 0.0938     1458/4096 = 0.3560
  6      1/64  = 0.0156     729/4096  = 0.1780

DEFINITION 9
A statistical hypothesis is an assertion about the joint distribution of $(X_1, \ldots, X_n)$. The statement being tested is called the null hypothesis, and we write it as H0: $\theta \in \Theta_0$. The rival statement that we suspect is true, instead of H0, is called the alternative hypothesis, and we write it as H1: $\theta \in \Theta_1$.

Remarks:
1. The hypothesis under test is called nonparametric if $\mathcal{F}$ is a nonparametric family and parametric if $\mathcal{F}$ is a parametric family.
2. Usually the null hypothesis is chosen to correspond to the smaller or simpler subset $\Theta_0$ of $\Theta$ and is a statement of "no difference". In all cases we consider, the null hypothesis will be of the form $\theta = \theta_0$, $\theta \le \theta_0$, or $\theta \ge \theta_0$. Note that the equality sign always appears in H0.
3. If the distribution of $(X_1, \ldots, X_n)$ is completely specified by a hypothesis, we call it a simple hypothesis; otherwise the hypothesis is called composite. Thus, whenever $\Theta_0$ or $\Theta_1$ consists of exactly one point, the corresponding hypothesis is simple; otherwise it is composite.

DEFINITION 10
Let H0: $\theta \in \Theta_0$ and H1: $\theta \in \Theta_1$. Let $\mathcal{X}$ be the set of all possible values of $(X_1, \ldots, X_n)$. A (decision) rule that specifies a subset C of $\mathcal{X}$ such that if $(x_1, \ldots, x_n) \in C$ we reject H0, and if $(x_1, \ldots, x_n) \notin C$ we accept H0, is called a test of H0 against H1, and C is called the critical region of the test. A test statistic, T, is a statistic that is used in the specification of C.

There are two types of errors that can be made in using such a procedure. A Type I error is committed when a true null hypothesis is rejected, while a Type II error is committed when we fail to reject (accept) a false null hypothesis.

DEFINITION 11
A test of the null hypothesis H0: $\theta \in \Theta_0$ against H1: $\theta \in \Theta_1$ is said to have size $\alpha$, $0 \le \alpha \le 1$, if $\sup_{\theta \in \Theta_0} P_\theta(C) = \alpha$.

Remarks:
1. The chosen size is often unattainable, particularly when the distribution is discrete; in which case we usually take the largest level less than $\alpha$ that is attainable.
2. If $P_\theta(C) \le \alpha$ for all $\theta \in \Theta_0$, we say that the critical region is of significance level $\alpha$.
3. If $\sup_{\theta \in \Theta_0} P_\theta(C) = \alpha$, then the level and size of C are both equal to $\alpha$. On the other hand, if $\sup_{\theta \in \Theta_0} P_\theta(C) < \alpha$, then the size of C is smaller than its significance level.
4. If H0 is a simple hypothesis, $P_{H_0}(C)$ is the size of the critical region C, which may or may not equal a given significance level $\alpha$.
5. The choice of a specific value for $\alpha$ (0.1, 0.05, 0.01) is affected by several factors, like the cost of the study and the consequences of rejecting a TRUE null hypothesis. The economic and practical implications of rejecting H0 should influence the choice of $\alpha$.

DEFINITION 12
The probability of observing, under H0, a sample outcome at least as extreme as the one observed is called the p-value. If $t_0$ is the observed value of the test statistic T and the critical region is at the right tail, then the p-value is $P_{H_0}[T \ge t_0]$. If the critical region is at the left tail, then the p-value is $P_{H_0}[T \le t_0]$. The smaller the p-value, the more extreme the outcome and the stronger the evidence against H0.

Remarks:
1. The p-value is the smallest level at which the observed sample statistic is significant. If the level $\alpha$ is pre-assigned and $p_0$ is the p-value associated with $t_0$, then $t_0$ is significant at level $\alpha$ if $p_0 \le \alpha$.
2. Reporting the p-value instead of fixing $\alpha$ permits one to choose his or her own level of significance.
3. If the critical region C is two-sided, that is, if C is of the form ($T \le t_1$ or $T \ge t_2$), then we will double the one-tailed p-value and report it as the p-value even if the distribution is not symmetric.
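The coin illustration can be reproduced numerically. The sketch below (standard library only) recomputes the table of $P[T = t]$ under H0 and H1 and then evaluates a right-tailed p-value. The observed value $t_0 = 5$ and the critical region $C = \{T \ge 5\}$ are assumed choices for illustration; the notes do not fix either.

```python
from math import comb

n = 6

def pmf(t, p):
    """P[T = t] for T ~ Binomial(n = 6, p)."""
    return comb(n, t) * p ** t * (1 - p) ** (n - t)

# Reproduce the table: distribution of T under H0: P = 1/2 and H1: P = 3/4.
for t in range(n + 1):
    print(f"t = {t}:  H0 {pmf(t, 0.5):.4f}   H1 {pmf(t, 0.75):.4f}")

# Right-tailed p-value P_H0[T >= t0]; t0 = 5 is an assumed observed
# value for illustration, not one given in the notes.
t0 = 5
p_value = sum(pmf(t, 0.5) for t in range(t0, n + 1))
print(f"p-value at t0 = {t0}: {p_value:.4f}")  # 7/64 = 0.1094

# Since H0 is simple, the same sum is the size of the assumed
# critical region C = {T >= 5} under H0: P = 1/2.
```

With $t_0 = 5$ the p-value is 7/64, so the outcome would not be significant at $\alpha = 0.05$ but would be at $\alpha = 0.15$, illustrating Remark 1 under Definition 12.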
PRACTICE PROBLEMS

1. Let X be a random variable from a probability distribution.
   a. Let $T_1 = X$. Is $T_1$ an unbiased estimator for $\theta$?
   b. Find an unbiased estimator for $\theta$.

2. Find a 95% confidence interval estimate for $\theta$ using a single observation from