
Statistical_Inference_Procedures.pdf




Statistical Inference Procedures
STAT*2050
Dr. Clémonell Bilayi-Biakana
Department of Mathematics and Statistics, University of Guelph
September 6, 2024

Outline
1. Statistical Inference - Basic Concepts
2. Statistical Inference Problems
3. Inference Procedures for Means

"The only way to learn mathematics is to do mathematics." - Paul Halmos

Statistical Inference - Basic Concepts

Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of data. The study of statistics can be subdivided into two main areas:
1. Descriptive statistics.
2. Inferential statistics.
As a field of mathematics, statistics has its own language. It is important to begin our study with some basic concepts in order to understand and communicate effectively about the subject.

Statistical Inference - Basic Concepts

The population is the set of all individuals or statistical units of interest to an investigator. The number of statistical units within a population is referred to as the population size, usually denoted N.

A random variable X is a rule that assigns an outcome to a statistical unit i. In other words, it is a characteristic of any entity being studied in a population. In statistics, there are two types of variables:
1. Qualitative or categorical variables.
2. Quantitative variables.

Statistical Inference - Basic Concepts

A parameter θ is a number that describes some aspect of a population, for example θ = E(X), V(X), etc.

Properties. Let X and Y be random variables and a, b real numbers. Then:
1. E(a) = a.
2. E(aX + b) = aE(X) + b.
3. E(aX + bY) = aE(X) + bE(Y).
4. V(X) ⩾ 0 and V(a) = 0.
5. V(X) = E(X²) − E²(X).
6. V(aX + b) = a²V(X).
7. V(aX + bY) = a²V(X) + b²V(Y), if X and Y are independent random variables; that is, if the probability distribution of X does not change when the value of Y changes (and vice versa).
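As a quick sanity check on the expectation and variance properties above, here is a short Monte Carlo sketch. The distributions N(1, 4) and N(−1, 9), the constants a = 3 and b = −2, the seed, and the sample size are arbitrary illustrative choices, not part of the lecture:

```python
import random
import statistics

# Monte Carlo check of E(aX + bY) = aE(X) + bE(Y) and, for independent
# X and Y, V(aX + bY) = a²V(X) + b²V(Y).
random.seed(42)
N = 200_000
a, b = 3.0, -2.0

# X ~ N(1, 4) and Y ~ N(-1, 9), drawn independently.
xs = [random.gauss(1.0, 2.0) for _ in range(N)]
ys = [random.gauss(-1.0, 3.0) for _ in range(N)]
zs = [a * x + b * y for x, y in zip(xs, ys)]

# E(aX + bY) should be close to 3*1 + (-2)*(-1) = 5.
print(statistics.fmean(zs))
# V(aX + bY) should be close to 9*4 + 4*9 = 72.
print(statistics.pvariance(zs))
```

With 200,000 draws the simulated mean and variance land within sampling error of the theoretical values 5 and 72.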
Statistical Inference - Basic Concepts

A random sample of size n is a representative subset of n statistical units selected from the population. In a population where a random variable X is studied, (X1, ..., Xn) represents a random sample of size n drawn from this population, and (x1, ..., xn) the observed random sample. It is worth noting that the measurements X1, X2, ..., Xn are copies of X, while each xi is an observed value of the measurement Xi on individual i. X1, X2, ..., Xn are independent and identically distributed (i.i.d.) random variables with the same probability distribution as X. As a result,
E(X1) = E(X2) = · · · = E(Xn) = µ and V(X1) = V(X2) = · · · = V(Xn) = σ².

Statistical Inference - Basic Concepts

We consider a statistical population on which a random variable X is studied, and assume that the probability distribution of X depends on an unknown parameter θ. We draw from this population (X1, ..., Xn), a random sample of size n, and let (x1, ..., xn) be its corresponding observations or observed values. Common statistical inference problems include:
1. point and interval estimation;
2. hypothesis testing.

Statistical Inference - Basic Concepts

Statistical inference about θ is usually done by constructing functions Θ̂n = h(X1, ..., Xn) of the random sample (X1, ..., Xn); these summaries are called sample statistics, or simply statistics. Any statistic Θ̂n is a random variable. The probability distribution of a statistic is called its sampling distribution: the distribution of the statistic computed over different samples of the same size from the same population. A sampling distribution shows us how the sample statistic varies from sample to sample. The standard deviation of a point estimator Θ̂n of θ, namely √V(Θ̂n), is called its standard error.

Statistical Inference - Basic Concepts

A point estimator Θ̂n is any sample statistic used to estimate the unknown parameter θ.
Its observed value θ̂n is called the point estimate of θ. In what follows, let (X1, ..., Xn) be a random sample with observations (x1, ..., xn). We introduce the common statistics: X̄, S², S, and P̂ are called the sample mean (or average), sample variance, sample standard deviation, and sample proportion, respectively; x̄, s², s, and p̂ are their respective observed values. The difference between a data point xi and the mean x̄ is called a deviation: the deviation of the i-th observation is xi − x̄.

Statistical Inference - Basic Concepts

Parameter θ | Point estimator Θ̂ | Point estimate θ̂
µ = E(X) | X̄ = (1/n) Σ_{i=1}^{n} Xi | x̄ = (1/n) Σ_{i=1}^{n} xi
σ² = V(X) | S² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)² | s² = (1/(n−1)) Σ_{i=1}^{n} (xi − x̄)²
σ = √V(X) | S = √[(1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)²] | s = √[(1/(n−1)) Σ_{i=1}^{n} (xi − x̄)²]
p = Y/N | P̂ = Ỹ/n | p̂ = ỹ/n

Statistical Inference - Basic Concepts

Exercise. Let X be a normally distributed random variable with mean µ and variance σ². Find the best point estimators and estimates for 3µ + 5, µ + σ², and σ²/15, if a sample of size 16 yields the summaries
Σ_{i=1}^{16} xi = 49 and Σ_{i=1}^{16} xi² = 374.

Statistical Inference - Basic Concepts

Exercise. In a manufacturing company, a data analyst is interested in the weight in grams of brown granulated sugar bags. He samples 10 brown sugar bags and observes the following weights:
27.7 31.5 30.9 29.6 27.0 38.1 32.4 31.1 36.7 28.4
1. Find the sample average weight of these brown sugar bags.
2. Find the sample variance and standard deviation of these weights.
3. Find the sample proportion of bags whose weight is above 35 grams.
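A possible worked version of the sugar-bag exercise, sketched with Python's standard library. Note that `statistics.variance` and `statistics.stdev` use the n − 1 divisor, matching the definitions of S² and S above:

```python
import statistics

# Observed weights (grams) of the 10 sampled brown sugar bags.
weights = [27.7, 31.5, 30.9, 29.6, 27.0, 38.1, 32.4, 31.1, 36.7, 28.4]

x_bar = statistics.fmean(weights)      # sample mean x̄
s2 = statistics.variance(weights)      # sample variance s² (n - 1 divisor)
s = statistics.stdev(weights)          # sample standard deviation s
p_hat = sum(w > 35 for w in weights) / len(weights)  # proportion above 35 g

print(f"x̄ = {x_bar:.2f} g, s² = {s2:.4f}, s = {s:.4f}, p̂ = {p_hat}")
```

The two bags above 35 g (38.1 and 36.7) give p̂ = 2/10 = 0.2.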
Sampling Distributions

Figure: Simulated densities of sums of 1, 2, 5, and 25 normal random variables (panels u1, u2, u5, u25).

Sampling Distributions

Theorem (Normal Transforms). Let X and Y be two independent random variables, and for constants a and b consider the weighted sum aX + bY. If X ∼ N(µ1, σ1²) and Y ∼ N(µ2, σ2²), then
aX + bY ∼ N(aµ1 + bµ2, a²σ1² + b²σ2²).
Remark. Thanks to this result, the following hold:
X + Y ∼ N(µ1 + µ2, σ1² + σ2²) and X − Y ∼ N(µ1 − µ2, σ1² + σ2²).

Sampling Distributions

Theorem. Let (X1, X2, ..., Xn) be a random sample from X ∼ N(µ, σ²). If the population variance σ² is known, then the sample mean X̄ has the following distribution:
X̄ = (1/n) Σ_{i=1}^{n} Xi ∼ N(µ, σ²/n), equivalently Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1).

Sampling Distributions

Lemma. Commonly called critical values, the quantiles of order α and α/2 of Z ∼ N(0, 1) are the real numbers zα and zα/2 that satisfy, respectively:
P(Z > zα) = α, i.e. P(Z ⩽ zα) = 1 − α;
P(Z > zα/2) = α/2, i.e. P(Z ⩽ zα/2) = 1 − α/2.

Sampling Distributions

Exercise. Based on information from the National Health and Nutrition Examination Survey, the height (in inches) of a 10-year-old girl is normally distributed with a mean of 56.9 and a standard deviation of 2.8; the height (in inches) of a 10-year-old boy is normally distributed with a mean of 56 and a standard deviation of 3.5. A fourth-grade class has 12 girls and 8 boys. The children's heights are recorded on their 10th birthdays and are assumed to be random samples from these populations. What is the probability that the mean height of the girls in this class is greater than the mean height of the boys?

Sampling Distributions

Theorem (Central Limit Theorem (C.L.T.)). Let X1, X2, ...
, Xn be a sequence of n independent and identically distributed (i.i.d.) random variables such that
E(X1) = E(X2) = · · · = E(Xn) = µ and V(X1) = V(X2) = · · · = V(Xn) = σ²
are finite. If n is large (n ⩾ 30), then
X̄n ≈ N(µ, σ²/n), equivalently Z = (X̄n − µ)/(σ/√n) ≈ N(0, 1), and
Sn* = Σ_{i=1}^{n} Xi ≈ N(nµ, nσ²), equivalently Z = (Sn* − nµ)/(σ√n) ≈ N(0, 1).

Sampling Distributions

Exercise. One of the objectives of a study is to describe the distribution of the body mass index (BMI) for women whose age is between 25 and 35 years. Suppose that women in this age group have an average BMI of 26.8 with a standard deviation of 7.42. A random sample of 50 women in this age group is drawn from the population.
1. What is the probability that the average BMI for these 50 women is greater than 29?
2. What is the probability that the total BMI for these 50 women is between 1200 and 1350?

Sampling Distributions

Lemma. Let p be the proportion of individuals with a certain characteristic X in a large population, and let (X1, ..., Xn), with n ⩾ 30, denote a large random sample. The sample proportion is defined as P̂ = Y/n, with Y the number of individuals with characteristic X in this sample. Note that Y ∼ B(n, p). Using the Central Limit Theorem,
P̂ ≈ N(p, p(1 − p)/n), equivalently Z = (P̂ − p)/√(p(1 − p)/n) ≈ N(0, 1).
This approximation can be improved by accounting for the fact that the binomial distribution is discrete while the normal distribution is continuous; for this, a correction for continuity is needed.

Sampling Distributions

Exercise. An epidemiologist wants to study the prevalence of the use of oral contraceptives in a certain population. Suppose that the proportion of women using these contraception methods in the population is 0.13. A random sample of 100 women is drawn from this population. What is the probability that the sample proportion falls between 0.10 and 0.16?

Sampling Distributions

Lemma (Non-Normal Transforms). If Z1, Z2, ..., Zn are i.i.d. N(0, 1), then
U = Σ_{i=1}^{n} Zi² ∼ χ²_n.  (1)
Let Z ∼ N(0, 1) and V ∼ χ²_n. If Z ⊥⊥ V, then
T = Z/√(V/n) ∼ T_n.  (2)
If V ∼ χ²_n, W ∼ χ²_m, and V ⊥⊥ W, then
F = (V/n)/(W/m) ∼ F_{n,m}.  (3)

Sampling Distributions

Theorem. Let (X1, ..., Xn) be a random sample from X ∼ N(µ, σ²). Then
(n − 1)S²/σ² = Σ_{i=1}^{n} ((Xi − X̄)/σ)² ∼ χ²_{n−1}.
Further, X̄ and S² are independent random variables.

Figure: Probability density of X ∼ χ²_n.

Sampling Distributions

Lemma. Commonly called critical values, the quantiles of order α and α/2 of X ∼ χ²_n are the real numbers χ²_{n,α} and χ²_{n,α/2} such that
P(X > χ²_{n,α}) = α, i.e. P(X ⩽ χ²_{n,α}) = 1 − α;
P(X > χ²_{n,α/2}) = α/2, i.e. P(X ⩽ χ²_{n,α/2}) = 1 − α/2.

Exercise. If a random sample of 25 readings is taken from a distribution X ∼ N(10, 100),
1. find the sampling distributions of X̄ and S²;
2. find P(6 < X̄ < 8) and the real number c such that P(S² > c) = 0.05.

Sampling Distributions

In most practical cases, the variance σ² of a random variable X is unknown. So, to get any idea of the variability of X̄, it is necessary to estimate σ² with S². As a result, the sampling distribution of the following statistic is needed:
T = (X̄ − µ)/(S/√n).

Theorem. Let (X1, X2, ..., Xn) be a random sample from X ∼ N(µ, σ²) with σ² unknown. Then the random variable T has Student's t-distribution with n − 1 degrees of freedom. That is,
T = (X̄ − µ)/(S/√n) ∼ T_{n−1}.

Sampling Distributions

Figure: Probability density of T ∼ T_n.

Remarks.
1. The T distribution is centered at 0; however, it is more dispersed than the standard normal distribution N(0, 1). The T distribution has heavy tails.
2. Note that as n increases, T approaches Z, where Z ∼ N(0, 1).

Sampling Distributions

Lemma. Commonly called critical values, the quantiles of order α and α/2 of T ∼ T_n are the real numbers t_{n,α} and t_{n,α/2} that satisfy, respectively:
P(T > t_{n,α}) = α, i.e. P(T ⩽ t_{n,α}) = 1 − α;
P(T > t_{n,α/2}) = α/2, i.e. P(T ⩽ t_{n,α/2}) = 1 − α/2.

Exercise. Let (X1, ..., X15) be a random sample from a normal variable X with mean 7.5.
Let X̄ and S² denote the sample mean and the sample variance, respectively. Find c such that
P((X̄ − 7.5)/(S/√15) < c) = 0.10.

Sampling Distributions

If one is interested in comparing the variability of two populations, one quantity of interest is the ratio σ1²/σ2². The information about this ratio is contained in Sn²/Sm².

Theorem. Let (X1, X2, ..., Xn) be a sample from X ∼ N(µ1, σ1²), and let (Y1, Y2, ..., Ym) be a sample from Y ∼ N(µ2, σ2²). Further, if X and Y are independent (X ⊥⊥ Y), then the random variable F has Snedecor's F distribution with n − 1 and m − 1 degrees of freedom, i.e.
F = (Sn²/σ1²)/(Sm²/σ2²) ∼ F_{n−1,m−1}.  (4)

Sampling Distributions

Lemma. Commonly called critical values, the quantiles of order α and α/2 of X ∼ F_{n,m} are the real numbers f_{n,m,α} and f_{n,m,α/2} that satisfy, respectively:
P(X > f_{n,m,α}) = α, i.e. P(X ⩽ f_{n,m,α}) = 1 − α;
P(X > f_{n,m,α/2}) = α/2, i.e. P(X ⩽ f_{n,m,α/2}) = 1 − α/2.

Interval Estimation - Confidence Intervals

Definition. Let X be a random variable whose probability distribution depends on an unknown population parameter θ. Let (X1, ..., Xn) be a random sample of size n drawn from X and (x1, ..., xn) its observations. A 100(1 − α)% confidence interval for θ is any interval
I(X1, ..., Xn) = [L(X1, ..., Xn), U(X1, ..., Xn)]
such that P(θ ∈ I(X1, ..., Xn)) = 1 − α. Note that (1 − α) is called the confidence level.

Interval Estimation - Confidence Intervals

Remarks. Using the observations (x1, ..., xn), we construct the observed 100(1 − α)% confidence interval
Iθ = [ℓ(x1, ..., xn), u(x1, ..., xn)] = [θ̂ − m, θ̂ + m],
where θ̂ is the sample point estimate of θ and m the margin of error. This Iθ stands for the interval estimate of θ. The size of the margin of error dictates how precise our point estimate of the parameter θ is: a large margin of error means less precision; a small margin of error means more precision.
We can increase the precision of our estimate by decreasing the margin of error.

Interval Estimation - Confidence Intervals

Figure: Repeated construction of a confidence interval for µ. This figure shows several 100(1 − α)% confidence intervals for the mean µ of a normal distribution with known variance σ². The dots at the center of the intervals indicate the point estimate x̄ of µ.

Interval Estimation - Confidence Intervals

Remarks.
1. What does being 95% confident that the population mean is in an interval actually indicate?
2. It indicates that if an analyst were to randomly select 100 samples from a random variable whose probability distribution depends on an unknown parameter θ, and use the results of each sample to construct a 95% confidence interval, approximately 95 of the 100 intervals would contain the population mean; about 5% of the intervals would not.
3. Suppose 20 random samples are taken from the population. If a 95% confidence interval is used, how many intervals are likely to contain µ? If a 90% confidence interval is constructed, how many intervals are likely to contain µ?

Hypothesis Tests - The Courtroom Problem

Motivating Example.
1. In a courtroom, a defendant standing trial for a crime is assumed innocent until proven guilty.
2. It is the job of the prosecution to present evidence showing that the defendant is guilty beyond a reasonable doubt. It is the job of the defense to challenge this evidence to establish a reasonable doubt.
3. The jury weighs the evidence and makes a decision.

Hypothesis Tests - Hypothesis Testing Problem

This courtroom example can be decomposed into 3 major parts:
1. Formulation of hypotheses:
(H): H0: The defendant is innocent, versus H1: The defendant is guilty.
2. Fact-checking evidence.
3. Assessment of evidence and decision making.
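The frequency interpretation of confidence discussed earlier (roughly 95 of 100 intervals covering µ) can be illustrated by simulation. The population values µ = 50 and σ = 10, the sample size n = 25, the seed, and the number of trials below are illustrative assumptions, not part of the lecture:

```python
import random
from statistics import NormalDist, fmean

# Simulate many 95% z-intervals x̄ ± z_{α/2}·σ/√n for a N(µ, σ²)
# population with σ known, and count how often they cover µ.
random.seed(7)
mu, sigma, n = 50.0, 10.0, 25
z = NormalDist().inv_cdf(0.975)   # z_{α/2}, about 1.96
margin = z * sigma / n ** 0.5     # margin of error m

covered = 0
trials = 10_000
for _ in range(trials):
    x_bar = fmean(random.gauss(mu, sigma) for _ in range(n))
    if x_bar - margin <= mu <= x_bar + margin:
        covered += 1

print(covered / trials)   # should be close to the nominal 0.95
```

Out of 10,000 simulated intervals, the observed coverage is close to 95%, matching the interpretation in the remarks.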
In what way does the courtroom problem relate to the hypothesis testing problem?

Hypothesis Tests - Hypothesis Testing Problem

In what follows, we consider a random variable X whose probability distribution depends on an unknown parameter θ. A statistical hypothesis is a statement about an unknown population parameter θ; it may or may not be true. H0 is called the null hypothesis: the statement being tested in a test of significance, or the statement we seek evidence against. H0 is assumed to be true until the data prove otherwise. H1 (or Ha) is called the alternative hypothesis: the statement we hope or suspect is true, or the statement we seek evidence for.

Hypothesis Tests - Hypothesis Testing Problem

1. Let θ0 be a fixed or hypothesized value of θ. Here are the possible formulations of hypotheses that we will deal with:
- a two-tailed test: H0: θ = θ0 versus H1: θ ≠ θ0;
- a right-tailed test: H0: θ = θ0 versus H1: θ > θ0;
- a left-tailed test: H0: θ = θ0 versus H1: θ < θ0.
A two-tailed test is also known as a two-sided test. Left-tailed and right-tailed tests fall into the class of one-sided tests.
2. Given the random sample (X1, ..., Xn) with observed values (x1, ..., xn), the hypothesis testing problem consists of finding a decision rule that will lead to a decision to reject or fail to reject H0. The sample quantity Θ̂n = T(X1, ..., Xn) on which the decision to support H0 or H1 is based is called the test statistic. It measures compatibility between H0 and the data.

Hypothesis Tests - Hypothesis Testing Problem

To proceed with sample data-checking evidence, we
1. find the sampling distribution of Θ̂n under H0,
2. and compute its observed value T(x1, ..., xn); the value of the test statistic is used to make a decision about H0.
3. To assess the evidence against H0 and make the decision, we use the level of significance α: how much evidence against H0 is regarded as decisive by the experimenter.
In practice, this assessment is done by computing either the critical value of the distribution of Θ̂n under H0, or the p-value. Assuming H0 is true, the p-value is the likelihood that the test statistic Θ̂n will weigh against H0 at least as strongly as it does for these data. That is, a p-value is a measure of the strength of the sample evidence against H0.

Hypothesis Tests - Inference as a Decision

All in all, a test of significance or hypothesis test is a process for assessing the significance of the evidence provided by data against the null hypothesis H0. It is a 3-step process that consists of formulating H0 and H1, sample data-checking evidence, and decision making. Recall the courtroom problem:
(H): H0: The defendant is innocent, versus H1: The defendant is guilty.
The decision made by the jury can be either correct or wrong. In hypothesis testing, we decide whether to reject or fail to reject H0. Because we are observing a sample and not an entire population, it is possible that a conclusion may be wrong.

Hypothesis Tests - Inference as a Decision

1. The jury concludes there is enough evidence to convict the defendant. In hypothesis testing, this "guilty verdict" is equivalent to rejecting H0. The experimenter makes this decision when the sample data provide sufficient evidence against H0; in other words, the data are incompatible with H0. Using the p-value approach, the evidence against H0 is given by p-value < α. This condition means H0 is less likely to be true: a smaller p-value suggests stronger evidence against H0. In this case, we say that the evidence is statistically significant.

Hypothesis Tests - Inference as a Decision

2. The jury concludes that there is not enough evidence to conclude beyond a reasonable doubt that the person is guilty. Notice the jury does not conclude that the person is innocent. In hypothesis testing, this "not guilty verdict" corresponds to failing to reject H0.
The experimenter makes this decision when the sample data provide insufficient evidence against H0. Using the p-value approach, this corresponds to p-value ⩾ α. This condition means the data are compatible with H0: a greater p-value suggests weaker evidence against H0. In this case, we say that the evidence is not statistically significant.
3. The jury convicts an innocent defendant. This wrongful conviction is referred to as a Type I error.
4. The jury acquits a defendant who really is guilty. This wrongful acquittal is referred to as a Type II error.

Hypothesis Tests - Inference as a Decision

A Type I error occurs if one rejects the null hypothesis when it is true. The probability of making a Type I error is
α = P(reject H0 | H0 is true).
A Type II error occurs if one does not reject H0 when it is false. The probability of making a Type II error is
β = P(fail to reject H0 | H0 is false).
The power of a test measures its ability to detect an alternative hypothesis. It is calculated as
π = P(reject H0 | H0 is false) = 1 − β.
The power of the test against an alternative H1 is the probability that the test will choose H1 when it is true.

Hypothesis Tests - Inference as a Decision

A criminal trial can possibly result in the following 4 decisions:

                     | H0 is true          | H0 is not true
Reject H0            | Wrongful Conviction | Guilty Verdict
Do Not Reject H0     | Not Guilty Verdict  | Wrongful Acquittal

The 4 outcomes of decision making in hypothesis testing are:

                     | H0 is true          | H0 is not true
Reject H0            | Type I Error (α)    | Correct Decision (π)
Do Not Reject H0     | Correct Decision    | Type II Error (β)

Hypothesis Tests - Inference as a Decision

A critical or rejection region is the set of values of the test statistic for which the null hypothesis is rejected. That is,
C = {(x1, ..., xn) | H0 is rejected}.
An acceptance or non-rejection region is the set of values of the test statistic for which the null hypothesis is not rejected.
It is the complement of the critical region. That is,
A = {(x1, ..., xn) | H0 is not rejected}.
It is worth noting that inverting an acceptance region yields a 100(1 − α)% confidence interval.

Hypothesis Tests - Inference as a Decision

Exercise. A cigarette manufacturer claims that its cigarettes contain an average of 40 mg of tar. The amount of tar in a cigarette is normally distributed with a standard deviation of 16 mg. A medical association claims that the average tar level exceeds this amount. A random sample of 64 cigarettes is selected, and the null hypothesis is rejected if the sample mean exceeds 43.5 mg.
1. Find the critical and acceptance regions of this problem.
2. Explain what a Type I error and a Type II error would mean.
3. What is the probability of a Type I error?
4. What is the probability of a Type II error when the average amount of tar in a cigarette is 41, 42, 43, 44, 45, 46, or 47?
5. From the calculations made, draw the graph of the power function of this test.

Normal Population With Known Variance

Let X be a random variable such that X ∼ N(µ, σ²), with known variance σ². Let µ0 be a hypothesized value of µ. We would like to perform one of the following tests at a significance level α:
1. H0: µ = µ0 versus H1: µ ≠ µ0;
2. H0: µ = µ0 versus H1: µ > µ0;
3. H0: µ = µ0 versus H1: µ < µ0.
Let (X1, ..., Xn) be a random sample drawn from X and (x1, ..., xn) its observations. Therefore,
X̄ ∼ N(µ, σ²/n) ⇒ X̄ ∼ N(µ0, σ²/n) under H0
⇒ Z0 = (X̄ − µ0)/(σ/√n) ∼ N(0, 1) under H0.
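For the cigarette exercise above, parts 3 and 4 can be computed directly from the sampling distribution X̄ ∼ N(µ, σ²/n) with the stated rejection rule x̄ > 43.5; a sketch using the standard library's NormalDist:

```python
from statistics import NormalDist

# Known-variance z-test setup from the exercise:
# H0: µ = 40 versus H1: µ > 40, with σ = 16, n = 64,
# and the rule: reject H0 when the sample mean exceeds 43.5 mg.
mu0, sigma, n, cutoff = 40.0, 16.0, 64, 43.5
se = sigma / n ** 0.5   # standard error of X̄ (here 16/8 = 2)

# Type I error: α = P(X̄ > 43.5 | µ = 40) = P(Z > 1.75).
alpha = 1 - NormalDist(mu0, se).cdf(cutoff)

# Type II error at each alternative: β(µ1) = P(X̄ ⩽ 43.5 | µ = µ1).
beta = {m1: NormalDist(m1, se).cdf(cutoff) for m1 in (41, 42, 43, 44, 45, 46, 47)}
power = {m1: 1 - b for m1, b in beta.items()}   # π(µ1) = 1 - β(µ1)

print(round(alpha, 4))
print({m1: round(b, 4) for m1, b in beta.items()})
```

The `power` values give the points needed to sketch the power function in part 5.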
