Lecture 4: OLS Implementation
25117 - Econometrics, Universitat Pompeu Fabra
October 9th, 2024

What we learned in the last lesson

- The population regression line, $\beta_0 + \beta_1 X$, is the mean of $Y$ as a function of the value of $X$.
- The slope, $\beta_1$, is the expected difference in $Y$ between two observations whose $X$ values differ by one unit.
- The intercept, $\beta_0$, determines the level (or height) of the regression line.
- The population regression line can be estimated by ordinary least squares (OLS) using sample observations $(Y_i, X_i)$, $i = 1, \ldots, n$.
- The OLS estimators of the regression intercept and slope are denoted $\hat{\beta}_0$ and $\hat{\beta}_1$.
- The predicted value of $Y$ given $X$ is $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$.

What we learned in the last lesson (continued)

- The $R^2$ and the standard error of the regression (SER) measure how close the values of $Y_i$ are to the estimated regression line.
- The $R^2$ lies between 0 and 1 and indicates how much of the variance in our outcome is explained by the variance of our regressors.
- The SER estimates the standard deviation of the regression error.
- There are three key assumptions for estimating causal effects using the linear regression model:
  1. the regression errors, $u_i$, have a mean of 0 conditional on the regressors $X_i$;
  2. the sample observations are i.i.d. random draws from the population; and
  3. large outliers are unlikely.
- If these assumptions hold, the OLS estimator $\hat{\beta}_1$ is (1) an unbiased estimator of the causal effect $\beta_1$, (2) consistent, and (3) normally distributed when the sample is large.

Estimation of the regression line

We want to learn about the slope of the population regression line. We have data from a sample, so there is sampling uncertainty. There are five steps toward this goal:
1. State the population object of interest.
2. Provide an estimator of this population object.
3. Derive the sampling distribution of the estimator (this requires certain assumptions). In large samples, this sampling distribution will be normal by the CLT.
4. Take the square root of the estimated variance of the sampling distribution; this is the standard error (SE) of the estimator.
5. Use the SE to construct t-statistics (for hypothesis tests) and confidence intervals.

Estimation of the regression line (continued)

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

- $\beta_1$ is the slope of the population regression line.
- $\hat{\beta}_1$ is the OLS estimator of $\beta_1$.
- When $n$ is large, by the central limit theorem, the sampling distribution of $\hat{\beta}_1$ is well approximated by $\mathcal{N}\left(\beta_1,\ \sigma_u^2 / \mathrm{TSS}_X\right)$, where $\mathrm{TSS}_X = \sum_{i=1}^n (X_i - \bar{X})^2$.

Hypothesis Testing

- Most of the time, we test $H_0: \beta_1 = 0$.
- Null hypothesis and two-sided alternative: $H_0: \beta_1 = \beta_{1,0}$ vs. $H_1: \beta_1 \neq \beta_{1,0}$.
- Null hypothesis and one-sided alternative: $H_0: \beta_1 = \beta_{1,0}$ vs. $H_1: \beta_1 < \beta_{1,0}$ (or $\beta_1 > \beta_{1,0}$).

Hypothesis Testing (continued)

- In general, the t-statistic is

$$t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}}$$

- For $\mu_Y$, the t-statistic is

$$t = \frac{\bar{Y} - \mu_{Y,0}}{s_Y / \sqrt{n}}$$

- For $\beta_1$, the t-statistic is

$$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$$

where, when the errors are homoskedastic (i.e., $\mathrm{var}(u_i \mid X) = \sigma_u^2$ for all $i = 1, \ldots, n$),

$$SE(\hat{\beta}_1) = \sqrt{\frac{\sum_{i=1}^n \hat{u}_i^2}{(n-2)\,\mathrm{TSS}_X}} = \sqrt{\frac{SSR}{(n-2) \sum_{i=1}^n (X_i - \bar{X})^2}}$$

- We reject $H_0: \beta_1 = \beta_{1,0}$ at the 5% significance level if $|t| > 1.96$ or, equivalently, if the p-value is below 0.05.
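To make the mechanics concrete before the application, here is a minimal Stata sketch that reproduces the t-statistic and two-sided p-value for $H_0: \beta_1 = 0$ by hand on simulated data. The variable names (x, y) and all parameter values are illustrative assumptions, not taken from the lecture.

clear
set seed 42
set obs 500
generate x = runiform()
generate y = 2 + 3*x + rnormal(0, 1)             // simulated model; coefficients are illustrative
regress y x
* Reproduce the reported t-statistic and two-sided p-value for H0: beta1 = 0
scalar t_stat = _b[x] / _se[x]                   // (estimate - hypothesized 0) / SE
scalar p_val  = 2 * ttail(e(df_r), abs(t_stat))  // two-sided p-value from the t distribution
display "t = " t_stat "   p-value = " p_val

Because the default output of regress uses homoskedasticity-only standard errors, t_stat and p_val match the numbers Stata prints in its regression table.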
Stata Application

- We regress schools' average test scores ($Y$) on the share of subsidized meals ($X$).
- Our regression sample includes 500 schools ($i = 1, \ldots, 500$):

$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$$

Stata Application: the intercept

- The estimate of the intercept, $\hat{\beta}_0$, is equal to 847.072.
- This is the expected average test score for a school that does not subsidize meals at all (i.e., $X_i = 0$).
- Because all schools in our sample subsidize a non-zero share of meals, the intercept does not really have a real-world interpretation.

$$Y_i = 847.072 + \hat{\beta}_1 X_i + \hat{u}_i$$

Stata Application: the slope

- The estimate of the regression slope, $\hat{\beta}_1$, is equal to $-154.8953$.
- A 10 percentage-point increase in the share of subsidized meals is associated with a 15.49-point decrease in average test scores.
- This decrease amounts to 25.69% of a standard deviation of test scores.

$$Y_i = 847.072 - 154.8953\,X_i + \hat{u}_i$$

Stata Application: significance of the intercept

- The estimate of the intercept, $\hat{\beta}_0$, is significantly different from 0: $|t_{\hat{\beta}_0}| = 185.09 \gg 2.58$ (or, equivalently, the p-value is below 0.01).
- We reject $H_0: \beta_0 = 0$ at the 1% level (standard error in brackets):

$$Y_i = \underset{[4.576]}{847.072} - 154.8953\,X_i + \hat{u}_i$$

Stata Application: significance of the slope

- The estimate of the regression slope, $\hat{\beta}_1$, is significantly different from 0: $|t_{\hat{\beta}_1}| = 22.4 \gg 2.58$ (or, equivalently, the p-value is below 0.01).
- We reject $H_0: \beta_1 = 0$ at the 1% level:

$$Y_i = \underset{[4.576]}{847.072} \;\underset{[6.914]}{-\,154.8953}\,X_i + \hat{u}_i$$

Stata Application: confidence intervals

Recall that the 95% confidence interval (CI) defines
- the set of points that cannot be rejected at the 5% significance level;
- a set-valued function of the data (an interval that is a function of the data) that contains the true parameter value 95% of the time in repeated samples.

$$CI_{\beta_0} = 847.072 \pm 1.96 \times 4.576 = [838.0803;\ 856.0636]$$
$$CI_{\beta_1} = -154.8953 \pm 1.96 \times 6.914 = [-168.4801;\ -141.3106]$$

Neither interval contains 0.

Stata Application: the SER

Remember that
- the SER measures the spread of the distribution of $u$ around the regression line, in the units of the dependent variable;
- it is an estimator of the standard deviation of the regression error $u_i$.

$$SER = \sqrt{\frac{\sum_{i=1}^n (\hat{u}_i - \bar{\hat{u}})^2}{n-2}} = \sqrt{\frac{SSR}{n-2}} = \sqrt{\frac{903808.084}{498}} \approx 42.601$$

(The first two expressions coincide because the OLS residuals have mean zero, so $\bar{\hat{u}} = 0$.)

Stata Application: the R²

Remember that
- the regression $R^2$ measures the fraction of the variance of $Y$ that is explained by $X$;
- it is unitless and ranges between zero (no fit) and one (perfect fit).

$$R^2 = \frac{\text{explained sum of squares (ESS)}}{\text{total sum of squares (TSS)}} = \frac{\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2} = \frac{910811.722}{1814619.81} \approx 0.5019$$

Homoskedasticity vs. Heteroskedasticity

- In regression analysis, we often make an important assumption about the variance of the error term, $u$, known as homoskedasticity.
- Homoskedasticity implies that the variance of the error term is constant across all observations: $\mathrm{var}(u_i \mid X) = \sigma^2$ for all $i$.
- In many cases, however, this assumption does not hold, and we instead have heteroskedasticity, where the variance of $u$ varies across observations: $\mathrm{var}(u_i \mid X) = \sigma_i^2$ for each $i$.

Graphical Illustration

- Understanding homoskedasticity and heteroskedasticity is crucial in regression analysis.
- Homoskedasticity assumes constant error variance, while heteroskedasticity implies varying error variance.
- To account for heteroskedasticity, robust standard errors can be used to obtain valid inference.
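The quantities reported in the application above (the 95% confidence intervals, the SER, and the $R^2$) can all be recovered from the results that regress stores. The schools dataset itself is not part of this transcript, so the sketch below uses simulated data with hypothetical variable names (meal_share, testscore) and magnitudes that only loosely mimic the application.

clear
set seed 1
set obs 500
generate meal_share = runiform()
generate testscore  = 847 - 155*meal_share + rnormal(0, 42)   // illustrative values
regress testscore meal_share
* 95% CI for the slope; Stata uses the t critical value (about 1.96 for 498 df)
scalar ci_lo = _b[meal_share] - invttail(e(df_r), 0.025) * _se[meal_share]
scalar ci_hi = _b[meal_share] + invttail(e(df_r), 0.025) * _se[meal_share]
display "95% CI for the slope: [" ci_lo ", " ci_hi "]"
display "SER = " e(rmse) "   R2 = " e(r2)   // the root MSE reported by -regress- is the SER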
var($\hat{\beta}_1$) with robust standard errors

$$\mathrm{var}(\hat{\beta}_1 \mid X) = \mathrm{var}(\hat{\beta}_1 - \beta_1 \mid X) = \mathrm{var}\left[\left.\frac{\sum_{i=1}^n (X_i - \bar{X})\,u_i}{\sum_{i=1}^n (X_i - \bar{X})^2}\,\right|\, X\right]$$

$$= \mathrm{var}\left[\left.\frac{\sum_{i=1}^n (X_i - \bar{X})\,u_i}{\mathrm{TSS}_X}\,\right|\, X\right] = \frac{1}{\mathrm{TSS}_X^2}\,\mathrm{var}\left[\left.\sum_{i=1}^n (X_i - \bar{X})\,u_i\,\right|\, X\right]$$

$$= \frac{\sum_{i=1}^n (X_i - \bar{X})^2\,\mathrm{var}(u_i \mid X)}{\mathrm{TSS}_X^2} = \frac{1}{n}\,\frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\,\mathrm{var}(u_i \mid X)}{\left(\frac{n-1}{n}\,s_X^2\right)^2}$$

(The third equality uses the fact that, conditional on $X$, the $u_i$ are independent across $i$, so the variance of the sum is the sum of the variances.)

var($\hat{\beta}_1$) with robust standard errors (continued)

So, when $n$ is large, $\mathrm{var}(\hat{\beta}_1)$ converges toward

$$\sigma^2_{\hat{\beta}_1} = \frac{\sum_{i=1}^n (X_i - \mu_X)^2\,\mathrm{var}(u_i \mid X)}{\left(n\,\sigma_X^2\right)^2}$$

and $\hat{\beta}_1 \sim \mathcal{N}\left(\beta_1,\ \dfrac{\sigma_\nu^2}{n\,(\sigma_X^2)^2}\right)$, where $\nu_i = (X_i - \mu_X)\,u_i$.

The (robust) natural estimator of $\sigma^2_{\hat{\beta}_1}$ follows:

$$\hat{\sigma}^2_{\hat{\beta}_1} = \frac{1}{n}\,\frac{\frac{1}{n-2}\sum_{i=1}^n (X_i - \bar{X})^2\,\hat{u}_i^2}{\left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\right]^2}$$

(Heteroskedasticity-)robust standard errors, also known as Eicker-Huber-White standard errors, are then $SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_1}}$.

In practice

- If the errors are either homoskedastic or heteroskedastic and you use heteroskedasticity-robust standard errors, you are fine.
- If the errors are heteroskedastic and you use the homoskedasticity-only formula for standard errors, your standard errors will be wrong.
- The homoskedasticity-only estimator of the variance of $\hat{\beta}_1$ is inconsistent if there is heteroskedasticity.
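To illustrate the practical advice above, the sketch below simulates heteroskedastic data, in which the spread of the error grows with x so that $\mathrm{var}(u \mid x)$ is not constant, and fits the same regression twice: once with the homoskedasticity-only formula and once with heteroskedasticity-robust (Eicker-Huber-White) standard errors via vce(robust). All names and parameter values are illustrative.

clear
set seed 7
set obs 500
generate x = runiform()
generate u = rnormal(0, 1 + 2*x)   // error standard deviation grows with x: heteroskedasticity
generate y = 1 + 2*x + u
regress y x                        // homoskedasticity-only SEs: wrong for these data
regress y x, vce(robust)           // Eicker-Huber-White robust SEs: valid either way

Comparing the two tables shows the point of the first bullet: the coefficient estimates are identical, and only the standard errors (and hence the t-statistics and confidence intervals) change.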
The Theoretical Foundations of OLS

Let us consider a linear regression model,

$$Y_i = \beta_0 + \beta_1 X_i + u_i,$$

subject to the following assumptions:
1. Zero conditional mean: the errors $u_i$ have a conditional mean of zero given the values of the independent variables, i.e., $E(u_i \mid X_i) = 0$ for all $i$.
2. Random draws: the observations $(Y_i, X_i)$ are randomly drawn from the same population distribution, i.e., they are i.i.d. for all $i$.
3. Large outliers are rare: $E(Y^4) < \infty$ and $E(X^4) < \infty$.
4. Homoskedasticity: the errors have constant variance, i.e., $\mathrm{var}(u_i) = \sigma_u^2$ for all $i$.
5. Normality: $u$ is normally distributed, $u \sim \mathcal{N}(0, \sigma_u^2)$.

Gauss-Markov Theorem. Under assumptions 1-4, the ordinary least squares (OLS) estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ have the minimum variance among all linear unbiased estimators, making them Best Linear Unbiased Estimators (BLUE).

The Gauss-Markov Theorem

- The OLS estimator $\hat{\beta}_1$ is a linear estimator; that is, it can be written as a linear function of $Y_1, \ldots, Y_n$:

$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})\,u_i}{\sum_{i=1}^n (X_i - \bar{X})^2} \equiv \sum_{i=1}^n w_i^{OLS} u_i, \qquad \text{where } w_i^{OLS} = \frac{X_i - \bar{X}}{\sum_{j=1}^n (X_j - \bar{X})^2}.$$

- The Gauss-Markov theorem states that among all possible choices of weights $\{w_i\}$, the OLS weights $w_i^{OLS}$ yield the smallest $\mathrm{var}(\hat{\beta}_1)$.
- Under assumptions 1-5, the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ have the minimum variance among all (linear and non-linear) unbiased estimators.

The Gauss-Markov Theorem: Limits

The Gauss-Markov theorem is a crucial justification for using OLS in many regression analyses, but its conditions are very restrictive, so it has serious practical limitations.
- The result covers only linear estimators, which are a small subset of all estimators.
- The condition of homoskedasticity often does not hold (homoskedasticity is a special case).
- When estimating the population mean, if there are significant outliers, the median is preferred over the mean because it is less sensitive to outliers and has lower variance. Similarly, in regression, OLS estimators can be sensitive to outliers. In such cases, other estimators, such as the least absolute deviations (LAD) estimator, may be more efficient (see the sketch at the end of these notes):

$$\min_{\hat{\beta}_0,\,\hat{\beta}_1} \sum_{i=1}^n \left| Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right|$$

Back to the original question

Suppose schools receive extra transfers from the central government, so that the share of subsidized school meals increases by 10 percentage points. What is the effect of this policy intervention ("treatment") on test scores?
- This question requires an estimate of the causal effect on test scores of a change in the share of subsidized meals. Does our regression analysis using the California data set provide a compelling estimate of this causal effect?
- Not really: districts with a low share of subsidized meals tend to be ones with lots of other resources and higher-income families, which provide kids with more learning opportunities outside school. This suggests that $\mathrm{corr}(u_i, X_i) > 0$ (that is, $E(u_i \mid X_i) \neq 0$).
- It seems that we have omitted some factors, or variables, from our analysis, and this has biased our results.

Material

Textbooks:
- Stock, J. H., and M. W. Watson. Introduction to Econometrics, 4th Edition, Global Edition. Chapter 5.
- Wooldridge, J. M. Introductory Econometrics: A Modern Approach, 5th Edition. Chapter 8.

Papers:
- Aron-Dine, A., L. Einav, and A. Finkelstein (2013). "The RAND Health Insurance Experiment, Three Decades Later." Journal of Economic Perspectives 27(1): 197-222.
- Taubman, S. L., H. L. Allen, B. J. Wright, K. Baicker, and A. N. Finkelstein (2014). "Medicaid Increases Emergency-Department Use: Evidence from Oregon's Health Insurance Experiment." Science 343(6168): 263-268.
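As promised in the discussion of the Gauss-Markov limits, here is a minimal Stata sketch of the LAD estimator. Stata's qreg command at its default quantile (the median) minimizes the sum of absolute residuals, which is exactly the LAD objective above. The simulated data and all names and values are illustrative; a few large outliers are planted to contrast the two estimators.

clear
set seed 3
set obs 500
generate x = runiform()
generate y = 1 + 2*x + rnormal(0, 1)
replace y = y + 30 in 1/10   // plant ten large outliers
regress y x                  // OLS: estimates are pulled toward the outliers
qreg y x                     // LAD: minimizes the sum of |residuals| (default quantile 0.5)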