EEP/IAS 118 - Introductory Applied Econometrics, Section 2
University of California, Berkeley
Leila Njee Bugha and Nicolas Polasek
Summary
These slides cover introductory applied econometrics, including economic models, population regression functions, and regression with sample data, focusing on the assumptions of linear regression and the properties of β̂ and R².
Full Transcript
EEP/IAS 118 - Introductory Applied Econometrics, Section 2
Leila Njee Bugha and Nicolas Polasek
Week of September 9

Overview and announcements

- Small Assignment #1 due TODAY at 11:59pm on Gradescope
- Today: regression model overview, linear regression assumptions, properties of β̂ and R²

Economic Models

An economic model is an equation that describes relationships. For example, we can try to describe participation in pre-K:

y = f(x₁, x₂, x₃, x₄, …, x₆)

where y = hours spent at pre-K, x₁ = state-level law and enforcement, x₂ = hourly wage of parents, ..., x₆ = age of child.

We turn this economic model into an econometric model by assigning a functional form (linear):

preK = β₀ + β₁·enforcement + β₂·wage + ⋯ + β₆·age + u

Note that u here contains all the unobserved variables (e.g. family background, grandparents or relatives living close by, etc.) that we cannot include in the model.

Population Regression Function

Consider a version of this model where pre-K attendance (y) is only a function of parental wages (x):

y = f(x, u) = β₀ + β₁x + u

Let us assume this is the "true data generating process" (i.e. the real model). Then u = y − β₀ − β₁x is the error term. We make an important assumption that E(u|x) = E(u) = 0 in the "true" model we have written down. This assumption allows us to define a linear population regression function (PRF):

E(y|x) = β₀ + β₁x

The PRF describes how the average value of y changes with x. [Figure: the PRF plotted over the data.] Note: the relationship in the picture isn't linear, but for this class we will assume it is.

Regression with sample

The above example is done with a population, which we almost never observe. Instead, we work with samples.

- The PRF is y = β₀ + β₁x + u. We observe x and y, but do not observe β₀, β₁, and u.
- The goal is to approximate the PRF using a sample regression function (SRF): ŷᵢ = β̂₀ + β̂₁xᵢ, so that yᵢ = β̂₀ + β̂₁xᵢ + ûᵢ = ŷᵢ + ûᵢ is our estimated model.
- "Hats" denote estimates of true values/parameters; they are functions of the data (not true parameters).
- ŷ is our best guess at the true E(y|x), and β̂ is our best guess at the true relationship between x and y.
- ûᵢ is the residual: the deviation between the real observed yᵢ and the ŷᵢ our model predicts. That is, ûᵢ = yᵢ − ŷᵢ.

The PRF and SRF will (almost) never be the same! But on average we will get it right (note: the residuals ûᵢ average to zero), as the simulation sketch below illustrates.
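To make the PRF/SRF distinction concrete, here is a minimal simulation sketch (not from the slides; the parameter values, sample size, and distributions are illustrative assumptions). We pick a known "true" DGP, draw one random sample, and compute the OLS estimates:

```python
import numpy as np

rng = np.random.default_rng(seed=118)

# "True" data generating process (parameters chosen purely for illustration)
beta0, beta1 = 1.0, 0.5
n = 500

x = rng.normal(loc=10, scale=2, size=n)   # observed regressor
u = rng.normal(loc=0, scale=1, size=n)    # unobserved error, E(u|x) = 0
y = beta0 + beta1 * x + u                 # PRF plus error

# OLS estimates defining the SRF
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_hat = beta0_hat + beta1_hat * x   # fitted values
u_hat = y - y_hat                   # residuals

print(f"beta1_hat = {beta1_hat:.3f} (true beta1 = {beta1})")
print(f"beta0_hat = {beta0_hat:.3f} (true beta0 = {beta0})")
print(f"mean residual = {u_hat.mean():.2e}")  # ~0 by construction of OLS
```

In any one sample the estimates differ from the true parameters (the SRF is not the PRF), but the residuals average to exactly zero, and across repeated samples the estimates center on the truth.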
Assumptions of Linear Regression

We make these assumptions about the "true data generating process" (the simple linear regression, or SLR, model):

- SLR.1: The population model is linear in parameters: y = β₀ + β₁x₁ + u.
- SLR.2: {(xᵢ, yᵢ): i = 1, …, N} is a random sample from the population.
- SLR.3: The observed explanatory variable (x) is not constant: Var(x) ≠ 0.
- SLR.4: No matter what we observe x to be, we expect the unobserved u to be zero: E[u|x] = 0.
- SLR.5: The "error term" has the same variance for any value of x: Var(u|x) = σ².

Assumption 1: Linearity

"The population model is linear in parameters: y = β₀ + β₁x₁ + u."

Does this prevent us from estimating nonlinear models such as polynomials and logarithms? No! We only need the model to be linear in parameters (i.e. the βₖ's). We can transform each of x₁, …, xₙ however we'd like:

- y = β₀ + β₁x² + u or y = β₀ + β₁·log(x) + u are OK: just rename x² or log(x) as z.
- y = log(β₀ + β₁x + u) is not estimable by OLS, because it is not a linear function of the β parameters.

Assumption 2: Random Sample

"{(xᵢ, yᵢ): i = 1, …, N} is a random sample from the population."

Assumption 2 is fairly non-technical. You just need to know how your data were collected: you can't draw inference about a population if your sample doesn't represent the population well. Most survey data are a (reasonably) random sample of some population. An example where this would fail is a survey about sensitive or illicit subject matter: you have reason to believe that the people who agree to be surveyed are different from those who don't.

Assumption 3: Non-constant X

"The observed explanatory variable (x) is not constant: Var(x) ≠ 0."

Assumption 3 is almost trivial. You just need some variation in your sample. If everyone in your sample smokes exactly 14 cigarettes per day, you can't estimate the effect of an additional cigarette on health outcomes. This becomes a little more interesting with multiple regression (multicollinearity), but not much.

Assumption 4: Mean Independence

"No matter what we observe x to be, we expect the unobserved u to be zero: E[u|x] = 0."

The "mean independence" assumption on the error term, E[u|x] = 0, is probably the most critical assumption we make in regression. It allows us to think about β in causal terms, i.e. "the causal effect of one more unit of x on the expected value of y."

The classic example of violating this assumption is a regression of income on education. If we could control for all variables that affect income, then we could recover the true effect of education on income. But we can never observe everything: e.g. we don't observe ability, which is correlated with both education and income and thus biases our estimate of education's effect on earnings. Omitted Variable Bias (OVB) is an example of violating this assumption.

Violation of Assumption 4: [figure]

Assumption 5: Homoskedasticity

The assumption that Var(u|x) = σ² is called the homoskedasticity assumption. [Figure: a violation of this assumption (heteroskedasticity), where the spread of the errors changes with x.]

What do we get from these assumptions?

Using only assumptions 1-4, we can prove that:

1. E(β̂₁) = β₁
2. E(β̂₀) = β₀

That is, the means of our estimators β̂₁ and β̂₀ are the true population parameters β₁ and β₀. If we add assumption 5, we can also show that:

3. Var(β̂₁) = σᵤ²/SSTₓ, where SSTₓ = ∑ᵢ(xᵢ − x̄)² is the total variation in x.

NOTE: We don't know σᵤ² (or SSTₓ), as these are population parameters. To calculate this, we use an estimator:

σ̂ᵤ² = (∑ᵢ ûᵢ²)/(n − 2)

A short simulation sketch below illustrates these properties.
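As a rough numerical illustration of the unbiasedness and variance results (a sketch, not course code; the DGP values are assumptions), we can repeat the sampling experiment many times, compare the average of β̂₁ across samples to β₁, and compare the formula σ̂ᵤ²/SSTₓ to the simulated variance of β̂₁:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
beta0, beta1, sigma_u = 2.0, 0.8, 1.5   # assumed "true" DGP values
n, reps = 200, 5000

beta1_hats = np.empty(reps)
var_formula = np.empty(reps)

for r in range(reps):
    x = rng.uniform(0, 10, size=n)
    u = rng.normal(0, sigma_u, size=n)          # E(u|x) = 0, Var(u|x) = sigma_u^2
    y = beta0 + beta1 * x + u
    sst_x = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sst_x
    b0 = y.mean() - b1 * x.mean()
    u_hat = y - (b0 + b1 * x)
    sigma2_hat = np.sum(u_hat ** 2) / (n - 2)   # estimator of sigma_u^2
    beta1_hats[r] = b1
    var_formula[r] = sigma2_hat / sst_x         # estimated Var(beta1_hat)

print(f"mean of beta1_hat over {reps} samples: {beta1_hats.mean():.4f} (true: {beta1})")
print(f"simulated Var(beta1_hat):    {beta1_hats.var():.6f}")
print(f"average of sigma2_hat/SST_x: {var_formula.mean():.6f}")
```

The sample average of β̂₁ lands very close to β₁ (unbiasedness), and the average of σ̂ᵤ²/SSTₓ tracks the simulated variance of β̂₁.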
Properties of β̂₀, β̂₁

The equations for β̂₀ and β̂₁:

β̂₁ = ∑ᵢ(xᵢ − x̄)(yᵢ − ȳ) / ∑ᵢ(xᵢ − x̄)²
β̂₀ = ȳ − β̂₁x̄

These are derived by minimizing the sum of squared residuals (û), a process called Ordinary Least Squares (OLS). OLS has some nice estimation properties, which is why we use it. DON'T worry about the derivation (see the notes appendix).

Interpreting β̂ - Sign, Size, Significance

When asked to "interpret your results," you should check 3 things:

1. Sign: What sign did you expect the estimated parameter to have? Why? Does your estimate have this sign (i.e. are you surprised or reassured by your results)?
2. Size: How do changes in this variable affect the dependent variable according to your estimation? Is this an economically meaningful effect size?
3. Significance: Is the estimate statistically different from zero? What is the t-statistic of this hypothesis? Don't worry about this for now; we will deal with it in more detail later in the course.

Example Interpretation

Example (Wooldridge, Exercise 2.4): let's examine a regression of baby birthweight (in ounces) on the number of daily cigarettes smoked by the mother:

b̂wght = 119.77 − 0.514·cigs

1. Interpret the coefficient on cigs.
2. What is the predicted birthweight when cigs = 0?
3. To predict a birthweight of 125, what would cigs have to be?

Goodness of fit: R²

The R² is a useful measure of how well our model "fits" or explains the data. This can be informative about whether our specified model is close to the true relationship between two variables.

[Figure: scatter plot of agricultural employment (% of total employment) against GDP per capita, 2012.]

If we fit a simple linear model to these data, it would be a poor fit, and the low R² would help indicate this fact.

Three main terms to define in order to understand the R² and how to calculate it:

1. Sum of Squares Total: SST = ∑ᵢ(yᵢ − ȳ)²
2. Sum of Squares Explained: SSE = ∑ᵢ(ŷᵢ − ȳ)²
3. Sum of Squared Residuals: SSR = ∑ᵢ(yᵢ − ŷᵢ)²

Note: SST = SSE + SSR (this decomposition is checked numerically in the sketch at the end of this section).

You can think of the R² as the share of the total sample variation in y that is explained by our model:

R² = SSE/SST = 1 − SSR/SST

R² always lies between 0 and 1; being closer to 1 indicates a better model fit.

Example Interpretation

Using data from 1988 for houses sold in Massachusetts, the following equation relates housing prices in dollars (price) to the distance in meters from a recently built garbage incinerator (dist):

l̂og(price) = 9.40 + 0.312·log(dist),  n = 135, R² = 0.162

a) Interpret the coefficient on log(dist). Is the sign of this estimate what you expect it to be?
b) How much of the variation in log(price) is explained by the log distance to the garbage incinerator?
c) What other factors about a house can affect its price?
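To close, here is a minimal sketch verifying the R² decomposition numerically (illustrative made-up data, not the Massachusetts or birthweight data): after an OLS fit with an intercept, SST = SSE + SSR holds, and R² = SSE/SST = 1 − SSR/SST.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Made-up sample for illustration only
n = 100
x = rng.uniform(0, 5, size=n)
y = 3.0 + 1.2 * x + rng.normal(0, 2, size=n)

# OLS fit with an intercept
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation in y
sse = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the model
ssr = np.sum((y - y_hat) ** 2)         # unexplained (residual) variation

print(f"SST = {sst:.2f}, SSE + SSR = {sse + ssr:.2f}")  # equal up to rounding
print(f"R^2 = SSE/SST     = {sse / sst:.3f}")
print(f"R^2 = 1 - SSR/SST = {1 - ssr / sst:.3f}")
```

Both expressions for R² agree, and a value well below 1, as in the incinerator example above (R² = 0.162), signals that the model leaves most of the variation in y unexplained.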