Advanced Econometrics Lecture Notes Fall 2024 PDF
Document Details
2024
ADEC-3070
Dr. Manini Ojha
Summary
These are lecture notes for an Advanced Econometrics course. They review OLS and the classical linear regression model (CLRM) assumptions, then cover maximum likelihood estimation and limited dependent variable models (binary, censored, count, and qualitative response models), with Stata commands and an empirical application.
Full Transcript
ADEC-3070: Advanced Econometrics
Dr. Manini Ojha
JSGP Elective, Fall 2024

Notes
- Lectures are not designed to lead to proficiency in specific models, but rather to expose you to as many models as possible
- The most important thing to take from the course is that there is no estimator that is guaranteed to yield consistent estimates
- Every estimator relies on some set of assumptions
- As such, the validity of any set of estimates rests on the validity of the underlying assumptions
- Knowing these assumptions is crucial for conducting and assessing applied work
- Even if the assumptions hold in the population data-generating process, they may no longer hold given the data-collection process (e.g., measurement error, sample selection, etc.)

Recap: OLS
Classical Linear Regression Model (CLRM)
- 'True' relationship: y_i = α + β x_i + ε_i, i = 1, ..., N, where
  - α, β = population parameters
  - α̂, β̂ = parameter estimates
  - ε_i = idiosyncratic error term (reflects randomness, unobserved factors)
  - ε̂_i = estimated residual
- Ex: Jog your memory to state the assumptions!

Assumptions
(A1) Linearity (of parameters): the true relationship is y_i = α + β x_i + ε_i, i = 1, ..., N, where α, β are population parameters and ε_i is the disturbance or error term (reflects randomness, unobserved factors)
(A2) E[ε_i] = 0, so E[y_i | x_i] = α + β x_i
  - The error is mean zero
(A3) E[ε_i²] = σ²
  - Equal to the variance of ε
  - Variance identical for all i (homoskedasticity)
(A4) E[ε_i ε_j] = 0 for i ≠ j
  - Zero covariance between the errors of any two observations
(A5) E[ε_i | x_i] = 0, which implies E[x_i ε_i] = E[ε_i] = 0
  - x is independent of the error
(A6) ε_i ~ N(0, σ²)
  - Errors are normally distributed
  - Not needed for unbiasedness or consistency; only for inference and hypothesis testing
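As a quick illustration of the CLRM assumptions, the following Python sketch simulates the data-generating process y_i = α + β x_i + ε_i and checks (A2), (A3), and (A5) in a large sample. The parameter values are illustrative, not from the notes, and Python is used here only for exposition (the course itself works in Stata):

```python
import numpy as np

# Simulate the CLRM DGP y_i = alpha + beta * x_i + eps_i.
# alpha, beta, sigma are made-up illustrative values.
rng = np.random.default_rng(0)
N = 100_000
alpha, beta, sigma = 1.0, 2.0, 0.5

x = rng.uniform(0, 10, N)
eps = rng.normal(0, sigma, N)       # (A6): normal; drawn independently of x, so (A5) holds
y = alpha + beta * x + eps          # (A1): linear in parameters

mean_eps = eps.mean()               # (A2): close to 0 in a large sample
var_eps = eps.var()                 # (A3): close to sigma^2 = 0.25
cov_x_eps = np.cov(x, eps)[0, 1]    # (A5): close to 0
```

In a finite sample these moments are only approximately zero and σ²; the assumptions are statements about the population.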
OLS estimation
- Given a random sample {y_i, x_i}, i = 1, ..., N, OLS minimizes the sum of the squared residuals:
  (α̂, β̂) = argmin over (α, β) of Σ_{i=1}^{N} (y_i − α − β x_i)²
- Ex: Find the solution!

OLS solution
The solution implies
  β̂_OLS = Cov(x_i, y_i) / Var(x_i)
        = Σ_{i=1}^{N} (y_i − ȳ)(x_i − x̄) / Σ_{i=1}^{N} (x_i − x̄)²
        = Σ_{i=1}^{N} y_i (x_i − x̄) / Σ_{i=1}^{N} (x_i − x̄)²
  α̂_OLS = ȳ − β̂_OLS x̄

Properties
- α̂, β̂ are unbiased (a finite-sample property) and consistent (an asymptotic property)
- α̂, β̂ are efficient
  - Var(β̂) = σ² / Σ_{i=1}^{N} (x_i − x̄)² = σ² / [N · Var(x)]
  - Smallest variance of any linear, unbiased estimator (Gauss-Markov theorem)

Let β̂_j be the OLS estimator of β_j for some j.
Unbiasedness:
- For each random sample of size N, β̂_j has a probability distribution
- Because β̂_j is unbiased under the CLRM assumptions, this distribution has mean value β_j
Consistency:
- As the sample size grows, the distribution of β̂_j becomes more and more tightly concentrated around β_j
  - i.e., as n → ∞, the distribution of β̂_j collapses to β_j
- We can make our estimator arbitrarily close to β_j if we can collect as much data as we want

Formally, we write plim β̂_j = β_j. For the slope,
  β̂_1 = Σ (x_i − x̄) y_i / Σ (x_i − x̄)²
      = β_1 + Σ (x_i − x̄) ε_i / Σ (x_i − x̄)²
      = β_1 + [n⁻¹ Σ (x_i − x̄) ε_i] / [n⁻¹ Σ (x_i − x̄)²]
With the law of large numbers, we get that as n → ∞, the distribution of β̂_1 collapses to β_1.

Consistency (asymptotics)
- Unbiasedness of an estimator (although important) cannot always be obtained
- Virtually all economists agree that consistency is a minimal requirement for an estimator
- "If you can't get it right as n goes to infinity, you shouldn't be in this business." - Nobel Laureate Clive W. J. Granger
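The closed-form solution above can be checked numerically. A minimal Python sketch on simulated data (parameter values are illustrative) computes β̂ both from the deviations-from-means formula and from the Cov/Var ratio, which agree exactly:

```python
import numpy as np

# Simulated data with known alpha = 1, beta = 2 (illustrative values).
rng = np.random.default_rng(1)
N = 50_000
alpha, beta = 1.0, 2.0
x = rng.normal(0, 1, N)
y = alpha + beta * x + rng.normal(0, 0.5, N)

xbar, ybar = x.mean(), y.mean()

# beta_hat = sum (x_i - xbar)(y_i - ybar) / sum (x_i - xbar)^2
beta_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
alpha_hat = ybar - beta_hat * xbar

# Equivalent ratio-of-moments form: Cov(x, y) / Var(x)
beta_cov = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
```

Both expressions for β̂ are algebraically identical (the degrees-of-freedom corrections cancel in the ratio), so they match to floating-point precision.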
Multiple Regression Model
'True' relationship: y_i = x_i β + ε_i, i = 1, ..., N, where
- K = # of independent variables (also called regressors or covariates)
- Stacking observations: y = Xβ + ε, where
  - y is N × 1; X is N × (K + 1); β is (K + 1) × 1; ε is N × 1
  - Xβ is N × 1
- Written out, row i of X is (1, x_{i1}, ..., x_{iK}), so the stacked system is
  y_i = β_0 + β_1 x_{i1} + ... + β_K x_{iK} + ε_i, i = 1, ..., N

Assumptions:
- E[x_{ik} ε_i] = 0 for all k
- The x's are linearly independent (no perfect multicollinearity)
- Other assumptions follow from earlier

OLS estimation
- Given a random sample {y_i, x_i}, i = 1, ..., N, OLS minimizes the sum of the squared residuals:
  β̂ = argmin ε′ε = argmin (y − Xβ)′(y − Xβ)
- The solution implies β̂_OLS = (X′X)⁻¹(X′y) (see proof: [JW] Appendix D & E)
- The estimators retain the same properties as in the CLRM

Maximum Likelihood Estimation (MLE)
- An alternative estimation technique to OLS
- Equivalent to OLS in the classical linear regression model
- Useful in nonlinear models

Intuition:
- The outcome y depends on x and parameters θ (e.g., θ = {β, σ})
- Estimation chooses θ̂_ML to maximize the probability of the realized data {y_i, x_i}, i = 1, ..., N
- The MLE method chooses the values of the parameters to maximize the probability of drawing the data actually observed
- MLEs are the parameter values "most likely" to have produced the data
To get the ML estimators, we need an expression for the likelihood function; to get the likelihood function, we need the joint probability distribution of the data.
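The matrix solution β̂_OLS = (X′X)⁻¹(X′y) can be checked numerically. The Python sketch below uses simulated data with illustrative coefficients; np.linalg.solve replaces the explicit inverse for numerical stability, with np.linalg.lstsq as a cross-check:

```python
import numpy as np

# Simulated multiple-regression data; beta_true = (beta_0, ..., beta_K)
# is an illustrative choice, not from the notes.
rng = np.random.default_rng(2)
N, K = 10_000, 3
beta_true = np.array([1.0, 0.5, -2.0, 3.0])

X = np.column_stack([np.ones(N), rng.normal(size=(N, K))])  # N x (K+1), first column = 1s
y = X @ beta_true + rng.normal(0, 1, N)

# beta_hat = (X'X)^{-1} X'y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with the library least-squares routine
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
```

In applied work one would use a regression command rather than form X′X by hand; the normal-equations form is shown only to mirror the formula on the slide.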
The likelihood function L(θ) gives the total probability of observing the realized data as a function of θ.

General setup:
- The pdf of a random variable y is f(y | θ), where θ captures the parameters of the distribution
- Given a sample of size N, the joint distribution is f(y_1, ..., y_N | θ)
- Assuming independence between observations, the joint distribution is simply the product of the marginal distributions:
  f(y_1, ..., y_N | θ) = Π_{i=1}^{N} f(y_i | θ)
- The joint density is the likelihood function:
  L(θ | y) = Π_{i=1}^{N} f(y_i | θ), in short L(θ)

Notes:
- Our goal is to infer something about the parameters from the realized data
- It is usually easier to work with the log-likelihood function:
  ln[L(θ)] = Σ_{i=1}^{N} ln[f(y_i | θ)]
- which is just a monotonic transformation
ML estimates are obtained by maximizing the likelihood function.

General Model
  y_i = f(x_i, β) + ε_i, ε_i ~ iid N(0, σ²)
  θ = {β, σ}
  data = {y_i, x_i}, i = 1, ..., N
This implies
  L(θ) = Pr(y_1, ..., y_N | x_1, ..., x_N, θ)
       = Pr(y_1 | x_1, θ) · ... · Pr(y_N | x_N, θ)
       = Π_{i=1}^{N} Pr(y_i | x_i, θ)

Taking logs on both sides implies
  ln[L(θ)] = Σ_i ln[Pr(y_i | x_i, θ)]
and
  θ̂_ML = argmax over θ of ln[L(θ)]
which entails solving the likelihood equation
  ∂ln[L(θ)] / ∂θ = 0

Example: CLRM
  y_i = x_i β + ε_i, ε_i ~ iid N(0, σ²)
What is the probability Pr(y_i | x_i, θ)?
- It is the density of ε_i evaluated at ε_i = y_i − x_i β:
  Pr(y_i | x_i, θ) = [1 / √(2πσ²)] exp{ −(1/2) [(y_i − x_i β)/σ]² }   (the normal pdf)
This implies
  θ̂_ML = argmax over θ of Σ_{i=1}^{N} ln( [1/√(2πσ²)] exp{ −(1/2) [(y_i − x_i β)/σ]² } )
        = argmax over θ of { −(N/2) ln(2π) − N ln σ − (1/2) Σ_{i=1}^{N} [(y_i − x_i β)/σ]² }
which is maximized in β by minimizing the sum of squared residuals (thus, identical to OLS).
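To see the OLS/MLE equivalence concretely, the Python sketch below (simulated one-regressor data, σ treated as known for simplicity, all values illustrative) evaluates the normal log-likelihood on a grid of β values and confirms that its maximizer coincides with the OLS estimate up to the grid spacing:

```python
import numpy as np

# Simulated data: y_i = beta * x_i + eps_i, eps_i ~ N(0, sigma^2)
rng = np.random.default_rng(3)
N, beta_true, sigma = 20_000, 1.5, 1.0
x = rng.normal(0, 1, N)
y = beta_true * x + rng.normal(0, sigma, N)

def loglik(beta):
    # ln L = -(N/2) ln(2*pi) - N ln(sigma) - (1/2) sum(((y - x*beta)/sigma)^2)
    resid = y - beta * x
    return -(N / 2) * np.log(2 * np.pi) - N * np.log(sigma) \
           - 0.5 * np.sum((resid / sigma) ** 2)

beta_ols = np.sum(x * y) / np.sum(x ** 2)       # OLS slope (no intercept)

grid = np.linspace(beta_true - 1, beta_true + 1, 2001)
beta_ml = grid[np.argmax([loglik(b) for b in grid])]
# beta_ml agrees with beta_ols up to the grid spacing of 0.001
```

Only the last term of ln L depends on β, which is why maximizing the likelihood and minimizing the sum of squared residuals pick out the same β̂.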
Properties of MLE
- Consistent: plim θ̂_ML = θ
- Asymptotically normal
- Asymptotically efficient

Hypothesis Testing - MLE
Wald tests
- Equivalent to the F-test in OLS
- Require estimation of only the unrestricted model
Likelihood ratio test
- Estimate both the restricted and unrestricted models
- Compute the maximized log-likelihood in both the restricted and the unrestricted model, L_R and L_UR
- Intuition:
  - Since MLE is based on maximizing L(θ), imposing restrictions/dropping variables generally leads to a smaller L(θ)
  - The amount by which imposing the restriction(s) lowers the likelihood provides some indication of the validity of the restriction
  - The bigger the decrease, the more likely the restriction(s) do not hold

Likelihood ratio test (contd.)
- Test statistic: LR = 2[L_UR − L_R] ≥ 0, where, under the null, LR ~ χ²_q
  - q: # of restrictions
- Maximization is typically done by numerical methods, as analytical derivatives are messy

Limited Dependent Variable Models (LDV)
Class of models where y depends on x, but y is not continuous.
Common cases:
- y ∈ {0, 1}: Binary model
  - Estimation: probit, logit, LPM
- y ∈ [a, b]: Censored model
  - a, b are known
  - a, b may vary by i
  - If a = −∞ and b = ∞, then y is continuous
  - Estimation: censored regression, tobit
- y ∈ {0, 1, 2, ...}: Count model
  - Estimation: Poisson, negative binomial
- y ∈ {0, 1, 2, ...}: Qualitative response (QR) model
  - Values correspond to choices with no natural ordering
  - Examples: brand choice, mode of transportation
  - Estimation: multinomial logit, multinomial probit, conditional logit, nested logit
Common cases (contd.):
- y ∈ {0, 1, 2, ...}: Ordered QR model
  - Values correspond to choices with a natural ordering
  - The "distance" between values may vary between choices
  - Example: College
  - Estimation: ordered logit, ordered probit

Binary models
Applicable to problems where the dependent variable is binary.
Examples:
- Labour force participation (LFP)
- Default on a loan
- Belonging to an FTA, etc.

Linear Probability Model (LPM)
Setup:
  y_i = x_i β + ε_i, ε_i ~ N(0, σ_i²)
- Estimated by OLS
- Problems:
  - Heteroskedasticity, since ε_i = −x_i β if y_i = 0 and ε_i = 1 − x_i β if y_i = 1
  - Predictions are not bounded by 0 and 1 (x_i β̂ ∉ [0, 1]) and therefore do not correspond to probabilities

Solution: model Pr(y_i = 1 | x_i) using a "proper" functional form
  Pr(y_i = 1 | x_i) = F(x_i β)
where F(·) satisfies
  F(x_i β) → 1 as x_i β → ∞
  F(x_i β) → 0 as x_i β → −∞
Obvious candidates are CDFs, since these map numbers from the entire real line to the unit interval.

Two parametric solutions: the probit model and the logit model
  F(·) = Φ(x_i β) = ∫_{−∞}^{x_i β} φ(u) du   (probit)
  F(·) = Λ(x_i β) = exp(x_i β) / [1 + exp(x_i β)]   (logit)

Interpretation of β
  ∂Pr(y_i = 1)/∂x_j = F′(x_i β) β_j   (the marginal effect)
where
  F′(x_i β) = φ(x_i β)   (probit)
  F′(x_i β) = Λ(x_i β)[1 − Λ(x_i β)]   (logit)

Notes
- Since β_j is difficult to interpret directly, we typically report marginal effects
- Marginal effects are observation specific
Common reporting options:
1. Marginal effects evaluated at the sample mean
2. The sample mean of the marginal effects
3. Marginal effects evaluated at some combination of values of interest
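As a concrete illustration, the Python sketch below fits a logit by Newton-Raphson on simulated data (all coefficients illustrative) and computes reporting option 1, the marginal effect evaluated at the sample mean, Λ(x̄β̂)[1 − Λ(x̄β̂)] β̂_j. This mirrors, in spirit, what -logit- followed by a marginal-effects command reports in Stata:

```python
import numpy as np

# Simulated binary-choice data: Pr(y=1|x) = Lambda(x beta)
rng = np.random.default_rng(4)
N = 20_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # (constant, x)
beta_true = np.array([-0.5, 1.0])                      # illustrative
p = 1 / (1 + np.exp(-X @ beta_true))
y = (rng.uniform(size=N) < p).astype(float)

# Newton-Raphson on the logit log-likelihood
b = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ b))        # Lambda(x_i b)
    grad = X.T @ (y - mu)                # score
    W = mu * (1 - mu)
    H = -(X * W[:, None]).T @ X          # Hessian
    b -= np.linalg.solve(H, grad)

# Marginal effect of x at the sample mean: Lambda'(xbar b) * b_1
xbar = X.mean(axis=0)
lam = 1 / (1 + np.exp(-xbar @ b))
me_at_mean = lam * (1 - lam) * b[1]
```

Note the marginal effect (about 0.24 here) is much smaller than the raw coefficient b[1], which is why coefficients and marginal effects must not be read interchangeably.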
Latent variable framework - probit/logit
Probit/logit can be recast in a latent (unobserved) variable framework
- e.g., quality of life/happiness
Model:
  y_i* = x_i β + ε_i
  y_i = 1 if y_i* > 0; y_i = 0 if y_i* ≤ 0
- y_i* is unobserved; y_i is observed
- y_i (an indicator function) takes on the value 1 if the event is true and 0 if it is not
Given data {y_i, x_i}, i = 1, ..., N, estimate β via MLE.

STATA commands:
- -probit-
- -logit-
- -dprobit-
- -margins-
- -mfx-

Censored regression models
Applicable to problems where the dependent variable is censored (potentially from above and below) at certain thresholds
- Right data censoring: top coding
- Note: while the dependent variable is censored, the x's are always observed
Examples:
- Income/wealth may be top-coded
  - Respondents are asked their wealth but the top option says "more than $500,000"
- Age at first birth for women

Latent variable framework - censored models
Model setup:
  y_i* = x_i β + ε_i, ε_i ~ iid N(0, σ²)
  y_i = b_i if y_i* ≥ b_i; y_i = y_i* if y_i* ∈ [a_i, b_i]; y_i = a_i if y_i* ≤ a_i
- y_i* is unobserved; y_i is observed
Terminology:
- Right-censoring: if b_i ≠ ∞ for all i
- Left-censoring: if a_i ≠ −∞ for all i

Given data {y_i, x_i}, i = 1, ..., N, estimate β via MLE.
- The interpretation of β is the impact of a Δx on y* (the latent variable)
- σ is identified as long as y_i ∈ [a_i, b_i] for some i

Special case: Tobit model
  a_i = 0, b_i = ∞ for all i
Examples:
- Labour supply
- R&D expenditures by a firm
- Aptitude test scores with lower bound 0
This implies the following Tobit setup:
  y_i* = x_i β + ε_i, ε_i ~ iid N(0, σ²)
  y_i = y_i* if y_i* > 0; y_i = 0 if y_i* ≤ 0
- y_i* is unobserved; y_i is observed
Given data {y_i, x_i}, i = 1, ..., N, estimate via MLE.
- The interpretation of β is the impact of a Δx on y* (the latent variable)
- Not directly comparable to OLS, which estimates ∂E[y_i | x_i]/∂x_i
- Marginal effects (for comparison with OLS) are observation specific:
  ∂E[y_i | x_i]/∂x_j = β_j Φ(x_i β / σ)
STATA commands:
- -tobit-
- -cnreg-

Count Models
Applicable to situations where the dependent variable is a non-negative integer count of events
- The dependent variable typically takes on only a few values
Examples:
- # of children
- # of patents held by a firm
- # of doctor visits per year
- # of cigarettes smoked per day

Poisson count model
Setup:
- We want to model the expected number of events conditional on x:
  E[y_i | x_i] = F(x_i β), where F(·) ≥ 0
Estimation:
- Via MLE
- We need a distribution for y_i, which cannot be normal
- Assume the Poisson distribution, which depends only on the mean, given by exp{x_i β}
The resulting probability of observing y_i is:
  Pr(y_i | x_i) = exp{−λ_i} λ_i^{y_i} / y_i!, where λ_i = exp{x_i β}
Interpretation of β:
- As if ln(y) were the dependent variable
- Implies a % change, or an elasticity if x enters as ln(x)
Marginal effects:
  ∂E[y_i | x_i]/∂x_i = λ_i β

Alternative estimation: negative binomial
STATA commands:
- -poisson-
- -nbreg-

Zero-inflated Poisson model
Applicable to count models with a mass at zero, or where the decision between zero and some positive count differs from the decision among positive counts
- Examples: # of children ever born to a woman, number of arrests
- Note:
  - Observations are assumed to be drawn from two regimes (zero and positive)
  - Observations in regime 1 always have a count of zero
  - Outcomes for observations in regime 2 follow a Poisson process, with possible outcomes given by y_i > 0
STATA commands:
- -zip-
- -zinb-
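The Poisson formulas above can be illustrated in a few lines. The Python sketch below (simulated data, illustrative coefficients) fits the model by Newton-Raphson on the Poisson log-likelihood, the same estimator -poisson- implements, and forms the observation-specific marginal effects λ_i β̂:

```python
import numpy as np

# Simulated count data with lambda_i = exp(x_i beta); beta_true is illustrative.
rng = np.random.default_rng(5)
N = 20_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Newton-Raphson on the Poisson log-likelihood
# ln L = sum( y_i * x_i b - exp(x_i b) - ln(y_i!) )
b = np.zeros(2)
for _ in range(25):
    lam = np.exp(X @ b)
    grad = X.T @ (y - lam)               # score
    H = -(X * lam[:, None]).T @ X        # Hessian
    b -= np.linalg.solve(H, grad)

# Observation-specific marginal effects: lambda_i * b_j
me = np.exp(X @ b)[:, None] * b
```

Because λ_i varies across observations, so do the marginal effects; b[1] itself is read as the approximate proportional change in E[y|x] per unit change in x.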
Application: Talley et al. (2005) {framework for presentations}
Introduction
- Question: determinants of crew injuries in ship accidents
- Data structure:
  - Sample includes accidents investigated by the US Coast Guard
  - Includes US ships anywhere, and foreign ships in US waters
  - Outcomes: # deaths, # injuries, # missing
  - Small, non-negative integer count data
  - Lots of covariates; a mix of discrete and continuous
  - Data over 11 years, 1991-2001
Model
- Count data ⇒ Poisson, negative binomial
- Separate models by dependent variable and type of ship (freight, tanker, tugboat)
Results
- Interpreted as % changes
- Results also report marginal effects
Shortcoming
- ?

Qualitative response models
Applicable to analyses of choice by agents among many (unordered) alternatives
Examples:
- Brand choice
- Mode of transportation
- Type of mortgage (fixed-30 yr, fixed-15 yr, adjustable rate, ...)
- Type of school (public, private, government, religious, non-religious, ...)
The dependent variable is typically coded as (positive) integers corresponding to specific choices; the value/order of the actual numbers is irrelevant.

Multinomial logit
Setup:
- Choose among J + 1 alternatives, y_i ∈ {0, 1, ..., J}
  Pr(y_i = j | x_i, θ) = F(x_i, θ)
- The functional form
  Pr(y_i = j | x_i, θ) = exp{x_i β_j} / Σ_{k=0}^{J} exp{x_i β_k}
is known as the multinomial logit.

Note: the β's are choice specific (subscripted by j)
- Interpretation:
  - Log odds ratio (relative to the base choice):
    ln(P_ij / P_i0) = x_i β_j, where P_ij = Pr(y_i = j)
  - Implies β_j is the % change in the odds ratio from a unit change in x
  - Log odds ratio (relative to any other choice):
    ln(P_ij / P_ij′) = x_i (β_j − β_j′)
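The multinomial-logit probabilities and the log-odds identity can be verified with a small numeric example. Everything below (the number of choices, the covariate values, the β_j vectors, and the normalization β_0 = 0 for the base choice) is made up for illustration:

```python
import numpy as np

# Three alternatives {0, 1, 2}; choice 0 is the base with b_0 normalized to 0.
x_i = np.array([1.0, 0.4])            # (constant, one covariate), illustrative
B = np.array([[0.0, 0.0],             # b_0 = 0 (base choice)
              [0.5, -1.0],            # b_1, illustrative
              [-0.2, 2.0]])           # b_2, illustrative

v = B @ x_i                           # x_i b_j for each choice j
P = np.exp(v) / np.exp(v).sum()       # Pr(y_i = j | x_i): probabilities sum to 1

# Log-odds identity relative to the base choice:
log_odds_1 = np.log(P[1] / P[0])      # equals x_i b_1 = 0.5*1.0 + (-1.0)*0.4 = 0.1

# Relative to another choice: ln(P_1/P_2) = x_i (b_1 - b_2)
log_odds_12 = np.log(P[1] / P[2])
```

The denominator cancels in every probability ratio, which is exactly why the log odds relative to the base reduce to x_i β_j.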
Alternative estimation methods: multinomial probit (more complex), nested logit
STATA commands:
- -mlogit-
- -mprobit-

Ordered response models
Applicable to analyses of choice by agents among many ordered alternatives
Examples:
- Labor force status (OLF, PT, FT)
- Schooling (