Quantitative Methods Lecture 4, 2024-2025

Document Details


Universiteit Leiden

2024

Dr. Brendan Carroll

Tags

quantitative methods, regression analysis, econometrics, statistics

Summary

This lecture covers quantitative methods, specifically regression analysis and the assumptions of OLS (ordinary least squares). It is a 2024-2025 lecture from Leiden University.

Full Transcript


Quantitative Methods
Fourth meeting: Properties of estimates; Assumptions of OLS Part I
Dr. Brendan Carroll
Leiden University. The university to discover.

Recap of Last Week
- Hypothesis testing in regression
- Interpretation of regression results
- Variables in regression, including dummy (binary/dichotomous) variables
- Multivariate regression – Why? What? How?

This Week
- Models for predicting, models for inference
- Properties of estimators and their relationship to the sampling distribution of b
- The assumptions of OLS and the Gauss-Markov theorem
- Assumption 1: Linear model
- Assumption 2: (Conditional) mean independence – Part I

Why Regression?
- To predict the dependent variable – easier
- To make causal inferences – more difficult
- These are not mutually exclusive – we can do both, but then we need to meet the criteria for each

Why Regression? - I
Regression to predict the dependent variable
- We don't care that correlation ≠ causation
- We just want to choose Xs that minimize our errors of prediction
- In other words, we want to maximize R-squared
- No assumptions needed
- Examples: predicting stocks, financial markets, economic indicators, election outcomes

$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + u_i$

Why Regression? - II
Regression to make causal inferences
- We want to know if X causes Y
- We want each b to be a good estimate of β, the true effect of X on Y
- We must satisfy certain assumptions to make good estimates
- We don't care about minimizing prediction errors
- Examples: social science, public policy making, epidemiology

$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + u_i$

Properties of Estimators
- b is our estimate of β
- How good an estimate is b?
- We want our estimate to be unbiased: on average, b = β
- We want our estimate to be efficient: minimum $S_b$, or "best"
- $S_b$ is the standard error of the estimate
- Which property is more important?

Sampling Distribution
- Recall that b1 = some value (obtained from one sample) is our estimate of β1
- b1* = some other value for some other sample
- Because we could collect different samples of size n, b1 (our estimator) is itself a variable with a distribution
- The distribution of our estimator is the sampling distribution

Properties of the Sampling Distribution
- Any value of b1 is possible, but values close to β1 are more likely
- On average, b1 = β1 (provided the assumptions are met); that is, the estimator is unbiased
- But there is error (the standard error of the estimate), which can be estimated; in the bivariate case:

$S_b = \sqrt{\dfrac{\sum_i (Y_i - \hat{Y}_i)^2 / (n - 2)}{\sum_i (X_i - \bar{X})^2}}$

Standard Error of the Estimate
- In the trivariate case:

$S_{b_1} = \sqrt{\dfrac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (X_{1i} - \bar{X}_1)^2 \,(1 - r^2_{X_1 X_2})\,(n - 3)}} \qquad S_{b_2} = \sqrt{\dfrac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (X_{2i} - \bar{X}_2)^2 \,(1 - r^2_{X_1 X_2})\,(n - 3)}}$

- In the general multivariate case, where $R^2_1$ is the R-squared from regressing $X_1$ on the other independent variables:

$S_{b_1} = \sqrt{\dfrac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (X_{1i} - \bar{X}_1)^2 \,(1 - R^2_1)\,(n - k - 1)}}$
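A minimal simulation sketch of the sampling-distribution idea above (not part of the lecture; all parameter values are invented, and only numpy is assumed): drawing many samples shows that the OLS slope b is itself a random variable centered on β, with a spread that the $S_b$ formula estimates from a single sample.

```python
# Sketch: sampling distribution of the bivariate OLS slope b.
# Invented values; numpy only. Not from the lecture slides.
import numpy as np

rng = np.random.default_rng(42)
alpha, beta = 2.0, 0.5          # true intercept and slope
n, n_samples = 50, 10_000

slopes = []
for _ in range(n_samples):
    x = rng.uniform(0, 10, size=n)
    u = rng.normal(0, 1, size=n)                         # disturbances, mean zero
    y = alpha + beta * x + u
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # OLS slope
    slopes.append(b)

slopes = np.array(slopes)
print(f"mean of b over samples:  {slopes.mean():.4f}  (true beta = {beta})")
print(f"spread of b (sampling SD): {slopes.std():.4f}")

# Compare the sampling SD with S_b computed from one sample via the formula above
x = rng.uniform(0, 10, size=n)
y = alpha + beta * x + rng.normal(0, 1, size=n)
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)
s_b = np.sqrt((resid @ resid / (n - 2)) / ((x - x.mean()) ** 2).sum())
print(f"S_b from one sample:      {s_b:.4f}")
```

The two printed spreads should roughly agree, which is the sense in which $S_b$ estimates the standard error of the sampling distribution.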
Gauss-Markov Theorem (pp. 122-123)
- OLS gives best, unbiased estimates (OLS is the Best Linear Unbiased Estimator, BLUE) if:
- 1: Y is a linear function of the Xs plus disturbances
- 2: (Conditional) mean independence (disturbances have mean zero)
- 3: Homoskedasticity
- 4: Uncorrelated disturbances
- 5: Disturbances are normally distributed

The Value of these Assumptions
- Assumptions 1 and 2 guarantee unbiasedness
- Assumptions 1, 2, 3 and 4 guarantee efficiency
- Assumption 5 enables accurate hypothesis tests at (relatively) small sample sizes
- Note 1: a violation of assumption 1 or 2 may also affect efficiency, but if our estimates are biased, who cares about efficiency?
- Note 2: if we reject the null hypothesis, inefficiency does not matter

Linearity Assumption - I
The dependent variable Y is a linear function of the Xs plus a random error or disturbance:

$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + u_i$

- What does linearity imply about the relationship between X and Y?
- That the effect of X on Y is constant over the full range of X

Linearity Assumption - II
[Figure-only slide; no recoverable text.]

Linearity Assumption - III
- In reality, the assumption is not perfectly met
- In many cases, we can use OLS to estimate non-linear relationships (and remain BLUE) by transforming the variables (next week; see the sketch after these slides):
- Logarithmic transformation
- Quadratic transformation
- Interaction of two or more Xs

Linearity Assumption - IV
- If nonlinearity cannot be accommodated within OLS through transformation, we have many nonlinear estimation techniques:
- Logit and probit for a dichotomous Y
- Poisson and negative binomial for a count Y
- Multilevel models for Y at one level and Xs at multiple levels
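A minimal sketch of the transformation idea from "Linearity Assumption - III" (not from the lecture; the quadratic case is chosen as the example, values are invented, numpy only): OLS only requires the model to be linear in the parameters, so a curved relationship can be fitted by adding a transformed column such as X².

```python
# Sketch: estimating a quadratic relationship with plain OLS by adding
# X^2 as a regressor. Invented values; numpy only.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x - 0.15 * x**2 + rng.normal(0, 1, size=n)  # true curve

# Design matrix: intercept, X, and the transformed term X^2
X = np.column_stack([np.ones(n), x, x**2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)               # OLS via least squares
print("estimated a, b1, b2:", np.round(coefs, 3))           # ~ [1.0, 2.0, -0.15]
```

The same trick covers the logarithmic and interaction transformations on the slide: each is just another column in the design matrix.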
Mean Independence – Part I

(Conditional) Mean Independence
The mean of u – the error/residual – conditional on the Xs equals zero (and thus does not depend on the Xs). This is the most important assumption for ensuring that OLS is not biased!!!

$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + u_i$

(Random) Error/Residual
[Figure: scatter plot of Y (0-12) against X (0-12) illustrating the error/residual u.]

Violations of Mean Independence
- Measurement error in the independent variables
- Reverse causation
- Specification error – omission of relevant variables
- Specification error – wrong functional form (next week)

Measurement Error
- Systematic measurement error
- Random measurement error
- In the dependent variable
- In the independent variable(s)

Systematic Measurement Error (I)
- Measuring something in addition to, instead of, or as an incomplete part of the true concept of interest
- Because it is defined relative to the true concept of interest, detecting it depends on first having a good definition
- Always results in bias, inefficiency, and nonsense
- Can only be dealt with during research design

Systematic Measurement Error (II)
- Example: GDP as a measure of national wealth
- GDP measures only the monetary value of goods and services produced in a country
- It values the destruction of ecosystems when that destruction generates short-term revenues, and undervalues unpaid 'household' and other work
- Example: survey design, measuring feminism, and surveyor-induced measurement error
- "Should men and women get equal pay for equal work?"
- Measures the extent to which social pressure induces individuals to answer questions in a certain way

Random Measurement Error
- For a particular observation, the observed value differs from the true value
- Call this difference "error"
- The errors are random (that is, for some observations they are bigger, for others smaller, and for still others there is no error at all, and these differences are unpredictable)
- Causes: a faulty measuring tool, carelessness, rounding

In the Dependent Variable (I)

$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + u_i$

- Where u is the difference between predicted Y and actual Y for each observation
- Now let e be the error in measurement of Y for a particular observation; Y is the true value but only Y* is observed:

$Y_i^* = Y_i + e_i$

- By using Y* we are really regressing:

$Y_i^* = (a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + u_i) + e_i$

In the Dependent Variable (II)
- Provided that the measurement error is truly random, $u_i + e_i$ is indistinguishable from $u_i$
- In other words, the partial slope coefficients remain unbiased
- But because standard errors are based on the residuals, the standard errors are elevated in the presence of random measurement error in the DV
- Efficiency is affected:

$S_{b_1} = \sqrt{\dfrac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (X_{1i} - \bar{X}_1)^2 \,(1 - r^2_{X_1 X_2})\,(n - 3)}}$

In an Independent Variable
- In the bivariate case, measurement error in the IV leads to underestimation (closer to zero) of the slope coefficient and a loss of efficiency
- In the multivariate case, measurement error in an IV becomes much less predictable
- Always biased and inefficient
- Whether it produces over- or underestimation depends on the degree of measurement error and on the correlations among the independent variables in complex ways

Random Measurement Error in Short
- In the DV: unbiased but inefficient
- In an IV:
- For bivariate regression, the slope coefficient will be too small (bias), plus inefficiency
- For multivariate regression, the slope coefficients are biased (unpredictably), plus inefficiency
- To fix this, you need better data! (A simulation of both cases follows below.)
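A minimal sketch of the two random-measurement-error cases in the bivariate setting (not from the lecture; values invented, numpy only): error in the DV leaves the slope centered on β, while error in the IV attenuates it toward zero.

```python
# Sketch: random measurement error in the DV vs. in the IV, bivariate case.
# Invented values; numpy only. Not from the lecture slides.
import numpy as np

rng = np.random.default_rng(1)
beta, n, reps = 1.0, 200, 5_000

def ols_slope(x, y):
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

b_dv, b_iv = [], []
for _ in range(reps):
    x = rng.normal(0, 1, size=n)
    y = beta * x + rng.normal(0, 1, size=n)
    y_star = y + rng.normal(0, 1, size=n)   # random error in the DV
    x_star = x + rng.normal(0, 1, size=n)   # random error in the IV
    b_dv.append(ols_slope(x, y_star))
    b_iv.append(ols_slope(x_star, y))

# DV error: slope still centered on beta (unbiased, but less efficient)
print("mean b with DV error:", round(np.mean(b_dv), 3))  # ~ 1.0
# IV error: slope shrunk by the factor Var(X) / (Var(X) + Var(e))
print("mean b with IV error:", round(np.mean(b_iv), 3))  # ~ 0.5 here
```

With equal variances for X and its measurement error, the attenuation factor is 1/2, which is why the second mean lands near half the true slope.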
Reverse Causation
- You think X causes Y, but what if Y causes X?
- What if there is feedback between the two?
- Reverse causation leads to bias
- Example: the effect of public opinion on political agendas
- Solution: very difficult. Theory, advanced methods, potentially unsolvable

Specification Error – Wrong Variables
- Including an irrelevant variable: inefficiency
- Excluding a relevant variable: bias

Including an Irrelevant Variable
- Suppose X1 affects Y but X2 does not, yet we estimate Y = a + b1X1 + b2X2
- Then b1 = β1 and b2 = β2 (on average)
- But what about $S_{b_1}$? Compare:

$S_{b_1} = \sqrt{\dfrac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (X_{1i} - \bar{X}_1)^2 \,(1 - r^2_{X_1 X_2})\,(n - 3)}} \qquad S_b = \sqrt{\dfrac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (X_i - \bar{X})^2 \,(n - 2)}}$

Consequences of Including an Irrelevant Variable
- The partial slope coefficients remain unbiased
- The estimates are inefficient
- The greater the correlation between the included variables, the more inefficient
- If they are uncorrelated, the estimates are efficient

Excluding a Relevant Variable
- Suppose that X1 and X2 both affect Y but we estimate only Y = a + b1X1
- Now (for standardized variables) $b_1 = \beta_1 + \beta_2 \, r_{X_1 X_2}$
- In other words, b1 is biased, but only to the extent that X1 and X2 are correlated (a simulation follows after the slides)

Consequences of Excluding Relevant Variables
- The direction of the bias depends (assuming β2 > 0) on the sign of the correlation:
- If X1 and X2 are positively correlated, then b1 > β1
- If X1 and X2 are negatively correlated, then b1 < β1
- Efficiency is affected, but who cares?

Excluding a Relevant Variable
- Does the variable have a causal effect on the dependent variable?
- Is the variable correlated with those variables whose effects are the focus of the study?
- If the answer to both is "yes", then excluding the variable leads to bias. Solution: add the variable as a "control"

How to Detect and Deal with Wrong Variable Errors
- Exclusion of relevant variables is a problem of theory
- Inclusion of irrelevant variables can be diagnosed with the t-statistics (i.e., hypothesis testing)

End of lecture
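Following up on the "Excluding a Relevant Variable" slide, a minimal simulation of omitted-variable bias (not from the lecture; assumes standardized variables and invented parameter values, numpy only): leaving out a correlated X2 pushes the short-regression slope toward β1 + β2·r.

```python
# Sketch: omitted-variable bias. With standardized X1 and X2 correlated at r,
# omitting X2 biases b1 toward beta1 + beta2 * r. Invented values; numpy only.
import numpy as np

rng = np.random.default_rng(2)
beta1, beta2, r, n, reps = 1.0, 2.0, 0.6, 500, 2_000

b1_short = []
for _ in range(reps):
    x1 = rng.normal(0, 1, size=n)
    x2 = r * x1 + np.sqrt(1 - r**2) * rng.normal(0, 1, size=n)  # corr(x1, x2) = r
    y = beta1 * x1 + beta2 * x2 + rng.normal(0, 1, size=n)
    b1 = np.cov(x1, y, ddof=1)[0, 1] / np.var(x1, ddof=1)       # regression omits x2
    b1_short.append(b1)

print("mean b1 omitting x2:", round(np.mean(b1_short), 3))  # ~ 2.2
print("beta1 + beta2 * r:  ", beta1 + beta2 * r)            # 2.2
```

Flipping the sign of r flips the direction of the bias, matching the "Consequences of Excluding Relevant Variables" slide.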
