ECON 266: Multivariate OLS

Questions and Answers

In a multivariate ordinary least squares (OLS) regression, under what specific condition would Stata automatically remove variables from the model?

  • When the partial correlation between two or more independent variables approaches unity, indicating perfect multicollinearity. (correct)
  • When the condition number of the design matrix exceeds a pre-defined threshold, indicating severe multicollinearity, and AIC suggests model simplification.
  • When the variance inflation factor (VIF) of a variable is excessively high, exceeding a critical value determined by a cross-validation procedure.
  • When the Bayesian Information Criterion (BIC) decreases after removing a set of highly correlated variables, as assessed by a stepwise regression.

Within an econometric model, consider the scenario where an independent variable, $X_i$, is measured with error such that $X_i = X_i^* + \nu_i$, where $X_i^*$ is the true, unobserved value, and $\nu_i$ is a random error term with mean zero and uncorrelated with both $X_i^*$ and the error term $\epsilon_i$. If one estimates a regression model, what is the consequence of this measurement error?

  • The estimator will remain unbiased but will have an inflated variance, reducing the precision of the estimate but not affecting consistency.
  • The estimator will be biased and inconsistent, with the bias amplifying the estimated coefficient away from zero.
  • The estimator will be consistent and unbiased, but the standard errors will be underestimated, leading to misleading statistical inference.
  • The estimator will be biased and inconsistent, with the bias attenuating the estimated coefficient towards zero. (correct)
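A minimal simulation sketch of this result (NumPy; all parameter values and names are illustrative, not from the lesson):

```python
import numpy as np

# Simulate attenuation bias: Y = beta1 * X_star + eps, but we observe X = X_star + nu.
rng = np.random.default_rng(0)
n, beta1 = 100_000, 2.0

x_star = rng.normal(0.0, 1.0, n)   # true regressor, var(X*) = 1
nu = rng.normal(0.0, 1.0, n)       # measurement error, var(nu) = 1
eps = rng.normal(0.0, 1.0, n)      # regression error

y = beta1 * x_star + eps           # true model
x = x_star + nu                    # observed, mismeasured regressor

# Sample analogue of plim b1 = cov(X, Y) / var(X)
b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(b1)  # approx. 1.0 = beta1 * var(X*) / (var(X*) + var(nu)): attenuated toward zero
```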

If the true model is given by $Y_i = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i} + \epsilon_i$, what is the interpretation of $R_j^2$ within the context of assessing multicollinearity?

  • $R_j^2$ quantifies the incremental increase in the overall $R^2$ when $X_{ji}$ is added to a regression model already containing all other independent variables.
  • $R_j^2$ represents the proportion of variance in the dependent variable $Y_i$ explained by the independent variables $X_{1i}$ and $X_{2i}$ in the full model.
  • $R_j^2$ represents the square of the partial correlation coefficient between $X_{1i}$ and $X_{2i}$, controlling for the effects of $Y_i$.
  • $R_j^2$ represents the $R^2$ obtained from an auxiliary regression where $X_{ji}$ is regressed on a constant and all remaining independent variables. (correct)
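A sketch of the auxiliary regression described in the correct answer, assuming NumPy; the variable names and the VIF step are illustrative additions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
x2 = rng.normal(size=n)
x1 = 0.9 * x2 + rng.normal(scale=0.5, size=n)  # X1 is correlated with X2

# Auxiliary regression: X1 on a constant and the other regressor(s)
Z = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
resid = x1 - Z @ coef

r2_j = 1.0 - resid.var() / x1.var()  # R_j^2 from the auxiliary regression
vif_j = 1.0 / (1.0 - r2_j)           # the corresponding variance inflation factor
print(r2_j, vif_j)
```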

Consider an OLS regression where one of the independent variables, $X_1$, is a perfect linear combination of another independent variable, $X_2$ (i.e., $X_1 = a + bX_2$). What implication does this have for the estimation procedure and the properties of the OLS estimators?

  • The OLS estimation procedure will fail due to perfect multicollinearity, rendering the coefficient estimates undefined, and statistical software will typically drop one of the variables. (correct)

In the context of econometric modeling, what is the most accurate interpretation of the term 'attenuation bias'?

  • It refers to the distortion of coefficient estimates towards zero, typically resulting from measurement error in the independent variable. (correct)

Suppose the true model is $Y_i = \beta_1 X_i^* + \epsilon_i$, but we observe $X_i = X_i^* + v_i$. Given that $plim(\hat{\beta_1}) = \frac{cov(X, Y)}{var(X)}$, derive an expression for $plim(\hat{\beta_1})$ in terms of $var(X^*)$, $var(v)$, and $\beta_1$.

  • $plim(\hat{\beta_1}) = \beta_1 \frac{var(X^*)}{var(X^*) + var(v)}$ (correct)
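The intermediate steps behind the answer, reconstructed from the assumptions stated in the question:

```latex
\begin{aligned}
\operatorname{plim}\,\hat{\beta}_1
  &= \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}
   = \frac{\operatorname{cov}(X^* + v,\ \beta_1 X^* + \epsilon)}{\operatorname{var}(X^* + v)} \\
  &= \frac{\beta_1 \operatorname{var}(X^*)}{\operatorname{var}(X^*) + \operatorname{var}(v)}
  \quad \text{since } \operatorname{cov}(X^*, v) = \operatorname{cov}(X^*, \epsilon) = \operatorname{cov}(v, \epsilon) = 0.
\end{aligned}
```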

In the context of estimating a multiple regression model, what is the effect of increasing the $R^2$ from the auxiliary regression of an independent variable $X_j$ on all other independent variables in the model on the variance of the estimated coefficient, $var(b_j)$?

  • An increase in $R^2$ from the auxiliary regression will increase $var(b_j)$, decreasing the precision of the estimated coefficient. (correct)

What methodological approach can be used to address multicollinearity? (Select all that apply)

  • Variance Inflation Factor (VIF) thresholding: remove variables exceeding a pre-specified VIF to reduce multicollinearity. (correct)
  • Principal Component Analysis (PCA): transform the original variables into orthogonal principal components. (correct)
  • Ridge Regression: apply L2 regularization to mitigate the impact of multicollinearity. (correct)
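A minimal sketch of one listed remedy, ridge regression, using the textbook closed-form estimator in NumPy (the penalty value and data are illustrative):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge (L2) estimator: b = (X'X + lam * I)^{-1} X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

rng = np.random.default_rng(2)
n = 1_000
x2 = rng.normal(size=n)
x1 = 0.95 * x2 + rng.normal(scale=0.1, size=n)  # nearly collinear pair
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

print(ridge(X, y, lam=0.0))   # lam = 0 reproduces plain OLS (unstable here)
print(ridge(X, y, lam=10.0))  # the L2 penalty shrinks and stabilizes the estimates
```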

Consider $Y_i = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i} + \epsilon_i$, where $X_{1i}$ and $X_{2i}$ are highly correlated. What are the implications for hypothesis testing regarding $\beta_1$?

  • The variance of the estimator for $\beta_1$ will increase, reducing the precision of the estimate and potentially leading to a failure to reject a false null hypothesis. (correct)

What are the consequences of using a biased model versus having a low $R^2$?

  • A biased model is more problematic for inference, even with a high $R^2$, because the coefficient estimates may be systematically wrong; a low-$R^2$ model explains little variance but can still support valid inference. (correct)

If you are considering adding more independent variables, is this necessarily a good thing? Explain.

  • No, it depends: adding variables can only increase the explained variance, but adding irrelevant or collinear variables inflates the variance of the estimated coefficients. (correct)

Assume the true variance of estimated coefficient $b_1$ is $Var(b_1) = \frac{\sigma^2}{N var(X_i)}$, where $\sigma^2$ is the error variance, $N$ is the sample size, and $var(X_i)$ is the variance of the independent variable. What does this suggest when the model is multivariate?

  • There is an analogous expression: $Var(b_j) = \frac{\sigma^2}{N\,var(X_j)(1-R_j^2)}$. As $R_j^2$ increases, $Var(b_j)$ increases. (correct)

What does a precise estimated coefficient mean?

  • The variance of the estimated coefficient is small and the confidence interval is narrow. (correct)

The variance of sums of random variables can influence the coefficients in the regression. Evaluate:

  • $Var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y)$. (correct)

In the context of regression analysis, how could the rules about the covariance or variance of random variables be applied to deal with measurement errors?

  • Applying these rules to the expressions in Equations 1 and 2 lets us expand $cov(X, Y)$ and $var(X)$ and show that the estimated coefficient is attenuated. (correct)

In an econometric study analyzing the determinants of educational attainment, researchers include years of schooling as one of the explanatory variables. However, due to limitations in data collection, they use household expenditure on education as a proxy for years of schooling. Discuss the potential consequences and econometric issues that may arise from using this proxy variable.

  • Attenuation bias: the proxy measures true schooling with error, biasing the estimated coefficient toward zero. (correct)

Describe the implications of perfect multicollinearity and how it can be detected using auxiliary regressions.

  • A perfect linear relationship between two or more independent variables produces perfect multicollinearity; in the auxiliary regression this appears as $R_j^2 = 1$, and the coefficients cannot be estimated until a variable is dropped. (correct)

In a multiple regression model, the adjusted R-squared is often used to assess the goodness of fit. How does the adjusted $R^2$ specifically address the limitations of the regular $R^2$ in model selection, and what are the implications of using the adjusted $R^2$ for comparing models with different numbers of predictors?

  • The adjusted $R^2$ decreases as noise variables are added to the model, even if the overall $R^2$ increases. (correct)

In the context of multivariate regression analysis, how does the variance inflation factor (VIF) quantify the severity of multicollinearity, and what strategies can be employed to mitigate its impact on coefficient estimation and statistical inference?

  • VIF measures the degree to which the variance of an estimated regression coefficient is increased due to multicollinearity, and strategies include ridge regression or dropping variables from the model. (correct)

Flashcards

Attenuation bias

Measurement error in explanatory variables leads to underestimation of the coefficients.

Assumptions about error ν

Mean zero and uncorrelated with the true independent variable and the error term.

Rules of Covariance

Rules for computing covariances: for uncorrelated random variables, for a random variable with itself, and for linear combinations.

Variance of sums of random variables

For random variables X and Y: var(X + Y) = var(X) + var(Y) + 2cov(X, Y).

Multicollinearity

Occurs when an independent variable is highly correlated with other independent variables in the model.

R-squared

A model's goodness of fit is measured by the R-squared value.

Adjusted R-squared

Adjusted R-squared lowers the value depending on how many variables are in the model.

Precise estimated coefficient

When the variance of an estimated coefficient is small.

Study Notes

ECON 266: Introduction to Econometrics

  • Presented by Promise Kamanga of Hamilton College on 03/04/2025

Multivariate OLS

From The Previous Class

  • The influence of measurement error on data analysis hinges on the variable affected.
  • The effect is generally less severe when the error is only in the dependent variable.
  • When error is in the explanatory variable, the end result is attenuation bias.
  • b₁ will be systematically smaller than β₁ in magnitude.
  • The true model is Yi = β₁Xi* + εi
  • Instead of Xi*, we observe: Xi = Xi* + νi
  • ν has a mean of zero and is neither correlated with X* nor with ε
  • εi is uncorrelated with X*.
  • This means that the model we would estimate would be Yi = β₁Xi + εi
  • The estimated coefficient has the usual form: b₁ = Σ(XiYi) / Σ(Xi²)
  • In the limit: plim b₁ = cov(X, Y) / var(X)

Measurement Error in the Independent Variable

  • Goal: show that plim b₁ = cov(X, Y) / var(X) < β₁ in magnitude
  • The previous class ended by reviewing covariance and variance rules for random variables.

Rules of Covariance

  • Rules of covariance include those for:
    • Uncorrelated random variables
    • A random variable with itself
    • Linear combinations

Rules of Variance

  • Variance of sums of random variables is calculated with: var(x+y) = var(x) + var(y) + 2cov(x,y)
  • When x and y are independent: var(x+y) = var(x) + var(y)
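A quick numerical check of the first rule (sample moments, so the identity holds exactly when the same degrees-of-freedom convention is used; data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)  # y is correlated with x

lhs = np.var(x + y, ddof=1)
rhs = np.var(x, ddof=1) + np.var(y, ddof=1) + 2.0 * np.cov(x, y)[0, 1]
print(np.isclose(lhs, rhs))  # True: the identity holds exactly in-sample
```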

Measurement Error in the Independent Variable (cont.)

  • Applying these rules to the expressions in Equations 1 and 2 lets us show: plim b₁ = cov(X, Y) / var(X) < β₁ in magnitude
  • Yi = β₁Xi* + εi
  • Xi = Xi* + νi
  • plim b₁ = cov(Xi* + νi, β₁Xi* + εi) / var(Xi* + νi)
  • The numerator simplifies to β₁var(Xi*), because cov(Xi*, εi) = cov(νi, εi) = cov(Xi*, νi) = 0
  • The denominator simplifies to var(Xi*) + var(νi), because cov(Xi*, νi) = 0
  • plim b₁ = β₁ var(Xi*) / (var(Xi*) + var(νi)) < β₁ in magnitude
  • When the measurement error is in the independent variable, the estimator is biased and inconsistent: b₁ is attenuated toward zero

Precision of Estimated Coefficients

  • An estimated coefficient is precise when its variance is small.
  • The expression that describes precision in a bivariate model: Var(b₁) = σ² / (N·var(Xi))
  • This expression is valid when the errors are homoskedastic and uncorrelated.
  • Precision is key for hypothesis tests and confidence intervals.
  • The precision expression in multivariate OLS resembles the bivariate case: Var(bj) = σ² / (N·var(Xj)·(1 − R²j))
  • The new element in this expression is the factor (1 − R²j); see the simulation sketch after this list
  • R²j is the R² from an auxiliary regression in which Xj is the dependent variable and the remaining independent variables are the regressors
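A Monte Carlo sketch of this formula (illustrative names and values): the sampling variance of b₁ grows as the correlation between X₁ and X₂, and hence R²j, rises.

```python
import numpy as np

def b1_draws(rho, n=200, reps=2_000, seed=4):
    """Monte Carlo draws of the OLS coefficient on X1 in Y = X1 + X2 + eps,
    where corr(X1, X2) = rho, so the auxiliary R_j^2 is roughly rho**2."""
    rng = np.random.default_rng(seed)
    b1 = np.empty(reps)
    for r in range(reps):
        x2 = rng.normal(size=n)
        x1 = rho * x2 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
        y = x1 + x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        b1[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return b1

print(b1_draws(rho=0.10).var())  # small Var(b1): R_j^2 near 0
print(b1_draws(rho=0.95).var())  # roughly 10x larger: 1 / (1 - 0.95**2) ≈ 10
```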

Precision of Estimated Coefficients (cont.)

  • R² ∈ [0, 1] is a goodness-of-fit measure for a model.
  • R² measures the squared correlation of fitted and actual values.
  • Suppose our model is Yi = β₀ + β₁X1i + β₂X2i + β₃X3i + εi
  • This model has 3 auxiliary regressions, one for each independent variable.
  • If X1i = X2i, then the auxiliary R²j = 1.
  • As R²j increases, var(bj) increases.
  • When an independent variable strongly correlates with others in a model, it is termed multicollinearity.
  • It becomes difficult to isolate how much Xj affects Y.
  • One should avoid adding irrelevant variables to a model.
  • Solutions for multicollinearity involve:
    • Large sample sizes allow reasonable inferences despite it
    • Acknowledging it if substantial
    • If there is perfect multicollinearity (auxiliary R²j = 1), Stata will drop variables until the problem is eliminated from the model.

Goodness of Fit in Multivariate OLS

  • The goodness of fit is measured by R², which is different from R²j.
  • The value of R² is not always useful.
    • A good model can show a low R², and a bad model can show a high R².
    • Adding independent variables mechanically increases R²; it never falls when a variable is added.
  • The adjusted R² lowers the value of the R² depending on how many variables are in the model.
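For reference, the standard adjustment with N observations and k independent variables (the penalty grows with k):

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{N - 1}{N - k - 1}
```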
