Key Assumptions of Linear Regression

Questions and Answers

What does the linearity assumption in linear regression imply about the relationship between the independent and dependent variables?

  • The relationship can vary dramatically without affecting predictions.
  • The relationship is always exponential.
  • The independent variable is unrelated to the dependent variable.
  • The relationship must be linear. (correct)

Which of the following is NOT a key assumption of linear regression?

  • Perfect multicollinearity. (correct)
  • Homoscedasticity of residuals.
  • Absence of endogeneity.
  • Independence of errors.

What is the consequence of violating the linearity assumption in linear regression?

  • Increased bias in predictions. (correct)
  • The model will definitely capture all patterns.
  • Residuals will remain constant.
  • Enhanced model performance.

How can one visually assess the linearity assumption in a dataset?

  • By employing scatter plots or residual plots. (correct)

What term describes the assumption that the residuals of a linear regression model have constant variance across all levels of the predicted values?

  • Homoscedasticity. (correct)

In the context of linear regression, what does 'absence of endogeneity' refer to?

  • There is no correlation between predictor variables and the error term. (correct)

Which of the following scenarios illustrates a non-linear relationship that might violate linearity assumptions?

  • Sales that increase significantly more at higher temperatures compared to lower temperatures. (correct)

Which assumption ensures that one variable is not a linear combination of other variables in linear regression?

  • Lack of multicollinearity. (correct)

What does a Q-Q plot indicate about the residuals when they fall along a straight line?

  • The residuals are normally distributed. (correct)

What does a Variance Inflation Factor (VIF) value greater than 10 suggest?

  • There is significant multicollinearity. (correct)

Which statistical test is used to assess autocorrelation in residuals?

  • Durbin-Watson test. (correct)

What is a common action taken when homoscedasticity is violated?

  • Applying transformations to the dependent variable. (correct)

What is the primary purpose of using regularization techniques like Ridge or Lasso regression?

  • To handle multicollinearity and improve model performance. (correct)

Which assumption is NOT critical for ensuring reliable results in linear regression?

  • Presence of outliers. (correct)

What does the Durbin-Watson statistic close to 2 imply?

  • No autocorrelation. (correct)

What approach can be used when the residuals are heteroscedastic or correlated?

  • Apply Generalized Least Squares (GLS). (correct)

What does homoscedasticity imply about the residuals in a linear regression model?

  • Residuals maintain a constant variance across predictor levels. (correct)

What effect does heteroscedasticity have on regression coefficient estimates?

  • Leaves them unbiased but less precise (inefficient) than they should be. (correct)

Which scenario indicates a violation of the independence of errors assumption?

  • Errors from one observation influence the next. (correct)

What is the consequence of multicollinearity in a regression model?

  • It inflates the standard errors of the coefficients. (correct)

In the context of linear regression, what does the assumption of no endogeneity ensure?

  • Independent variables do not correlate with the error term. (correct)

How can one identify multicollinearity among independent variables?

  • Using scatter plots or heatmaps. (correct)

What visual tool can be useful to diagnose homoscedasticity in regression analysis?

  • Residuals vs. fitted values plot. (correct)

What can be inferred if a dataset's residuals plot shows a clear pattern?

  • The assumption of homoscedasticity may be violated. (correct)

Why is multivariate normality important in linear regression?

  • It ensures the validity of hypothesis tests and confidence intervals. (correct)

What indicates the presence of autocorrelation in a residuals plot over time?

  • Significant spikes in the ACF plot. (correct)

What is an example of a situation that may lead to heteroscedasticity?

  • Increasing variance in residuals as the predicted values increase. (correct)

What should be done if the assumption of multivariate normality is violated?

  • Apply a transformation to the data (e.g., log or Box-Cox). (correct)

When independent variables show a strong relationship, what is this phenomenon called?

  • Multicollinearity. (correct)

Flashcards

Linearity Assumption

The relationship between predictor and response variables in linear regression must be linear.

Homoscedasticity

Residuals have constant variance across the range of predictor values.

Multivariate Normality

Errors in linear regression are normally distributed.

Independence of Errors

Errors in regression are independent of each other.

Multicollinearity

Independent variables should not be highly correlated.

No Endogeneity

Predictor variables are not correlated with the error term (e.g., no reverse causality from the response to the predictors).

Linear Regression

A predictive modeling technique that estimates a continuous target variable from one or more predictor variables.

Predictor variable

Variables used to predict the target variable in a regression analysis.

Q-Q Plot

A graph used to assess if a dataset's distribution is normal.

VIF

Variance Inflation Factor, a measure of multicollinearity.

Autocorrelation

Correlation between residuals in a time series model.

Durbin-Watson Test

Statistical test to detect autocorrelation in residuals.

Linearity (regression)

The relationship between variables is linear.

Regression Assumptions

Conditions needed for valid linear regression results.

Heteroscedasticity

Variable variance of residuals in a linear regression model; spread of errors changes with independent variable(s).

Residual Plots

Plots used to detect violations of linear regression assumptions.

BLUE Estimators

Best Linear Unbiased Estimators; under the Gauss-Markov assumptions, OLS coefficients have the lowest variance among all linear unbiased estimators.

Normal Distribution

Symmetric, bell-shaped probability distribution.

Statistical Significance

Confidence that an observed outcome is not due to random chance.

Correlation

Statistical relationship between two variables

Outliers

Data points far from expected values

ACF plot (Autocorrelation)

Plot to detect possible correlation between errors

Independent Variable

Variable that might influence the outcome

Dependent Variable

The outcome variable, influenced by the independent variables.

Study Notes

Key Assumptions of Linear Regression

  • Linearity: The relationship between predictors and the response variable is linear. A change in one predictor results in a proportional change in the response. Non-linear relationships require transformations or non-linear models. Visualization using scatter plots or residual plots is important.

  • Homoscedasticity: Residuals (differences between observed and predicted values) have a constant variance across all levels of predictors. This means the spread of errors is uniform regardless of predictor value. Heteroscedasticity (varying variance) can lead to inefficient estimates and unreliable inferences. Visually assessed with residual plots.

  • Multivariate Normality: Residuals follow a normal distribution when considering multiple predictors together. This assumption is important for valid hypothesis tests, confidence intervals, and p-values. Evaluated using Q-Q plots and histograms.

  • Independence of Errors: Residuals are not correlated with one another. Each observation's error should not influence another's. Time series data often violates this. Evaluated with residual plots (looking for patterns) and autocorrelation functions (ACF).

  • Lack of Multicollinearity: Independent variables are not highly correlated. Highly correlated predictors provide redundant information, inflating coefficient standard errors and hindering accurate coefficient interpretation. Use scatter plots or heatmaps to detect this.

  • Absence of Endogeneity: Independent variables are not correlated with the error term. If violated, coefficient estimates are biased and inconsistent. Consider correlation between predictors and error term.
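To make these assumptions concrete, here is a minimal NumPy sketch on hypothetical synthetic data: it fits an ordinary least squares model and extracts the residuals that most of the diagnostics discussed in these notes operate on. (The data-generating values 2.0 and 3.0 are illustrative choices, not from any particular dataset.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y depends linearly on x, with homoscedastic noise
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=100)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# OLS fit: beta = (X'X)^(-1) X'y, computed via lstsq for numerical stability
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta
residuals = y - fitted  # the inputs to residual-based diagnostics

print(beta)  # roughly [2.0, 3.0]
```

With well-behaved data like this, the estimated coefficients land close to the true values, and including an intercept forces the residuals to have mean zero by construction.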

Detecting Violations of Assumptions

  • Residual Plots: Plot residuals against fitted values or predictors to check for patterns like non-linearity, heteroscedasticity, or correlated errors.

  • Q-Q Plots: Assess normality of residuals. A straight line suggests normality.

  • Variance Inflation Factor (VIF): Checks for multicollinearity. High VIF (e.g., >5 or 10) suggests significant multicollinearity.

  • Durbin-Watson Test: Identifies autocorrelation in residuals. A value near 2 indicates no autocorrelation.
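The VIF and Durbin-Watson checks follow directly from their definitions, so they can be sketched with plain NumPy on hypothetical data (in practice, statsmodels provides both): VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors, and the Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X (no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns plus an intercept.
    """
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def durbin_watson(resid):
    """DW statistic: ~2 means no autocorrelation; <2 positive, >2 negative."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                  # independent predictor
X = np.column_stack([x1, x2, x3])

print(vif(X))  # the first two VIFs are large; the third stays near 1
print(durbin_watson(rng.normal(size=200)))  # i.i.d. noise gives a value near 2
```

Here the two nearly collinear columns trip the VIF > 10 rule of thumb, while independent noise yields a Durbin-Watson statistic close to 2, matching the interpretations above.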

Addressing Violations of Assumptions

  • Transformations: Applying transformations to the data (e.g., logarithms, square roots) to address non-linearity and heteroscedasticity.

  • Adding Variables: Including omitted predictors whose absence leaves bias or correlation in the error terms.

  • Regularization Techniques: Using methods like Ridge or Lasso regression to improve model performance in the presence of multicollinearity.

  • Robust Regression: Using methods less sensitive to assumption violations like Quantile Regression or Huber Regression.

  • Generalized Least Squares (GLS): For heteroscedastic or correlated residuals.
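Two of these remedies are easy to sketch with plain NumPy on hypothetical data (scikit-learn's Ridge is what you would normally reach for): a log transform that turns multiplicative, heteroscedastic noise into additive, homoscedastic noise, and ridge regression via its closed form beta = (X'X + lambda*I)^(-1) X'y, which stabilises coefficients under multicollinearity.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Log transform: multiplicative noise becomes additive, homoscedastic ---
x = rng.uniform(1, 10, size=300)
y = np.exp(1.0 + 0.5 * x) * rng.lognormal(0.0, 0.2, size=300)
log_y = np.log(y)  # log_y = 1 + 0.5*x + Normal(0, 0.2) noise

# --- Ridge regression via the closed form (X'X + lam*I)^(-1) X'y ---
def ridge(X, y, lam):
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.05, size=300)  # near-duplicate of x1
X = np.column_stack([x1, x2])
y2 = x1 + x2 + rng.normal(scale=0.5, size=300)

b_ols = ridge(X, y2, lam=0.0)     # unpenalized: collinearity makes this unstable
b_ridge = ridge(X, y2, lam=10.0)  # penalized: coefficients pulled toward each other
```

The penalty mostly shrinks the poorly identified difference between the two collinear coefficients while leaving their well-identified sum (about 2 here) nearly unchanged, which is how ridge trades a little bias for much lower variance.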

Description

This quiz covers the fundamental assumptions underlying linear regression analysis, focusing on linearity, homoscedasticity, multivariate normality, and the independence of errors. Understanding these principles is crucial for performing accurate linear regression and interpreting its results. Enhance your knowledge of these key concepts with this informative quiz.
