Linear Regression Fundamentals

Study Notes

Definition: Simple linear regression is a statistical method that models a continuous outcome variable (y) as a linear function of a single predictor variable (x), so that y can be predicted from x.
Assumptions:
- Linearity: The relationship between x and y should be linear.
- Independence: Each data point should be independent of the others.
- Homoscedasticity: The variance of the residuals should be constant across all levels of x.
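- Normality: The residuals should be approximately normally distributed. This matters mainly for hypothesis tests and confidence intervals, especially in small samples.
- Non-constant predictor: With a single predictor there is no multicollinearity to check. The only requirement is that x takes at least two distinct values; if x is constant, it cannot be separated from the intercept and the slope cannot be estimated.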
Equation: y = β0 + β1x + ε, where β0 is the intercept, β1 is the slope, x is the predictor variable, and ε is the error term. The residual is the estimate of ε from the fitted model: the observed y minus the predicted y.
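Example: The notes do not name an estimation method, so the sketch below assumes the coefficients are estimated by ordinary least squares, the usual choice. The function name fit_simple_ols and the data are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1), the least-squares estimates of the intercept and slope."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must vary; the slope is undefined for constant x")
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx           # slope estimate
    b0 = y_mean - b1 * x_mean  # intercept estimate: the line passes through the means
    return b0, b1


# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
b0, b1 = fit_simple_ols(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

For this data the estimates come out at roughly 0.22 for the intercept and 1.96 for the slope.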
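Coefficient of determination (R²): Measures the proportion of the variance in y that is explained by x. It ranges from 0 to 1, and in simple linear regression it equals the squared correlation between x and y.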
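Example: R² can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch, with a hypothetical helper r_squared and made-up data; the line 0.22 + 1.96x is the least-squares fit for that data.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot for observed values y and fitted values y_hat."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)              # total variation
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y is constant")
    return 1 - ss_res / ss_tot


# Made-up illustrative data and its least-squares line
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
y_hat = [0.22 + 1.96 * xi for xi in x]
r2 = r_squared(y, y_hat)
print(f"R^2 = {r2:.4f}")
```

A perfect fit gives R² = 1, and predicting the mean of y for every point gives R² = 0.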
Hypothesis testing: A t-test on the slope is used to determine whether β1 is significantly different from zero, that is, whether there is evidence of a linear relationship between x and y.
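Example: One common way to test the slope is a t statistic, the slope estimate divided by its standard error, with n - 2 degrees of freedom. The sketch below assumes an ordinary least squares fit; slope_t_statistic is a hypothetical helper and the data are made up. It stops at the t statistic, since the Python standard library has no t distribution; the comparison uses the standard table value 3.182 (two-sided 5% level, 3 degrees of freedom).

```python
import math


def slope_t_statistic(x, y):
    """Return (t, df) for the null hypothesis that the slope is zero."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must vary")
    b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / s_xx
    b0 = y_mean - b1 * x_mean
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    if ss_res == 0:
        raise ValueError("perfect fit: the standard error is zero")
    residual_variance = ss_res / (n - 2)         # estimate of the error variance
    se_b1 = math.sqrt(residual_variance / s_xx)  # standard error of the slope
    return b1 / se_b1, n - 2


# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
t, df = slope_t_statistic(x, y)
T_CRITICAL_DF3 = 3.182  # two-sided 5% critical value for 3 degrees of freedom
print(f"t = {t:.2f} on {df} df; reject slope = 0: {abs(t) > T_CRITICAL_DF3}")
```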

Definition: Multiple linear regression is a statistical method that extends simple linear regression by using multiple predictor variables to predict the outcome variable (y).
Assumptions: Same as simple linear regression, with the addition of:
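- No multicollinearity: The predictor variables should not be highly correlated with each other. Strong correlation among predictors inflates the standard errors of the coefficients and makes the individual estimates unstable.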
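Example: A common diagnostic for multicollinearity is the variance inflation factor (VIF), which the notes do not mention by name. With exactly two predictors it reduces to 1 / (1 - r²), where r is their correlation. The sketch below covers only that two-predictor case; the function names and data are made up, and cutoffs such as VIF above 5 or 10 are rules of thumb, not fixed rules.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    s_ab = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_mean) ** 2 for ai in a)
    s_bb = sum((bi - b_mean) ** 2 for bi in b)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return s_ab / math.sqrt(s_aa * s_bb)


def vif_two_predictors(a, b):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    if abs(r) >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1 / (1 - r ** 2)


# Made-up illustrative predictors
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
r = pearson_r(x1, x2)
vif = vif_two_predictors(x1, x2)
print(f"r = {r:.3f}, VIF = {vif:.2f}")
```

With more than two predictors, r² is replaced by the R² from regressing each predictor on all the others.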
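Equation: y = β0 + β1x1 + β2x2 + … + βnxn + ε, where β0 is the intercept, β1, β2, …, βn are the coefficients of the predictor variables x1, x2, …, xn, and ε is the error term. Each coefficient is the expected change in y for a one-unit increase in its predictor with the other predictors held constant.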
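Example: Assuming ordinary least squares (the notes do not name an estimation method), the coefficients solve the normal equations (XᵀX)β = Xᵀy, where X is the data matrix with a leading column of ones for the intercept. The sketch below solves them with plain Gaussian elimination to stay dependency-free; the function names and data are made up, and a numerical library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - known) / m[i][i]
    return coef


def fit_multiple_ols(rows, y):
    """rows: one list of predictor values per observation. Returns [b0, b1, ..., bn]."""
    design = [[1.0] + list(r) for r in rows]  # prepend 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise)
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [9, 8, 19, 18, 29]
coef = fit_multiple_ols(rows, y)
print([round(c, 6) for c in coef])
```

Because the data contain no noise, the fit should recover the generating coefficients 1, 2, and 3 up to floating-point error.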
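Coefficient of determination (R²): Measures the proportion of the variance in y that is explained by the set of predictor variables. R² never decreases when a predictor is added, so it should not be used on its own to compare models with different numbers of predictors.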
Hypothesis testing: A t-test on each coefficient is used to determine whether that predictor variable has a significant effect on the outcome variable after accounting for the other predictors.
Variable selection:
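- Backward elimination: Starts with all predictor variables and removes the least significant one at a time until every remaining predictor meets the chosen significance criterion.
- Forward selection: Starts with no predictor variables and adds the most significant one at a time until no remaining candidate meets the criterion.
- Stepwise regression: Combines backward elimination and forward selection. After each addition, the predictors already in the model are rechecked and removed if they are no longer significant.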
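Example: Backward elimination can be sketched as a loop that refits the model and drops the predictor with the smallest absolute t statistic. This sketch assumes an ordinary least squares fit and uses |t| < 2 as a rough stand-in for "not significant at about the 5% level"; a real analysis would use p-values with the correct degrees of freedom. All function names, the threshold, and the data are made up for illustration.

```python
import math


def invert_matrix(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def t_statistics(columns, y):
    """Fit OLS with an intercept; return one t statistic per predictor column."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    if n <= k:
        raise ValueError("need more observations than coefficients")
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    inv = invert_matrix(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
                 for d, yi in zip(design, y))
    if ss_res == 0:
        raise ValueError("perfect fit: standard errors are zero")
    sigma2 = ss_res / (n - k)  # estimated error variance
    # Index 0 is the intercept, which is always kept, so skip it
    return [beta[j] / math.sqrt(sigma2 * inv[j][j]) for j in range(1, k)]


def backward_elimination(predictors, y, t_threshold=2.0):
    """predictors: dict of name -> values. Returns the names that survive."""
    kept = list(predictors)
    while kept:
        t_vals = t_statistics([predictors[name] for name in kept], y)
        weakest = min(range(len(kept)), key=lambda j: abs(t_vals[j]))
        if abs(t_vals[weakest]) >= t_threshold:
            break  # every remaining predictor meets the criterion
        kept.pop(weakest)
    return kept


# Made-up data: y depends on x1 plus small noise; x2 is unrelated to y
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, 1, -1, -1, 1, -1, 1]
y = [2.1, 3.8, 6.15, 7.9, 10.2, 11.85, 14.05, 15.95]
selected = backward_elimination({"x1": x1, "x2": x2}, y)
print(selected)
```

Forward selection runs the same idea in the opposite direction: start from an empty model and add the candidate with the largest absolute t statistic until none exceeds the threshold.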