Simple Linear Regression Model

Flashcards

Linear Regression

Models relationships between a dependent variable (Y) and one or more independent variables (X).

Simple Linear Regression Model

Predicts a quantitative response Y based on a single predictor variable X.

Least Squares Estimation

The goal is to obtain coefficient estimates that make the linear model fit the available data well.

Standard Errors (SE)

The average amount that an estimate differs from the actual value of the parameter.


Hypothesis Tests on Coefficients

Used to test if there is a non-zero relationship between variables. Null hypothesis: no relationship between X and Y.


Residual Standard Error (RSE)

Estimates the standard deviation of the error; the average amount the response deviates from the true regression line.


R² Statistic

The proportion of variance in the dependent variable that can be predicted from the independent variable(s).


Multiple Linear Regression

Extends simple regression to multiple predictors.


Study Notes

  • Linear Regression is a statistical method used to model relationships between a dependent variable (Y) and one or more independent variables (X).
  • It's a simple approach for supervised learning and predicting a quantitative response.
  • Linear Regression serves as a starting point for more advanced methods and aids in both prediction and inference.
  • The goal is to estimate the function f in the equation Y = f(X) + ɛ, where ɛ stands for random error.
  • Important questions to consider include: whether there is a relationship between the predictors and the response, how strong that relationship is, which variables are associated with the response, how large each association is, how accurately the response can be predicted, whether the relationship is linear, and whether there is synergy among the predictors.

Simple Linear Regression Model

  • It offers a straightforward method for predicting a quantitative response Y based on a single predictor variable X.
  • The approximate linear relationship is expressed as: Y ≈ β0 + β1X.
  • The population regression line provides the best linear approximation of the true relationship between X and Y, represented as: Y = β0 + β1X + ε.
  • The model coefficients β0 and β1 denote the intercept and slope, respectively, with ε representing the error term.
  • The equation ^y = ^β0 + ^β1x predicts Y when X = x, where ^y represents the prediction.

Least Squares Estimation

  • The aim is to find coefficient estimates ^β0 and ^β1 such that the linear model closely fits the available data: yi ≈ ^β0 + ^β1xi for i = 1, ..., n.

  • The least squares method measures closeness by the residual sum of squares (RSS), which it minimizes: RSS = ∑ei² = ∑(yi − (^β0 + ^β1xi))².

  • The best-fitting regression line is the one that minimizes RSS.

  • Applying least squares to the advertising data yields ^β0 = 7.03 and ^β1 = 0.0475 for the regression of sales onto TV.

  • An additional $1,000 spent on TV advertising is associated with approximately 47.5 more units of product sold.

  • When TV advertising spending X is zero, predicted sales average 7,030 units.
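The closed-form least squares estimates above can be sketched in a few lines of numpy. The data here are synthetic (the true coefficients 7.0 and 0.05 are arbitrary stand-ins for the advertising example, not the book's dataset):

```python
import numpy as np

# Synthetic data: x plays the role of a TV budget, y the role of sales.
rng = np.random.default_rng(0)
x = rng.uniform(0, 300, size=200)
y = 7.0 + 0.05 * x + rng.normal(0, 3, size=200)

# Closed-form least squares estimates:
#   ^beta1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
#   ^beta0 = y_bar - ^beta1 * x_bar
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# RSS measures how closely the fitted line matches the data.
residuals = y - (beta0_hat + beta1_hat * x)
rss = np.sum(residuals ** 2)
print(beta0_hat, beta1_hat, rss)
```

With enough data the estimates land close to the true intercept and slope, and no other line achieves a smaller RSS.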

Assessing the Accuracy of the Coefficients

  • Standard Errors (SE) quantify the uncertainty of coefficient estimates, indicating the average difference between an estimate and the parameter's actual value.

  • For the sample mean ^μ, Var(^μ) = SE(^μ)² = σ²/n; analogous formulas give SE(^β0) and SE(^β1).

  • SE helps determine how close ^β0 and ^β1 are to the true values of β0 and β1.

  • SE(^β1) decreases as the xi values become more spread out.

  • The residual standard error, RSE = √(RSS/(n − 2)), provides an estimate of σ.

  • SE is important for confidence intervals, providing a range of potential coefficient values: ^β0 ± 2 · SE(^β0) and ^β1 ± 2 · SE(^β1).

  • For the advertising data, the 95% confidence interval for β0 is [6.130, 7.935], while for β1 it is [0.042, 0.053].

  • With no advertising, average sales are expected to fall between 6,130 and 7,935 units.

  • Each $1,000 increase in TV advertising is expected to raise sales by an average of 42 to 53 units.
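The SE and confidence-interval formulas above can be sketched as follows. The data are again synthetic (the coefficients and noise level are assumptions for the simulation, not the advertising dataset):

```python
import numpy as np

# Synthetic data with known true coefficients (7.0 and 0.05 are assumed).
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 300, size=n)
y = 7.0 + 0.05 * x + rng.normal(0, 3, size=n)

x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx
beta0_hat = y.mean() - beta1_hat * x_bar

# RSE = sqrt(RSS / (n - 2)) estimates sigma.
rss = np.sum((y - beta0_hat - beta1_hat * x) ** 2)
rse = np.sqrt(rss / (n - 2))

# Standard errors of the coefficient estimates.
se_beta1 = rse / np.sqrt(sxx)
se_beta0 = rse * np.sqrt(1.0 / n + x_bar ** 2 / sxx)

# Approximate 95% confidence intervals: estimate +/- 2 * SE.
ci_beta1 = (beta1_hat - 2 * se_beta1, beta1_hat + 2 * se_beta1)
print(ci_beta1)
```

Note that spreading the xi values out increases sxx and therefore shrinks SE(^β1), matching the bullet above.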

  • SE is used to perform hypothesis tests, testing the null hypothesis H0 (no relationship between X and Y) against the alternative Ha (some relationship exists).

  • The test determines whether ^β1 is sufficiently far from zero to confidently assert that β1 is not zero.

  • When SE(^β1) is small, even a small ^β1 can indicate a relationship between X and Y; when SE(^β1) is large, ^β1 must be correspondingly large to reject H0.

  • For the advertising data, ^β0 and ^β1 are large relative to their standard errors, so the probability of observing such values if H0 were true is virtually zero.

  • This suggests that β0 ≠ 0 and β1 ≠ 0.
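The test statistic behind this is t = ^β1 / SE(^β1); a value far from zero leads to rejecting H0. A minimal sketch on synthetic data (the true slope 0.05 is an assumption, so H0 is false by construction):

```python
import numpy as np

# Synthetic data where a real relationship exists (true slope = 0.05).
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 300, size=n)
y = 7.0 + 0.05 * x + rng.normal(0, 3, size=n)

x_c = x - x.mean()
beta1_hat = np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
rse = np.sqrt(np.sum((y - beta0_hat - beta1_hat * x) ** 2) / (n - 2))
se_beta1 = rse / np.sqrt(np.sum(x_c ** 2))

# t-statistic for H0: beta1 = 0. A large |t| means ^beta1 is many
# standard errors away from zero, so H0 is rejected.
t_stat = beta1_hat / se_beta1
print(t_stat)
```

Here |t| comes out far above the usual critical values near 2, mirroring the advertising data, where the coefficients are large relative to their standard errors.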

Assessing the Accuracy of the Model

  • If H0 is rejected, quantifying the model's fit becomes necessary.

  • The residual standard error (RSE) and the R² statistic assess the quality of a linear regression fit.

  • Due to the presence of the error term ε, perfectly predicting Y from X is impossible, even with knowledge of the true regression line.

  • RSE estimates the standard deviation of ε, indicating the average deviation of the response from the true regression line.

  • For the advertising data, the fitted model has a residual standard error of 3.26, an R² of 0.612, and an F-statistic of 312.1.

  • With the model being correct and if ẞ0 and ẞ1 were known exactly, predictions would still deviate by about 3,260 units.

  • The acceptability of a 3,260-unit prediction error depends on context.

  • With a mean sales value of 14,000 units, the percentage error for the advertising data is 23%.

  • RSE measures the model's lack of fit; smaller RSE values indicate better fit, while larger values suggest poor fit.

  • RSE lacks a standardized scale and must be interpreted relative to the scale of Y.

  • R² statistic indicates the proportion of variance explained by the model, ranging from 0 to 1 and being independent of Y's scale.

  • TSS represents the total sum of squares, ∑(yi − ȳ)², which quantifies the amount of variability in the response before the regression is performed.

  • RSS measures the amount of variability in Y that remains unexplained after performing the regression.

  • TSS – RSS indicates the variability explained by the regression, while R² measures the proportion of variability in Y that can be explained using X.

  • The R² value of 0.61 indicates that just under two-thirds of the variability in sales is explained by the linear regression on TV.

  • Pearson's correlation, r = Cor(X, Y), assesses the linear relationship between X and Y.

  • In simple linear regression, r² = R².
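The fit measures above, and the identity r² = R², can be checked numerically. A minimal sketch on synthetic data (the coefficients and noise level are illustrative assumptions):

```python
import numpy as np

# Synthetic data for checking RSE, TSS, RSS, R^2, and r^2 = R^2.
rng = np.random.default_rng(3)
n = 150
x = rng.uniform(0, 300, size=n)
y = 7.0 + 0.05 * x + rng.normal(0, 3, size=n)

x_c = x - x.mean()
beta1_hat = np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

rss = np.sum((y - beta0_hat - beta1_hat * x) ** 2)   # unexplained variability
tss = np.sum((y - y.mean()) ** 2)                    # total variability

rse = np.sqrt(rss / (n - 2))          # scale-dependent lack-of-fit measure
r_squared = 1 - rss / tss             # proportion of variance explained
r = np.corrcoef(x, y)[0, 1]           # Pearson correlation

print(rse, r_squared, r ** 2)
```

Unlike RSE, which must be judged against the scale of Y, R² always lies in [0, 1], and for a single predictor it equals the squared correlation.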

Multiple Linear Regression

  • It expands simple regression to incorporate multiple predictors: Y = β0 + β1X1 + β2X2 + ... + βpXp + ε.
  • Used to model complex relationships between Y and multiple explanatory variables.
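With several predictors, the least squares fit is conveniently expressed with a design matrix. A sketch with p = 2 hypothetical predictors (the coefficient values are arbitrary assumptions), solved via numpy's `lstsq`:

```python
import numpy as np

# Synthetic data: two predictors X1, X2 with assumed true coefficients.
rng = np.random.default_rng(4)
n = 300
X = rng.normal(size=(n, 2))
beta_true = np.array([1.0, 2.0, -0.5])   # [beta0, beta1, beta2]
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0, 0.1, size=n)

# Prepend a column of ones so the intercept beta0 is estimated too.
design = np.column_stack([np.ones(n), X])

# Least squares solution: minimizes ||y - design @ beta||^2.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta_hat)
```

The simple-regression case is recovered when the design matrix has only the intercept column and one predictor.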
