Lecture 6 - Multiple Linear Regression PDF
Nicolas Apfel, Southampton, 2024
# Lecture 6 - Multiple Linear Regression

**Monday, 18 November 2024**, 1:08 PM

## Economic Model

- Consider the following economic model:
  - **S = β₁ + β₂P + β₃A**
  - where **S** = monthly sales, **P** = price index, **A** = advertising expenditure
- The corresponding econometric model is:
  - **Sᵢ = β₁ + β₂Pᵢ + β₃Aᵢ + eᵢ**
- Assume we have a random sample of (Sᵢ, Pᵢ, Aᵢ).

## The Multiple Linear Regression Model

- Assume also **exogeneity:**
  - **E[eᵢ|Pᵢ, Aᵢ] = 0**
- This means two things:
  - **E[eᵢ] = 0**
  - **cov(eᵢ, Pᵢ) = 0** and **cov(eᵢ, Aᵢ) = 0**
- Hence, we can write:
  - **E[Sᵢ|Pᵢ, Aᵢ] = β₁ + β₂Pᵢ + β₃Aᵢ**

## Interpreting Regression Parameters

- Using the conditional expectation, we can interpret the regression parameters:
  - **β₂ = ΔE[S|P, A]/ΔP** (A held constant) **= ∂E[S|P, A]/∂P**
  - **β₃ = ΔE[S|P, A]/ΔA** (P held constant) **= ∂E[S|P, A]/∂A**
- Recall the meaning of a partial derivative: **ceteris paribus**.
- What is the sign of β₂?

## The Causal Interpretation

- The key assumption for a **causal interpretation** of the parameters β₂ and β₃ is **exogeneity:** **E[eᵢ|Pᵢ, Aᵢ] = 0**
- Thanks to this assumption, we can interpret β₂ as the causal effect of price on sales, holding everything else constant, including the unobservable factors in eᵢ.
- So, the **multiple linear regression model** is the framework that allows us to quantify causal economic relations by **including as regressors (controlling for)** all relevant factors.

## Geometric Representation of Multiple Regression

- The general model with K regressors:
  - **yᵢ = β₁ + β₂xᵢ₂ + β₃xᵢ₃ + ... + βₖxᵢₖ + eᵢ**
- The slope parameter represents:
  - **βₖ = ΔE[y|x₂, ..., xₖ]/Δxₖ** (other x's held constant) **= ∂E[y|x₂, ..., xₖ]/∂xₖ**

## Classical Assumptions

**Assumption MLR1: Linearity of the population model**
- **yᵢ = β₁ + β₂xᵢ₂ + β₃xᵢ₃ + ... + βₖxᵢₖ + eᵢ**

**Assumption MLR2: Strict exogeneity**
- **E[eᵢ|X] = 0**, where X is a matrix containing all units and regressors.
- This implies:
  - **E[eᵢ] = 0**
  - **cov(eᵢ, xⱼₖ) = 0** ∀ i, j, k
  - **E[yᵢ|X] = β₁ + β₂xᵢ₂ + β₃xᵢ₃ + ... + βₖxᵢₖ**

**Assumption MLR3: Conditional homoskedasticity**
- **var[eᵢ|X] = σ²**
- This implies:
  - **var[eᵢ] = σ²**
  - **var[yᵢ|X] = var[eᵢ|X] = σ²**

**Assumption MLR4: Conditionally uncorrelated errors**
- **cov[eᵢ, eⱼ|X] = 0** for i ≠ j
- This implies:
  - **cov[eᵢ, eⱼ] = 0**
  - **cov[yᵢ, yⱼ|X] = cov[eᵢ, eⱼ|X] = 0**

**Assumption MLR5: No perfect multicollinearity**
- The values of each xᵢₖ are not exact linear functions of the other explanatory variables.
- There is no set of coefficients c₁, ..., cₖ, with at least one coefficient ≠ 0, such that:
  - **c₁xᵢ₁ + c₂xᵢ₂ + ... + cₖxᵢₖ = 0** for all i

**Assumption MLR6: Error normality**
- **eᵢ|X ~ N(0, σ²)**
- This implies:
  - **yᵢ|X ~ N(E[yᵢ|X], σ²)**, where **E[yᵢ|X] = β₁ + β₂xᵢ₂ + β₃xᵢ₃ + ... + βₖxᵢₖ**

## Least Squares Parameter Estimation

- Assume we have 3 regressors (including the intercept):
  - **yᵢ = β₁ + β₂xᵢ₂ + β₃xᵢ₃ + eᵢ**
- The sum of squared residuals is:
  - **S(b₁, b₂, b₃) = ∑(yᵢ - Ê[yᵢ])² = ∑(yᵢ - b₁ - b₂xᵢ₂ - b₃xᵢ₃)²**
- We want to choose b₁, b₂, b₃ to **minimize S (Least Squares principle)**; a short numerical sketch follows below.
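To make the Least Squares principle concrete, here is a minimal numpy sketch that recovers the parameters of the three-regressor model by solving the normal equations. The data-generating values below are assumptions chosen purely for illustration, not numbers from the lecture.

```python
# A minimal sketch of Least Squares on simulated data; the population
# parameters (119, -8, 2) and error s.d. 5 are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N = 75                                      # same sample size as Big Andy's
price = rng.uniform(4, 7, N)                # hypothetical price index
advert = rng.uniform(0.5, 3, N)             # hypothetical advertising ($1,000s)
e = rng.normal(0, 5, N)                     # errors satisfying MLR2, MLR3, MLR6
sales = 119 - 8 * price + 2 * advert + e    # assumed population model

# Minimizing S(b1, b2, b3) yields the normal equations (X'X)b = X'y
X = np.column_stack([np.ones(N), price, advert])
b = np.linalg.solve(X.T @ X, X.T @ sales)
print("b1, b2, b3 =", b)                    # close to 119, -8, 2 in this sample
```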
## Example

- **Sᵢ = β₁ + β₂Pᵢ + β₃Aᵢ + eᵢ**
- Least Squares estimates for the sales equation for Big Andy's Burger Barn:

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|:---|:---|:---|:---|:---|
| C | 118.9136 | 6.3516 | 18.7217 | 0.0000 |
| PRICE | -7.9079 | 1.0960 | -7.2152 | 0.0000 |
| ADVERT | 1.8626 | 0.6832 | 2.7263 | 0.0080 |

R² = 0.4483; SSE = 1718.943; σ̂ = 4.8861; s = 6.48854 (std. dev. of the dependent variable)

- **Predicted value of Sales:**
  - **Ŝᵢ = b₁ + b₂Pᵢ + b₃Aᵢ = 118.91 - 7.908Pᵢ + 1.863Aᵢ**
- **Question:** What is the interpretation of each estimated coefficient in this model?
- Suppose we are interested in predicting sales revenue for a price of $5.50 and an advertising expenditure of $1,200. Since advertising is measured in units of $1,000, we set A = 1.2:
  - **Ŝ = 118.91 - 7.908 × 5.50 + 1.863 × 1.2 = 77.656**
- Estimated regression models describe the relationship between the economic variables for values similar to those found in the sample data. Extrapolating the results to extreme values is generally not a good idea. Predicting the value of the dependent variable for values of the explanatory variables far from the sample values invites disaster.

## Estimation of the Error Variance

- There is one last unknown parameter we need to estimate in a regression model:
  - **σ² = var[eᵢ] = E[eᵢ²]**
- To estimate this parameter we can use the OLS residuals:
  - **êᵢ = yᵢ - ŷᵢ = yᵢ - (b₁ + b₂xᵢ₂ + b₃xᵢ₃)**
- From these we can obtain an unbiased estimator of σ²:
  - **σ̂² = ∑êᵢ²/(N - K)**
- Using our example (N = 75, K = 3), the residual sum of squares is:
  - **SSE = ∑êᵢ² = 1718.943**
- So, the estimated error variance is:
  - **σ̂² = ∑êᵢ²/(N - K) = 1718.943/(75 - 3) = 23.874**
- Estimated standard deviation of the error:
  - **σ̂ = √23.874 = 4.886**

## Sampling Properties of the OLS Estimators

- Consider the sampling distribution of the OLS estimators bₖ.
- An important result characterizes this distribution:

**The Gauss-Markov Theorem:** For the multiple regression model, if assumptions MLR1-MLR5 hold, then the least squares estimators are the Best Linear Unbiased Estimators (BLUE) of the parameters.

- This is a finite (or small) sample property of the OLS estimator.

## Variance of OLS Estimators

- The variance of the OLS estimator of β₂ is:
  - **var(b₂|X) = σ² / [(1 - r₂₃²) ∑(xᵢ₂ - x̄₂)²]**, where
  - **r₂₃ = ∑(xᵢ₂ - x̄₂)(xᵢ₃ - x̄₃) / √[∑(xᵢ₂ - x̄₂)² ∑(xᵢ₃ - x̄₃)²]**
- The uncertainty associated with our estimate of β₂ is influenced by:
  - the error variance σ² (model uncertainty)
  - the sample size N
  - the amount of variation in the regressor x₂
  - the correlation between regressors

## Covariance Matrix of OLS Estimators

- Consider our multiple linear regression model:
  - **yᵢ = β₁ + β₂xᵢ₂ + β₃xᵢ₃ + ... + βₖxᵢₖ + eᵢ**
- If we assume errors are normal, we have:
  - **eᵢ|X ~ N(0, σ²) ⟺ yᵢ|X ~ N(β₁ + β₂xᵢ₂ + ... + βₖxᵢₖ, σ²)**
- Since the OLS estimator is a linear function of yᵢ, we have:
  - **bₖ|X ~ N(βₖ, var(bₖ))**

## Sampling Distribution of OLS Estimators

- Hence, we can easily get the well-known standard Normal distribution:
  - **Z = (bₖ - βₖ)/√var(bₖ) ~ N(0, 1)**
- However, since we need to estimate σ², we get something slightly different:
  - **t = (bₖ - βₖ)/se(bₖ) ~ t<sub>N-K</sub>**, where **se(bₖ)** is the square root of the estimated variance of bₖ
- The snippet below verifies σ̂ and the t-statistics from the coefficient table.
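As a quick numerical check, this short snippet reproduces the error-variance estimate σ̂², the standard error of the regression σ̂, and the coefficient table's t-statistics, using only the numbers reported in the lecture:

```python
# Reproducing the error-variance estimate and the table's t-statistics
# from the values reported for Big Andy's Burger Barn.
import numpy as np

SSE, N, K = 1718.943, 75, 3
sigma2_hat = SSE / (N - K)                  # unbiased estimator of sigma^2
sigma_hat = np.sqrt(sigma2_hat)
print(sigma2_hat, sigma_hat)                # 23.874..., 4.886...

# Each reported t-statistic is t = b_k / se(b_k), i.e. the test of beta_k = 0
b = np.array([118.9136, -7.9079, 1.8626])   # C, PRICE, ADVERT
se = np.array([6.3516, 1.0960, 0.6832])
print(b / se)                               # 18.7217, -7.2152, 2.7263
```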
## Interval Estimation

- Knowing the distribution of bₖ allows us to do inference; for instance, we can construct confidence intervals.
- **Example** (for 95% confidence and N - K = 72 degrees of freedom, t<sub>c</sub> = 1.993):
  - **P(-t<sub>c</sub> < (b₂ - β₂)/se(b₂) < t<sub>c</sub>) = 0.95**
  - **P(-1.993 < (b₂ - β₂)/se(b₂) < 1.993) = 0.95**
  - **[b₂ - 1.993 se(b₂), b₂ + 1.993 se(b₂)]**
- The 95% interval estimate of β₂ based on our sample is:
  - **(-10.092, -5.724)**
- The 95% interval estimate of β₃ based on our sample is:
  - **(1.863 - 1.993 × 0.683, 1.863 + 1.993 × 0.683) = (0.502, 3.224)**
- The general expression for a 100(1 - α)% confidence interval is:
  - **(bₖ - t<sub>(1-α/2, N-K)</sub> se(bₖ), bₖ + t<sub>(1-α/2, N-K)</sub> se(bₖ))**

## Hypothesis Testing

**Step-by-step procedure for hypothesis testing:**

1. Define the null and alternative hypotheses
2. Specify the test statistic and its distribution under the null
3. Decide α and determine the rejection region
4. Calculate the sample value of the test statistic
5. State your conclusion

## Hypothesis Testing for a Single Parameter

**Testing the statistical significance of a parameter** (a worked sketch follows this list):

1. **Define:**
   - **H₀: βₖ = 0**
   - **H₁: βₖ ≠ 0**
2. **Specify:**
   - **t = (bₖ - βₖ)/se(bₖ) ~ t<sub>N-K</sub>**, which equals **bₖ/se(bₖ)** under H₀
3. **Calculate:**
   - **t<sub>c</sub> = t<sub>(1-α/2, N-K)</sub>** or the p-value of t
4. **Conclude:**
   - Reject H₀ if **|t| > t<sub>c</sub>** or **p-value(t) < α**
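A worked sketch of the interval estimates and the significance test above, using scipy for the t critical value and p-values; the coefficients and standard errors are those reported in the Big Andy table:

```python
# 95% confidence intervals and two-sided significance tests for the
# PRICE and ADVERT coefficients, using the lecture's reported values.
import numpy as np
from scipy import stats

N, K, alpha = 75, 3, 0.05
t_c = stats.t.ppf(1 - alpha / 2, df=N - K)   # critical value, about 1.993
b = np.array([-7.9079, 1.8626])              # PRICE, ADVERT
se = np.array([1.0960, 0.6832])

# 100(1 - alpha)% interval: b_k +/- t_c * se(b_k)
print(np.column_stack([b - t_c * se, b + t_c * se]))
# PRICE: about (-10.09, -5.72); ADVERT: about (0.50, 3.22)

# Test H0: beta_k = 0 vs H1: beta_k != 0; reject if |t| > t_c or p < alpha
t = b / se
p = 2 * stats.t.sf(np.abs(t), df=N - K)
print(t, p)                                  # both |t| > t_c: reject H0 at 5%
```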