Week 7 - Multiple Regression Model


Summary

These lecture notes cover the multiple regression model: its assumptions, estimation by ordinary least squares, the variances and standard errors of the estimators, and hypothesis testing.

Full Transcript


Week 7: Multiple regression model
BT22203: Econometrics

A regression model with more than one explanatory variable is a multiple regression model. Most regression models are multiple, since very few economic phenomena can be explained by a single explanatory variable.

Points of discussion in this chapter:
- Introduction
- How to estimate the multiple regression model
- Hypothesis testing

The three-variable linear regression model

Three-variable PRF in nonstochastic form:

$E(Y_t) = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t}$  (4.1)

Three-variable PRF in stochastic form:

$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t$  (4.2)
$Y_t = E(Y_t) + u_t$  (4.3)

where
- $Y$ = dependent variable
- $X_2$ and $X_3$ = explanatory variables
- $u$ = stochastic disturbance term
- $t$ = $t$th observation
- $\beta_2$ and $\beta_3$ = partial regression coefficients

Equation (4.1) gives the conditional mean value of $Y$, conditional upon the given or fixed values of $X_2$ and $X_3$. Equation (4.2) divides $Y_t$ into a systematic or deterministic component, $\beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t}$, and a nonsystematic or random component, $u_t$.

The meaning of partial regression coefficient

$\beta_2$ and $\beta_3$ are partial regression (or partial slope) coefficients. $\beta_2$ measures the change in the mean value of $Y$, i.e. $E(Y)$, per unit change in $X_2$, holding the value of $X_3$ constant; likewise for $\beta_3$. In general, a partial regression coefficient reflects the (partial) effect of one explanatory variable on the mean value of the dependent variable when the values of the other explanatory variables included in the model are held constant.

Assumptions of the multiple regression model

- Assumption 1: the regression model is linear in the parameters.
- Assumption 2: $X_2$ and $X_3$ are uncorrelated with the disturbance term $u$.
- Assumption 3: the error term $u$ has a zero mean value: $E(u_i) = 0$  (4.7)
- Assumption 4: the variance of $u$ is constant, or homoscedastic: $\mathrm{var}(u_i) = E(u_i^2) = \sigma^2$  (4.8)
- Assumption 5: no autocorrelation between the error terms: $\mathrm{cov}(u_i, u_j) = 0$ for $i \neq j$  (4.9)
- Assumption 6: no exact collinearity between $X_2$ and $X_3$.
- Assumption 7: the error term $u_i$ follows the normal distribution with zero mean and constant variance $\sigma^2$: $u_i \sim N(0, \sigma^2)$  (4.10)

Estimation of the parameters of multiple regression through Ordinary Least Squares

The SRF corresponding to the PRF in Eq. (4.2) is

$Y_t = b_1 + b_2 X_{2t} + b_3 X_{3t} + e_t$  (4.13)

where $b_1$, $b_2$, and $b_3$ are the estimators of $\beta_1$, $\beta_2$, and $\beta_3$, respectively. The sample counterpart of Eq. (4.1) is

$\hat{Y}_t = b_1 + b_2 X_{2t} + b_3 X_{3t}$  (4.14)

The OLS principle chooses the unknown parameters such that the RSS, $\sum e_t^2$, is as small as possible, where

$e_t = Y_t - b_1 - b_2 X_{2t} - b_3 X_{3t}$  (4.15)

The RSS, the sum of squared differences between the actual $Y_t$ and the estimated $\hat{Y}_t$, is obtained by squaring both sides:

$\sum e_t^2 = \sum (Y_t - b_1 - b_2 X_{2t} - b_3 X_{3t})^2$  (4.16)

The three OLS estimators are:

$b_1 = \bar{Y} - b_2 \bar{X}_2 - b_3 \bar{X}_3$  (4.20)

$b_2 = \dfrac{\sum y_t x_{2t} \sum x_{3t}^2 - \sum y_t x_{3t} \sum x_{2t} x_{3t}}{\sum x_{2t}^2 \sum x_{3t}^2 - \left(\sum x_{2t} x_{3t}\right)^2}$  (4.21)

$b_3 = \dfrac{\sum y_t x_{3t} \sum x_{2t}^2 - \sum y_t x_{2t} \sum x_{2t} x_{3t}}{\sum x_{2t}^2 \sum x_{3t}^2 - \left(\sum x_{2t} x_{3t}\right)^2}$  (4.22)

Lowercase letters denote deviations from sample mean values (e.g., $y_t = Y_t - \bar{Y}$).
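As an illustration of Eqs. (4.20)-(4.22), here is a minimal Python/NumPy sketch on simulated data; the dataset, seed, and "true" coefficient values are assumptions made for the example, not from the lecture. It computes $b_2$, $b_3$, and $b_1$ from the deviation-form formulas and cross-checks them against NumPy's least-squares solver.

```python
# Sketch of the three OLS estimators, Eqs. (4.20)-(4.22).
import numpy as np

rng = np.random.default_rng(0)
n = 30
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 5, n)
u = rng.normal(0, 1, n)                   # u ~ N(0, sigma^2), Assumption 7
Y = 2.0 + 0.8 * X2 - 1.5 * X3 + u         # assumed "true" beta1, beta2, beta3

# Deviations from sample means (the lowercase letters in the notes)
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

D = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2               # common denominator
b2 = ((y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)) / D   # Eq. (4.21)
b3 = ((y @ x3) * (x2 @ x2) - (y @ x2) * (x2 @ x3)) / D   # Eq. (4.22)
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()          # Eq. (4.20)
print(b1, b2, b3)

# Cross-check against a standard least-squares solver
X = np.column_stack([np.ones(n), X2, X3])
print(np.linalg.lstsq(X, Y, rcond=None)[0])
```

Both print statements should agree, and the estimates should be close to the assumed values 2.0, 0.8, and -1.5.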
Variance and standard errors of OLS estimators

$\mathrm{var}(b_1) = \left[\dfrac{1}{n} + \dfrac{\bar{X}_2^2 \sum x_{3t}^2 + \bar{X}_3^2 \sum x_{2t}^2 - 2\bar{X}_2 \bar{X}_3 \sum x_{2t} x_{3t}}{\sum x_{2t}^2 \sum x_{3t}^2 - \left(\sum x_{2t} x_{3t}\right)^2}\right] \cdot \sigma^2$  (4.23)

$\mathrm{se}(b_1) = \sqrt{\mathrm{var}(b_1)}$  (4.24)

$\mathrm{var}(b_2) = \dfrac{\sum x_{3t}^2}{\left(\sum x_{2t}^2\right)\left(\sum x_{3t}^2\right) - \left(\sum x_{2t} x_{3t}\right)^2} \cdot \sigma^2$  (4.25)

$\mathrm{se}(b_2) = \sqrt{\mathrm{var}(b_2)}$  (4.26)

$\mathrm{var}(b_3) = \dfrac{\sum x_{2t}^2}{\left(\sum x_{2t}^2\right)\left(\sum x_{3t}^2\right) - \left(\sum x_{2t} x_{3t}\right)^2} \cdot \sigma^2$  (4.27)

$\mathrm{se}(b_3) = \sqrt{\mathrm{var}(b_3)}$  (4.28)

The OLS estimator of the unknown variance is:

$\hat{\sigma}^2 = \dfrac{\sum e_t^2}{n - 3}$  (4.29)

Thus, the standard error of the estimate is:

$\hat{\sigma} = \sqrt{\hat{\sigma}^2}$  (4.30)

A shortcut for computing the RSS:

$\sum e_t^2 = \sum (Y_t - \hat{Y}_t)^2 = \sum y_t^2 - b_2 \sum y_t x_{2t} - b_3 \sum y_t x_{3t}$  (4.31)

Goodness of fit of the estimated multiple regression: multiple coefficient of determination, $R^2$

The multiple coefficient of determination, $R^2$, shows the proportion of the total variation in $Y$, $\sum y_t^2$, explained by $X_2$ and $X_3$ jointly. We have the identity

$TSS = ESS + RSS$  (4.32)

and $R^2$ is defined as

$R^2 = \dfrac{ESS}{TSS}$  (4.33)

where

$ESS = b_2 \sum y_t x_{2t} + b_3 \sum y_t x_{3t}$  (4.34)
$RSS = \sum y_t^2 - b_2 \sum y_t x_{2t} - b_3 \sum y_t x_{3t}$  (4.35)

Therefore:

$R^2 = \dfrac{b_2 \sum y_t x_{2t} + b_3 \sum y_t x_{3t}}{\sum y_t^2}$  (4.36)

The coefficient of multiple correlation, $R = \sqrt{R^2}$ (taken with a positive sign in the multiple regression context), is interpreted as the degree of linear association between $Y$ and all the $X$ variables jointly.

Hypothesis testing in a multiple regression: general comments

We continue to assume that $u$ is normally distributed with zero mean and constant variance $\sigma^2$ (Assumption 7). But just as in simple regression, when we replace the true unobservable $\sigma^2$ by its unbiased estimator $\hat{\sigma}^2$ given in Eq. (4.29), the OLS estimators follow the t-distribution with $(n - 3)$ d.f., not the normal distribution; that is:

$t = \dfrac{b_1 - \beta_1}{\mathrm{se}(b_1)} \sim t_{n-3}$  (4.38)
$t = \dfrac{b_2 - \beta_2}{\mathrm{se}(b_2)} \sim t_{n-3}$  (4.39)
$t = \dfrac{b_3 - \beta_3}{\mathrm{se}(b_3)} \sim t_{n-3}$  (4.40)

The actual mechanics in many ways resemble the two-variable case.

Testing hypotheses about individual partial regression coefficients: the test of significance approach

- Step 1: define the hypothesis statement.
- Step 2: choose the level of significance $\alpha$.
- Step 3: compute the t-statistic.
- Step 4: determine the t-critical value or the p-value.
- Step 5: reject the null hypothesis $H_0$ if the absolute t-statistic $|t|$ is larger than the t-critical value, or if the p-value is smaller than the significance level $\alpha$.

Testing the joint hypothesis that $\beta_2 = \beta_3 = 0$ or $R^2 = 0$

Testing the joint significance of the explanatory variables:

- Step 1: consider the null hypothesis
  $H_0: \beta_2 = \beta_3 = 0$  (4.46)
  This is a joint hypothesis that $\beta_2$ and $\beta_3$ are jointly or simultaneously equal to zero. It is the same as saying
  $H_0: R^2 = 0$  (4.47)
  that is, that the two explanatory variables explain zero percent of the variation in the dependent variable.
- Step 2: choose the level of significance $\alpha$.
- Step 3: using the analysis of variance (ANOVA) technique, compute the F-statistic (see the sketches after these steps)
  $F = \dfrac{R^2 / (k - 1)}{(1 - R^2) / (n - k)}$  (4.50)
  where $k$ is the number of estimated parameters (including the intercept) and $n$ is the number of observations.
- Step 4: obtain the F-critical value from the F distribution table with two degrees of freedom: d.f.(1) = $k - 1$ and d.f.(2) = $n - k$.
- Step 5: reject $H_0$ if the F-statistic is larger than the F-critical value at the given significance level $\alpha$.
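A sketch of the standard-error formulas, Eqs. (4.23)-(4.29) above, and of the t-statistics (4.38)-(4.40); the simulated dataset is again an assumption, repeated here so the block runs on its own.

```python
# Standard errors and t-statistics for the three OLS estimators.
import numpy as np

rng = np.random.default_rng(0)
n = 30
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 5, n)
Y = 2.0 + 0.8 * X2 - 1.5 * X3 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), X2, X3])
b1, b2, b3 = np.linalg.lstsq(X, Y, rcond=None)[0]

e = Y - (b1 + b2 * X2 + b3 * X3)          # residuals, Eq. (4.15)
sigma2_hat = (e @ e) / (n - 3)            # Eq. (4.29)

x2, x3 = X2 - X2.mean(), X3 - X3.mean()
D = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2

var_b1 = (1 / n + (X2.mean() ** 2 * (x3 @ x3) + X3.mean() ** 2 * (x2 @ x2)
                   - 2 * X2.mean() * X3.mean() * (x2 @ x3)) / D) * sigma2_hat  # (4.23)
var_b2 = (x3 @ x3) / D * sigma2_hat       # Eq. (4.25)
var_b3 = (x2 @ x2) / D * sigma2_hat       # Eq. (4.27)
se = np.sqrt([var_b1, var_b2, var_b3])    # Eqs. (4.24), (4.26), (4.28)

# t-statistics for H0: beta_j = 0, each distributed t with n-3 d.f.
t_stats = np.array([b1, b2, b3]) / se
print(se, t_stats)
```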
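Similarly, a sketch of the goodness-of-fit measure and the joint F-test, Eqs. (4.36) and (4.50); the 5% F-critical value is taken from scipy.stats rather than a table, and the dataset is the same illustrative simulation as above.

```python
# R-squared and the joint F-test of H0: beta2 = beta3 = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 30
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 5, n)
Y = 2.0 + 0.8 * X2 - 1.5 * X3 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), X2, X3])
b1, b2, b3 = np.linalg.lstsq(X, Y, rcond=None)[0]

y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()
R2 = (b2 * (y @ x2) + b3 * (y @ x3)) / (y @ y)   # Eq. (4.36)

k = 3                                            # parameters incl. intercept
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))        # Eq. (4.50)
F_crit = stats.f.ppf(0.95, k - 1, n - k)         # 5% critical value, d.f. (k-1, n-k)
print(R2, F, F_crit, F > F_crit)                 # reject H0 if F > F_crit
```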
Comparing two $R^2$ values: the adjusted $R^2$

The larger the number of explanatory variables in a model, the higher the $R^2$ will be. Therefore, comparing the $R^2$ values of two models with the same dependent variable but differing numbers of explanatory variables is essentially comparing apples and oranges. The adjusted $\bar{R}^2$ is a measure of goodness of fit that is adjusted for the number of explanatory variables in the model:

$\bar{R}^2 = 1 - (1 - R^2)\dfrac{n - 1}{n - k}$  (4.54)

Features of the $\bar{R}^2$:
- If $k > 1$, then $\bar{R}^2 \leq R^2$; that is, as the number of explanatory variables increases, $\bar{R}^2$ falls increasingly below $R^2$.
- $\bar{R}^2$ can occasionally turn out to be negative.
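A minimal sketch of Eq. (4.54) with made-up numbers, showing how an extra regressor can raise $R^2$ while barely moving $\bar{R}^2$, and how $\bar{R}^2$ can go negative when the fit is poor:

```python
# Adjusted R-squared, Eq. (4.54); the R2/n/k values are illustrative.
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """R-bar-squared: R2 penalized for the number of parameters k."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

print(adjusted_r2(0.90, n=30, k=3))    # ~0.893
print(adjusted_r2(0.905, n=30, k=4))   # ~0.894: one more regressor, tiny gain
print(adjusted_r2(0.02, n=30, k=4))    # negative: penalty outweighs the fit
```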

BT22203 Econometrics, Saizal Pinjaman