Simple Regression Model
Summary
This document explains the simple regression model: terminology, residuals, the derivation and properties of the OLS estimators, goodness of fit, log transformations, and the Gauss-Markov assumptions. It provides a solid introduction to fundamental statistical modeling, with detailed formulas, Stata output, and worked examples using real-world data on CEO salaries and workers' wages that show the model's practical application.
Full Transcript
Simple Regression Model: Terminology

Regression model: $y = \beta_0 + \beta_1 x + u$, where $y$ is the dependent variable, $x$ is the independent variable (one independent variable for a simple regression), $u$ is the error, and $\beta_0$ and $\beta_1$ are parameters.

Estimated equation: $\hat{y} = \hat\beta_0 + \hat\beta_1 x$, where $\hat{y}$ is the predicted value and $\hat\beta_0$ and $\hat\beta_1$ are coefficients.

Population vs. sample:
  Population: parameter $\beta$, error $u$
  Sample: coefficient $\hat\beta$, residual $\hat{u}$

Residual: $\hat{u} = y - \hat{y}$, the actual value minus the predicted value of the dependent variable.

Simple regression model example

Simple regression: hourly wage depends on years of experience, with fitted line $\hat{y} = 20 + 0.5x$.

  Hourly wage $y$ ($) | Experience $x$ (years) | Predicted value $\hat{y} = 20 + 0.5x$ | Residual $\hat{u} = y - \hat{y}$
  20 | 1 | 20 + 0.5(1) = 20.5 | 20 - 20.5 = -0.5
  21 | 2 | 20 + 0.5(2) = 21.0 | 21 - 21.0 = 0
  21 | 1 | 20 + 0.5(1) = 20.5 | 21 - 20.5 = 0.5
  22 | 3 | 20 + 0.5(3) = 21.5 | 22 - 21.5 = 0.5

[Figure: actual and predicted values — hourly wage and predicted wage against experience, showing the regression line, slope, predicted values, actual values, and residuals.]

Simple regression: actual values, predicted values, and residuals

[Figure: scatter plot with fitted line.] The regression line fits as well as possible through the data points.

Interpretation of coefficients

$$\hat\beta_1 = \frac{\Delta y}{\Delta x} = \frac{\text{change in } y}{\text{change in } x}$$

The coefficient $\hat\beta_1$ measures by how much the dependent variable changes when the independent variable changes by one unit. $\hat\beta_1$ is also called the slope in the simple linear regression. (A derivative of a function is another function showing the slope.) The formula above is correct only if $\Delta u / \Delta x = 0$, which means all other factors are held fixed.

Population regression function

$$E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x \quad \text{if } E(u \mid x) = 0$$

This assumption is called the zero conditional mean. For the population, the average value of the dependent variable can be expressed as a linear function of the independent variable.

[Figure: the population regression function shows the relationship between y and x for the population.]

For individuals with a particular $x$, the average value of $y$ is $E(y \mid x) = \beta_0 + \beta_1 x$. (Note that $x_1, x_2, x_3$ in the figure refer to values $x_i$, not to different variables.)

Derivation of the OLS estimates

For the regression model $y = \beta_0 + \beta_1 x + u$, we need to estimate the regression equation $\hat{y} = \hat\beta_0 + \hat\beta_1 x$ and find the coefficients $\hat\beta_0$ and $\hat\beta_1$ by looking at the residuals

$$\hat{u} = y - \hat{y} = y - \hat\beta_0 - \hat\beta_1 x$$

Obtain a random sample of data with $n$ observations $(x_i, y_i)$, where $i = 1, \dots, n$ indexes the observations. The goal is to obtain as good a fit as possible of the estimated regression equation.

Minimize the sum of squared residuals:

$$\min \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$

We obtain the OLS coefficients:

$$\hat\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{cov(x, y)}{var(x)}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$

OLS stands for Ordinary Least Squares, based on minimizing the squared residuals.

OLS properties

$\bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}$ — the sample averages of the dependent and independent variables lie on the regression line.

$\sum_{i=1}^{n} \hat{u}_i = 0$ — the residuals sum to zero (note that we minimized the sum of squared residuals).

$\sum_{i=1}^{n} x_i \hat{u}_i = 0$ — the covariance between the independent variable and the residuals is zero.
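To make these formulas and properties concrete, here is a minimal Stata sketch applying them to the four wage/experience observations from the earlier example. (The line $\hat{y} = 20 + 0.5x$ shown there is illustrative rather than the exact least-squares fit, so OLS returns slightly different numbers, roughly $\hat\beta_1 \approx 0.73$ and $\hat\beta_0 \approx 19.7$.)

* Toy dataset: the four observations from the wage/experience example
clear
input wage exper
20 1
21 2
21 1
22 3
end

* beta1hat = cov(x,y)/var(x)
quietly correlate exper wage, covariance
matrix C = r(C)
display "beta1hat = " C[2,1]/C[1,1]

* beta0hat = ybar - beta1hat*xbar
quietly summarize exper
scalar xbar = r(mean)
quietly summarize wage
scalar ybar = r(mean)
display "beta0hat = " ybar - (C[2,1]/C[1,1])*xbar

* Built-in OLS gives the same coefficients
regress wage exper

* OLS property: the residuals average (and hence sum) to zero
predict uhat, residuals
summarize uhat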
Simple regression example: CEO's salary

A simple regression model explaining how return on equity (roe) affects the CEO's salary.

Regression model: $salary = \beta_0 + \beta_1\, roe + u$

Estimated equation for the predicted value of salary: $\widehat{salary} = \hat\beta_0 + \hat\beta_1\, roe$

Residuals: $\hat{u} = salary - \widehat{salary}$

We estimate the regression model to find the coefficients. $\hat\beta_1$ measures the change in the CEO's salary associated with a one-unit increase in roe, holding other factors fixed.

Estimated equation and interpretation

$$\widehat{salary} = 963.191 + 18.501\, roe$$

Salary is measured in thousands of dollars; ROE (return on equity) is measured in percent. Interpretation of $\hat\beta_1$: the CEO's salary increases by $18,501 for each 1% increase in ROE. Interpretation of $\hat\beta_0$: if ROE is zero, the predicted CEO salary is $963,191.

Stata output for simple regression

. regress salary roe

      Source |       SS           df       MS      Number of obs   =       209
-------------+----------------------------------   F(1, 207)       =      2.77
       Model |  5166419.04         1  5166419.04   Prob > F        =    0.0978
    Residual |   386566563       207  1867471.32   R-squared       =    0.0132
-------------+----------------------------------   Adj R-squared   =    0.0084
       Total |   391732982       208  1883331.64   Root MSE        =    1366.6

      salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         roe |   18.50119   11.12325     1.66   0.098    -3.428196    40.43057
       _cons |   963.1913   213.2403     4.52   0.000     542.7902    1383.592

$$\widehat{salary} = \hat\beta_0 + \hat\beta_1\, roe = 963.191 + 18.501\, roe$$

Simple regression results in a table

                        (1)
  VARIABLES           salary
  roe                 18.50*
                     (11.12)
  Constant           963.2***
                     (213.2)
  Observations           209
  R-squared            0.013

[Figure: regression line for the sample vs. population regression function for the population.]

[Figure: estimated regression — 1990 salary (thousands $) and fitted values against return on equity (88-90 avg).]

[Figure: actual values, predicted values, and residuals against return on equity (88-90 avg); legend: true value, predicted value, residual.]

Actual values, predicted values, and residuals

   roe   salary   predicted $\widehat{salary} = 963.191 + 18.501\,roe$   residual $\hat{u} = salary - \widehat{salary}$
  14.1     1095     1224     -129
  10.9     1001     1165     -164
  23.5     1122     1398     -276
   5.9      578     1072     -494
  13.8     1368     1219      149
  20.0     1145     1333     -188
  16.4     1078     1267     -189
  16.3     1094     1265     -171
  10.5     1237     1157       80
  26.3      833     1450     -617

The mean salary is 1,281 ($1,281,000). The mean predicted salary is also 1,281. The mean of the residuals is zero.
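A short Stata sketch reproduces this table of predicted values and residuals, assuming Wooldridge's ceosal1 dataset (loadable, for example, with the user-written bcuse command from SSC):

* Assumes the ceosal1 data, e.g.: ssc install bcuse, then bcuse ceosal1
regress salary roe
predict salaryhat                 // fitted values: 963.191 + 18.501*roe
predict uhat, residuals           // salary - salaryhat
list roe salary salaryhat uhat in 1/10
summarize salary salaryhat uhat   // means: 1281, 1281, and 0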
Simple regression example: wage

A simple regression model explaining how education affects wages for workers.

Regression model: $wage = \beta_0 + \beta_1\, educ + u$

Estimated equation for the predicted value of wage: $\widehat{wage} = \hat\beta_0 + \hat\beta_1\, educ$

Residuals: $\hat{u} = wage - \widehat{wage}$

We estimate the regression model to find the coefficients. $\hat\beta_1$ measures the change in wage associated with one more year of education, holding other factors fixed.

Estimated equation and interpretation

$$\widehat{wage} = -0.90 + 0.54\, educ$$

Wage is measured in $/hour; education is measured in years. Interpretation of $\hat\beta_1$: the hourly wage increases by $0.54 for each additional year of education. Interpretation of $\hat\beta_0$: if education is zero, the predicted wage is -$0.90 (but no one in the sample has zero education).

Stata output for simple regression

. reg wage educ

      Source |       SS           df       MS      Number of obs   =       526
-------------+----------------------------------   F(1, 524)       =    103.36
       Model |  1179.73205         1  1179.73205   Prob > F        =    0.0000
    Residual |  5980.68226       524  11.4135158   R-squared       =    0.1648
-------------+----------------------------------   Adj R-squared   =    0.1632
       Total |  7160.41431       525  13.6388844   Root MSE        =    3.3784

        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5413593    .053248    10.17   0.000     .4367534    .6459651
       _cons |  -.9048517   .6849678    -1.32   0.187    -2.250472    .4407687

Variations

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$$

$$SST = SSE + SSR$$

SST is the total sum of squares and measures the total variation in the dependent variable. SSE is the explained sum of squares and measures the variation explained by the regression. SSR is the residual sum of squares and measures the variation not explained by the regression. Note: some texts instead call SSE the error sum of squares and SSR the regression sum of squares, where R and E are confusingly reversed.

[Figure: decomposition of the variation in y into explained and residual parts.]

Goodness of fit measure: R-squared

$$R^2 = SSE/SST = 1 - SSR/SST$$

R-squared is the explained sum of squares divided by the total sum of squares. R-squared is a goodness-of-fit measure: it gives the proportion of the total variation that is explained by the regression. An R-squared of 0.7 is interpreted as 70% of the variation being explained by the regression, with the rest due to error. An R-squared greater than 0.25 is commonly considered a good fit.

R-squared calculated

From the `reg wage educ` output above:

R-squared = SS(Model) / SS(Total) = 1179.73 / 7160.41 = 0.1648

16% of the variation in wage is explained by the regression and the rest is due to error. This is not a very good fit.

Log transformation (logged variables)

- Sometimes variables (y or x) are expressed as logs, log(y) or log(x).
- With logs, the interpretation is in percentages/elasticities.
- Variables such as age and education that are measured in units such as years should not be logged.
- Variables measured in percentage points (e.g., interest rates) should not be logged.
- Logs cannot be used if variables have zero or negative values.
- Taking logs often reduces problems with large values or outliers.
- Taking logs helps with homoskedasticity and normality.

Log-log form

Linear regression model: $y = \beta_0 + \beta_1 x + u$
Log-log form: $\log(y) = \beta_0 + \beta_1 \log(x) + u$

Instead of the dependent variable, use the log of the dependent variable; instead of the independent variable, use the log of the independent variable.

$$\hat\beta_1 = \frac{\Delta \log(y)}{\Delta \log(x)} = \frac{\Delta y}{y} \cdot \frac{x}{\Delta x} = \frac{\text{percent change in } y}{\text{percent change in } x}$$

The dependent variable changes by $\hat\beta_1$ percent when the independent variable changes by one percent.

Log-linear form (also called semi-log)

Linear regression model: $y = \beta_0 + \beta_1 x + u$
Log-linear form: $\log(y) = \beta_0 + \beta_1 x + u$

Instead of the dependent variable, use the log of the dependent variable.

$$\hat\beta_1 = \frac{\Delta \log(y)}{\Delta x} = \frac{\Delta y}{y} \cdot \frac{1}{\Delta x} = \frac{\text{percent change in } y}{\text{change in } x}$$

The dependent variable changes by $\hat\beta_1 \times 100$ percent when the independent variable changes by one unit.
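As a sketch of the log-linear form in practice, assuming the same wage data as the output above (the coefficient matches the log-linear column of the comparison table further below):

* Log-linear (semi-log) form on the wage data
capture drop lwage
gen lwage = ln(wage)   // natural log of hourly wage; requires wage > 0
regress lwage educ     // slope 0.0827: wage rises about 8.2% per year of education
display _b[educ]*100   // the percent effect of one more year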
Linear-log form (continued)

$$\hat\beta_1 = \frac{\Delta y}{\Delta \log(x)} = \Delta y \cdot \frac{x}{\Delta x} = \frac{\text{change in } y}{\text{percent change in } x}$$

The dependent variable changes by $\hat\beta_1 / 100$ units when the independent variable changes by one percent.

Example of data with logs

   wage   lwage   educ
   3.10    1.13     11
   3.24    1.18     12
   3.00    1.10     11
   6.00    1.79      8
   5.30    1.67     12
   8.75    2.17     16
  11.25    2.42     18
   5.00    1.61     12
   3.60    1.28     12
  18.18    2.90     17

Linear vs. log-linear form

[Figure: two scatter plots with fitted lines — linear form (wage on education) and log-linear form (log wage on education).]

                        (1)          (2)
  VARIABLES            wage        lwage
  educ               0.541***    0.0827***
                    (0.0532)    (0.00757)
  Constant            -0.905     0.584***
                     (0.685)     (0.0973)
  Observations           526          526
  R-squared            0.165        0.186

Linear form: wage increases by $0.54 for each additional year of education. Log-linear form: wage increases by 8.2% for each additional year of education.

Example of data with logs

  Salary (thousand $)   lsalary   Sales (million $)   lsales
        1095               7.0         27595            10.2
        1001               6.9          9958             9.2
        1122               7.0          6126             8.7
         578               6.4         16246             9.7
        1368               7.2         21783            10.0
        1145               7.0          6021             8.7
        1078               7.0          2267             7.7
        1094               7.0          2967             8.0
        1237               7.1          4570             8.4
         833               6.7          2830             7.9

Note that one unit is a thousand dollars for salary and a million dollars for sales.

Linear vs. log-log form

[Figure: two scatter plots with fitted lines — linear form (1990 salary, thousands $, on firm sales, millions $) and log-log form (natural log of salary on natural log of sales).]

Log-linear vs. linear-log form

[Figure: two scatter plots with fitted lines — log-linear form (log salary on sales) and linear-log form (salary on log sales).]

Interpretation of coefficients

                     Linear      Log-log    Log-linear   Linear-log
  VARIABLES          salary      lsalary      lsalary       salary
  sales             0.0155*                 1.50e-05***
                  (0.00891)                 (3.55e-06)
  lsales                        0.257***                   262.9***
                                (0.0345)                    (92.36)
  Constant         1,174***     4.822***     6.847***       -898.9
                    (112.8)      (0.288)     (0.0450)       (771.5)

Linear form: salary increases by 0.0155 thousand dollars (about $15.50) for each additional one million dollars in sales. Log-log form: salary increases by about 0.26% for every 1% increase in sales. Log-linear form: salary increases by 0.0015% (= 0.000015 × 100) for each additional one million dollar increase in sales. Linear-log form: salary increases by 2.629 (= 262.9/100) thousand dollars for each additional 1% increase in sales.
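A sketch estimating all four functional forms on the CEO data (again assuming the ceosal1 dataset; the log variables are generated in case they are not already present):

* Four functional forms on the salary/sales data
capture drop lsalary
capture drop lsales
gen lsalary = ln(salary)   // log of salary (salary in thousand $)
gen lsales  = ln(sales)    // log of sales (sales in million $)
regress salary  sales      // linear:     units of y per unit of x
regress lsalary lsales     // log-log:    percent per percent (elasticity)
regress lsalary sales      // log-linear: coef*100 percent per unit of x
regress salary  lsales     // linear-log: coef/100 units of y per percent of x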
Gauss-Markov assumptions

The Gauss-Markov assumptions are the standard assumptions for the linear regression model:
1. Linearity in parameters
2. Random sampling
3. No perfect collinearity (there is sample variation in the independent variable)
4. Exogeneity, or zero conditional mean: the regressors are not correlated with the error term
5. Homoscedasticity: the variance of the error term is constant

Assumption 1: linearity in parameters

$$y = \beta_0 + \beta_1 x + u$$

The relationship between y and x is linear in the population. Note that the regression model can have logged variables (e.g., log sales), squared variables (e.g., education²), or interactions of variables (e.g., education × experience), but the $\beta$ parameters enter linearly.

Assumption 2: random sampling

$(x_i, y_i)$, where $i = 1, \dots, n$

The data are a random sample drawn from the population, and each observation follows the population equation $y = \beta_0 + \beta_1 x + u$. Example: data on workers (y = wage, x = education). The population is all workers in the U.S. (150 million); the sample is the workers selected for the study (1,000). Drawing randomly from the population means each worker has an equal probability of being selected. If, for example, young workers are oversampled, the sample will not be random/representative.

Assumption 3: no perfect collinearity

$$SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$$

In the simple regression model with one independent variable, there needs to be sample variation in the independent variable (the variance of x must be positive). If there is no variation, the independent variable is a constant and a separate coefficient cannot be estimated, because it is perfectly collinear with the constant in the model. Note that $SST_x$ is the denominator of $\hat\beta_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$.

Assumption 4: zero conditional mean (exogeneity)

$$E(u_i \mid x_i) = 0$$

The expected value of the error term u given the independent variable x is zero. The expected value of the error must not differ based on the values of the independent variable: the errors must average out to zero for each x.

Example of zero conditional mean

Regression model: $wage = \beta_0 + \beta_1\, educ + u$

In the example of wage and education, when ability (which is unobserved and therefore part of the error) is higher, education will also tend to be higher. This is a violation of the zero conditional mean assumption.

Example of exogeneity vs. endogeneity

[Figure: two residual plots against education. Exogeneity (zero conditional mean): $E(u \mid x) = 0$, the error term looks the same at every level of education. Endogeneity (conditional mean not zero): $E(u \mid x) > 0$ at high education, the ability/error component is higher when education is higher.]

Unbiasedness of the OLS estimators

Gauss-Markov assumptions 1-4 (linearity, random sampling, no perfect collinearity, and zero conditional mean) imply that the OLS estimators are unbiased:

$$E(\hat\beta_0) = \beta_0 \quad \text{and} \quad E(\hat\beta_1) = \beta_1$$

The expected values of the sample coefficients $\hat\beta$ are the population parameters $\beta$. If we estimated the regression model on many random samples, the average of the coefficients would equal the population parameter. For any given sample, however, the coefficients may be very different from the population parameters.

Assumption 5: homoscedasticity

Homoscedasticity: $var(u_i \mid x_i) = \sigma^2$

The variance of the error term u must not differ with the independent variable x. Heteroscedasticity, $var(u_i \mid x_i) \neq \sigma^2$, is when the variance of the error term u is not constant across x.

Homoscedasticity vs. heteroscedasticity

[Figures: residual plots against education illustrating homoscedasticity ($var(u \mid x) = \sigma^2$, residuals equally spread at every x) and heteroscedasticity ($var(u \mid x) \neq \sigma^2$, residual spread changing with x).]

Unbiasedness of the error variance

We can estimate the variance of the error term as

$$\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2$$

The degrees of freedom (n − k − 1) are corrected for the number of independent variables, here k = 1. Gauss-Markov assumptions 1-5 (linearity, random sampling, no perfect collinearity, zero conditional mean, and homoscedasticity) imply that the error variance estimator is unbiased:

$$E(\hat\sigma^2) = \sigma^2$$
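A quick Stata check of this estimator on the wage regression from earlier (Root MSE in the regression output is $\hat\sigma$, so its square should equal $\hat\sigma^2$):

regress wage educ
capture drop uhat
capture drop uhat2
predict uhat, residuals
gen uhat2 = uhat^2
quietly summarize uhat2
display "sigma2hat = " r(sum)/(e(N)-2)   // SSR/(n-2) = 5980.68/524 = 11.4135
display "rmse^2    = " e(rmse)^2         // 3.3784^2, the same up to rounding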
Variances of the OLS estimators

The estimated regression coefficients are random, because the sample is random: the coefficients will vary if a different sample is chosen. What is the sampling variability of these OLS coefficients? How far are the coefficients from the population parameters?

$$var(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{SST_x}$$

$$var(\hat\beta_0) = \frac{\sigma^2\, n^{-1} \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2\, n^{-1} \sum_{i=1}^{n} x_i^2}{SST_x}$$

The variances are higher if the variance of the error term is higher and if the variation in the independent variable is lower. Estimators with lower variance are desirable; this means a low variance in the error term but a high variance in the independent variable is desirable.

Standard errors of the regression coefficients

$$se(\hat\beta_1) = \sqrt{\widehat{var}(\hat\beta_1)} = \sqrt{\frac{\hat\sigma^2}{SST_x}}$$

$$se(\hat\beta_0) = \sqrt{\widehat{var}(\hat\beta_0)} = \sqrt{\frac{\hat\sigma^2\, n^{-1} \sum_{i=1}^{n} x_i^2}{SST_x}}$$

The standard errors are the square roots of the variances, with the unknown population variance of the error term $\sigma^2$ replaced by the sample variance of the residuals $\hat\sigma^2$. The standard errors measure how precisely the regression coefficients are estimated.
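A closing sketch that reproduces the reported standard error of the education coefficient from these formulas, assuming the same wage data:

regress wage educ
scalar sig2hat = e(rss)/(e(N)-2)   // sigma^2 hat = SSR/(n-2)
quietly summarize educ
scalar SSTx = (r(N)-1)*r(Var)      // SST_x = sum of (x_i - xbar)^2
display "manual se(b1) = " sqrt(sig2hat/SSTx)
display "Stata se(b1)  = " _se[educ]   // both match the .053248 in the output above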