Multivariate Regression PDF
Document Details
HKBU School of Economics
Chaoqun Zhan
Summary
These lecture notes cover multivariate regression, a statistical method used to analyze the relationship between a dependent variable and multiple independent variables. The notes include discussions of inference in multivariate regression and of functional form in regression. They are presented in slide format and focus on the concepts and calculations involved.
Full Transcript
Causal Inference
Lecturer: Chaoqun Zhan
Lecture 5 - Multivariate Regression
HKBU ECON

This Lecture: Recap / Multivariate Regression / Inference in Multivariate Regression / Functional Form in Regression / Review

Recap
- Bivariate regression fits a line between an X and a Y variable to best represent the relationship between the two:
  Y_i = α + β X_i + e_i
- The regression line is determined by two parameters: the intercept α and the slope β.
- The solution to the linear regression can be found by Ordinary Least Squares (OLS) estimation:
  (α, β) = argmin_{α,β} E(e_i²)
- The solution is given by
  β = Cov(Y_i, X_i) / Var(X_i),   α = Ȳ - β X̄

Recap
- β can be interpreted as the marginal change in Y due to X: "a one-unit change in X leads to a β-unit change in Y."
- Two conditions for OLS to hold:
  - Cov(e_i, X_i) = 0
  - E(e_i) = 0

Recap: ANOVA Theorem and R²
- When Cov(e_i, X_i) = 0, we have
  Var(Y_i) = Var(Ŷ_i) + Var(ê_i) = β² Var(X_i) + Var(ê_i)
- Total sum of squares (TSS) = Var(Y_i)
- Explained sum of squares (ESS) = Var(Ŷ_i) = β² Var(X_i)
- Sum of squared residuals (SSR) = Var(ê_i)
- TSS = ESS + SSR
- R² = ESS/TSS = 1 - SSR/TSS

Recap: Reparameterizing the Data
- When we add a constant to X (the regressor), β will not change but α will change: if X*_i = X_i + n, then α* = α - βn.
- When we multiply X by a constant, β will change but α will not: if X*_i = nX_i, then β* = β/n.
- When we multiply Y (the dependent variable) by a constant, both β and α change: if Y*_i = nY_i, then α* = nα and β* = nβ.
- What if we add a constant to Y, i.e. Y*_i = Y_i + n?

Recap: Heteroskedasticity versus Homoskedasticity
- Standard error of β̂: sqrt( (1/n) · Var(ê_i) / Var(X_i) )
- Robust standard error of β̂: sqrt( (1/n) · Var[(X_i - E[X_i]) e_i] / [Var(X_i)]² )
[Figure from Pischke (LSE) slides: scatter plots of a homoskedastic data set (y1) and a heteroskedastic data set (y2) against x]
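To make the recap concrete, here is a minimal numerical sketch (simulated data, numpy assumed available; the numbers are not from the lecture). It checks β = Cov(Y, X)/Var(X) and α = Ȳ - βX̄, the ANOVA decomposition TSS = ESS + SSR, and the conventional and robust standard-error formulas quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(10, 2, n)
e = rng.normal(0, 1, n) * (0.5 + 0.1 * x)   # heteroskedastic errors
y = 2.0 + 0.7 * x + e

# OLS via the recap formulas: beta = Cov(Y, X)/Var(X), alpha = Ybar - beta*Xbar
beta = np.cov(y, x, bias=True)[0, 1] / np.var(x)
alpha = y.mean() - beta * x.mean()
y_hat = alpha + beta * x
resid = y - y_hat

# ANOVA decomposition and R^2
tss, ess, ssr = np.var(y), np.var(y_hat), np.var(resid)
print(f"alpha={alpha:.3f} beta={beta:.3f}  TSS={tss:.3f}  ESS+SSR={ess + ssr:.3f}  R2={ess / tss:.3f}")

# Conventional vs robust standard errors of beta (formulas from the recap slide)
se_conv = np.sqrt(np.var(resid) / np.var(x) / n)
se_rob = np.sqrt(np.var((x - x.mean()) * resid) / np.var(x) ** 2 / n)
print(f"conventional SE={se_conv:.4f}  robust SE={se_rob:.4f}")
```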
This Lecture: Recap / Multivariate Regression / Inference in Multivariate Regression / Functional Form in Regression / Review

Multivariate Regression
- A key assumption in bivariate regression: E(X_i e_i) = 0, or Cov(X_i, e_i) = 0. All other factors affecting y are uncorrelated with x!
- This is often unrealistic.
- Multivariate regression is more amenable because it allows us to explicitly control for many other factors that simultaneously affect the dependent variable.
- Naturally, if we add more factors to our model that are useful for explaining y, then more of the variation in y can be explained.
- Thus, multiple regression analysis can be used to build better models for predicting the dependent variable.

Multivariate Regression
- Before we go into the econometric details, let's first look at an example to illustrate the intuition: a tale of two colleges.

A Tale of Two Colleges
- Average private university fees in 2012-13: $29,056
- Average public home-state fees: $8,655
- Attending a private university is costly; what is the payoff?
- Counterfactual comparison: for a given graduate of Harvard, what would the person have earned had he gone to the University of Massachusetts?
- Observed graduates of Harvard have higher high school grades, SAT scores, and possibly other talents than those of U-Mass.

A Tale of Two Colleges
- Find individuals who could have gone either way: private university or state school.
- First step: hold constant the typical differences between those attending private and public universities.
- Consider two students, Uma and Harvey. Both have SAT scores of 1,400.
  - Uma attends U Mass
  - Harvey attends Harvard

Have We Held All Relevant Factors Constant?
- Is SAT score all that matters?
  - Uma is a woman
  - Harvey is a man
- Replace Harvey with a female Harvard grad, Hannah.
- This controls for the other factors, or holds them constant, in the regression.

Have We Held All Relevant Factors Constant? (Cont'd)
- Identifying everything that matters for income is hard, let alone measuring each of those factors.
- Stacy Berg Dale and Alan Krueger, "Estimating the Payoff to Attending a More Selective College: An Application of Selection on Observables and Unobservables," Quarterly Journal of Economics, vol. 117, no. 4, November 2002, pages 1491-1527.
- Summary measure for factors affecting both college choice and later earnings: the colleges a student applied to and was admitted to.
- Suppose Uma and Harvey both applied to U Mass and Harvard and were admitted, but they made different choices.

College and Beyond Data Set
- Applicants to about 30 fairly selective colleges
- Survey information from the time they took the SAT, prior to enrollment in 1976
- Another survey in 1996, when most of them are working
- Colleges include Univ. of Pennsylvania, Princeton, Yale, Swarthmore, Williams, Oberlin, and Univ. of Michigan, North Carolina, Penn State, and Miami University of Ohio
- Students list up to four colleges they applied to and the colleges' acceptance decisions

The Matching Matrix
(In the original slide the columns are six colleges - private: Ivy, Leafy, Smart; public: All State, Ball State, Altered State - and the college each student enrolls in is highlighted.)
- Group A, student 1: Reject, Admit, Admit; enrolls private; 1996 earnings 110,000
- Group A, student 2: Reject, Admit, Admit; enrolls private; 1996 earnings 100,000
- Group A, student 3: Reject, Admit, Admit; enrolls public; 1996 earnings 110,000
- Group B, student 4: Admit, Admit, Admit; enrolls private; 1996 earnings 60,000
- Group B, student 5: Admit, Admit, Admit; enrolls public; 1996 earnings 30,000

Matched Comparisons
- Group A: students 1 and 2 (private) earn $110,000 and $100,000; student 3 (public) earns $110,000. Average students 1 and 2 to get $105,000; private - public difference: -$5,000.
- Group B: student 4 (private) earns $60,000; student 5 (public) earns $30,000; private - public difference: $30,000.
- Combine the two groups:
  - The average of -$5,000 and $30,000 is $12,500.
  - There are 3 students in group A but only 2 students in group B: (3/5)·(-$5,000) + (2/5)·($30,000) = $9,000.

More Applicant Groups
(Again, enrollment is indicated by highlighting in the original slide.)
- Group C, student 6: Admit; 1996 earnings 115,000
- Group C, student 7: Admit; 1996 earnings 75,000
- Group D, student 8: Reject, Admit, Admit; 1996 earnings 90,000
- Group D, student 9: Reject, Admit, Admit; 1996 earnings 60,000

Your Turn
- What is the private-public earnings difference in Group D?
  - A. 30,000
  - B. -30,000
  - C. not enough information in the table
  - D. the difference does not exist
- What is the private-public earnings difference in Group C?

A Naive Simple Comparison vs. a Matched Comparison
- A simple comparison between the earnings of private and public school students among all 9 students leads to a much larger gap of $19,500.
- Limiting attention to the 5 students in groups A and B, a simple comparison leads to $20,000: 20 = (110 + 100 + 60)/3 - (110 + 30)/2, in thousands.
- Why?

A Naive Simple Comparison vs. a Matched Comparison
- Selection bias: students who apply to and are admitted to private schools have higher earnings wherever they ultimately choose to go.
- Average earnings in group A, where two-thirds apply to private schools, are around $107,000.
- Average earnings in group B, where two-thirds apply to public schools, are only $45,000.
- Our within-group comparisons reveal that much of this shortfall is unrelated to students' college attendance decisions; within groups we compare apples with apples.
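A small sketch (plain Python, using only the earnings figures from the slides above) that reproduces these comparisons: the within-group differences of -5,000 and 30,000, the unweighted average of 12,500, the size-weighted average of 9,000, and the naive five-student gap of 20,000.

```python
# Earnings (in $) by enrollment choice for the group A and B students from the slides
group_A = {"private": [110_000, 100_000], "public": [110_000]}
group_B = {"private": [60_000], "public": [30_000]}

def mean(xs):
    return sum(xs) / len(xs)

def diff(g):
    # within-group private - public earnings difference
    return mean(g["private"]) - mean(g["public"])

d_A, d_B = diff(group_A), diff(group_B)
print("Group A difference:", d_A)                               # -5000
print("Group B difference:", d_B)                               # 30000
print("Unweighted average:", (d_A + d_B) / 2)                   # 12500

n_A = len(group_A["private"]) + len(group_A["public"])
n_B = len(group_B["private"]) + len(group_B["public"])
print("Size-weighted average:", (n_A * d_A + n_B * d_B) / (n_A + n_B))   # 9000

# Naive comparison ignoring the groups (the 5 students in A and B only)
private = group_A["private"] + group_B["private"]
public = group_A["public"] + group_B["public"]
print("Naive private - public gap:", mean(private) - mean(public))       # 20000
```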
- Rather, the cross-group differential is explained by a combination of ambition and ability, as reflected in application decisions and the set of schools to which students were admitted.

Regression: Automated Matching
- Regression automates the calculation of a weighted average for the matched comparison. The key ingredients in regression analysis are:
  - The dependent variable: student i's earnings later in life, the outcome of interest (denoted Y_i)
  - The causal variable of interest: a dummy variable that indicates students who went to a private university (P_i)
  - A set of control variables: variables that identify the schools to which students applied and were admitted

The Regression Model
- The regression model in this case is
  Y_i = α + β P_i + γ A_i + e_i
- Data: the variables Y_i, P_i, and A_i for i = 1, 2, 3, 4, and 5; A_i is a dummy variable for applicant group A.
- Parameters we are estimating:
  - the intercept, α (alpha)
  - the effect of interest, β (beta)
  - the effect of being a Group A student, γ (gamma)
- The residual e_i

Regression Results
- α = 40,000, β = 10,000, γ = 60,000
- The estimate of the private school effect is 10,000, neither the raw matching average of 12,500 nor the size-weighted matching average of 9,000.
- In this example, regression assigns a weight of 4/7 to Group A and 3/7 to Group B (the sketch below verifies this).
- You can refer to the details in the textbook MHE.
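To see "regression as automated matching" in numbers, here is a minimal sketch (numpy assumed available; the five students' enrollment choices are those implied by the matched-comparison slide). It runs Y_i = α + βP_i + γA_i + e_i by least squares and recovers the figures on the Regression Results slide, along with the group weights.

```python
import numpy as np

# Five students: (private dummy P, group-A dummy A, 1996 earnings Y)
data = [
    (1, 1, 110_000),  # student 1: group A, enrolled private
    (1, 1, 100_000),  # student 2: group A, enrolled private
    (0, 1, 110_000),  # student 3: group A, enrolled public
    (1, 0,  60_000),  # student 4: group B, enrolled private
    (0, 0,  30_000),  # student 5: group B, enrolled public
]
P = np.array([d[0] for d in data], dtype=float)
A = np.array([d[1] for d in data], dtype=float)
Y = np.array([d[2] for d in data], dtype=float)

# OLS: regress Y on a constant, the private dummy P, and the group dummy A
X = np.column_stack([np.ones_like(P), P, A])
alpha, beta, gamma = np.linalg.lstsq(X, Y, rcond=None)[0]
print(f"alpha={alpha:,.0f}  beta={beta:,.0f}  gamma={gamma:,.0f}")
# -> alpha=40,000  beta=10,000  gamma=60,000

# The regression weight on each group is proportional to n_g * Var(P | group)
w_A = 3 * P[A == 1].var()
w_B = 2 * P[A == 0].var()
print("weight on group A:", w_A / (w_A + w_B))   # 4/7
print("weight on group B:", w_B / (w_A + w_B))   # 3/7
```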
The Dale-Krueger Regression
  ln Y_i = α + β P_i + Σ_{j=1}^{J} γ_j GROUP_ji + δ_1 SAT_i + δ_2 ln PI_i + e_i
- Form groups according to Barron's categories: Most Competitive, Highly Competitive, Very Competitive, Competitive, Less Competitive, and Noncompetitive. Start with data on 14,000 students; 5,583 students applied to both private and public schools within 151 Barron's groups.
- Replace the single applicant-group dummy with a set of 150 group dummies Σ_j γ_j GROUP_ji (GROUP_ji = 1 whenever student i is in group j).
- Add control variables for own SAT score, SAT_i, and parental income, ln PI_i (and a few others).
- Instead of earnings, use log earnings, ln Y_i. Coefficients will have a percent interpretation.

Log Transformation of Income Variable
- Why do we want to use a log transformation on income?
  - The distribution of income is usually right skewed (a long right tail).
  - It gives lower weight to outliers.
  - It allows a percentage interpretation of coefficients.
- Interpretation by model:
  - Level-level regression, y = α + βx + e: a one-unit change in x is associated with a β-unit change in y.
  - Log-level regression, ln(y) = α + βx + e: a one-unit change in x is associated with roughly a 100·β percent change in y.
  - Level-log regression, y = α + β ln(x) + e: a one-percent increase in x is associated with a β/100-unit change in y.
  - Log-log regression, ln(y) = α + β ln(x) + e: a one-percent increase in x is associated with roughly a β percent change in y.

Dale-Krueger Results: 5,583 Observations
[Table from Pischke (LSE) slides: Dale-Krueger regression results on the 5,583-student matched sample]

Your Turn
- What is the coefficient of own SAT score on log earnings, controlling for private school and Barron's groups?
  - A. 0.048
  - B. 0.016
  - C. 0.033
  - D. 0.001

A Coarser Matching Strategy: 14,238 Observations
[Table from Pischke (LSE) slides: results under the coarser matching strategy]

The Nature of Matching and Regression
[Figure from Pischke (LSE) slides]

Regression is a More Flexible Framework Than Matching
- Regression works for continuous controls (like the average SAT score of the schools applied to)
- Regression works for multi-valued and continuous treatments (like the selectivity of the school attended)
- Regression works for multiple treatments at a time
- Regression is standard among empirical economists
- Regression is easily done in standard statistical software

Matching and Regression: Summary
- Matching and regression deal with confounders affecting both treatment and outcome
- Basic idea: control for (hold constant) the confounder
- Matching: take treatment differences within groups matched on the controls, then average up
- Regression automates the averaging
- Regression offers more flexibility
- But regression itself just produces partial correlations

Multivariate Regression
- Matching and regression deal with confounders affecting both treatment and outcome.
- There are two (types of) variables on the right-hand side of the regression:
  - the regressor of interest (the treatment; here class size, the student-teacher ratio STR_i)
  - control variables to control for confounders (X_i)
  Y_i = α + β STR_i + γ X_i + e_i
- Regression with more than one right-hand-side variable is called multivariate regression.

Key Question: Which Controls Should Be Included?
- Consider evaluating the impact of university education on earnings.
- Is it the difference between university graduates' earnings and high school graduates' earnings?
- What is the causal effect?
- Which factors should be considered to estimate the causal effect more accurately?
- In regressions: which controls should be included? (Which variables do we match on?)

Your Turn
- Considering evaluating the impact of university education on earnings, which factors should be considered to estimate the causal effect more accurately?
  - A. Average high school scores
  - B. Parental income
  - C. Hometown
  - D. Breakfast habits
- What do you choose, and why?

The Criteria
- We may need to think hard about how each candidate control fits into the relationship between university education and earnings.
- This is logically demanding.
- We provide a framework to simplify the thinking process: the omitted variable bias formula.
- Note that although we illustrate it using regressions, this framework also helps organize your thinking when considering issues in real life.
Long and Short Regressions
- The relationship between the long regression
  Y_i = α + β S_i + γ X_i + e_i
  and the short regression we ran before,
  Y_i = α^s + β^s S_i + e_i^s,
  is governed by the omitted variables bias formula.

The Omitted Variables Bias Formula
- The relationship between the long and short regression coefficients is given by the omitted variables bias (OVB) formula:
  β^s = Cov(Y_i, S_i) / Var(S_i) = β + γ δ_XS
  where
  δ_XS = Cov(X_i, S_i) / Var(S_i)
  is the regression coefficient from a regression of X_i (the omitted variable) on S_i (the included variable).

Deriving the OVB Formula
- To derive the OVB formula, substitute the long regression equation into the definition of the short regression coefficient:
  β^s = Cov(Y_i, S_i) / Var(S_i)
      = Cov(α + β S_i + γ X_i + e_i, S_i) / Var(S_i)
      = [β Var(S_i) + γ Cov(X_i, S_i) + Cov(e_i, S_i)] / Var(S_i)
      = β + γ Cov(X_i, S_i) / Var(S_i)
      = β + γ δ_XS

Using the OVB Formula in the Class Size Example
  Y_i = α + β S_i + γ X_i + e_i   (long)
  Y_i = α^s + β^s S_i + e_i^s     (short)
  β^s = β + γ δ_XS
- We saw that more disadvantaged students (English learners, students on free meals, or students from households receiving public assistance) tend to be in districts with bigger classes (δ_XS > 0).
- We expect disadvantaged students to have lower test scores (γ < 0).
- Then, what is the relationship between β^s and β?

Your Turn
- Consider the class size regression
  test_score_i = α + β class_size_i + γ pct_English_learners_i + e_i
- Compared to the long regression coefficient controlling for English learners, the short regression coefficient on class size will be:
  - A. too small
  - B. too big

OVB in Action
- To see the omitted variables bias formula in action, recall the short regression:
  test_score_i = 689.9 - 2.28 class_size_i + e_i
- The long regression is
  test_score_i = 686.0 - 1.10 class_size_i - 0.65 pct_English_learners_i + e_i
- The auxiliary regression of English learners on class size is
  pct_English_learners_i = -19.85 + 1.81 class_size_i + e_i

OVB in the Class Size Example
- Plugging the values from these regressions into the OVB formula, we get
  β^s = β + γ δ_XS = -1.10 + (-0.65)(1.81) = -1.10 - 1.18 = -2.28
  which is indeed the coefficient in the short regression.
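A sketch of the OVB formula at work on simulated data (the data are made up; only the coefficient structure mirrors the class-size example, and numpy is assumed available). It runs the long, short, and auxiliary regressions and checks that β_short = β_long + γ·δ_XS holds exactly in the sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Simulated "class size" S and a confounder X positively related to it (delta_XS > 0)
S = rng.normal(20, 2, n)
X = -19.85 + 1.81 * S + rng.normal(0, 5, n)               # auxiliary relationship
Y = 686.0 - 1.10 * S - 0.65 * X + rng.normal(0, 10, n)    # long-regression "truth"

def slope(y, x):
    # bivariate OLS slope: Cov(y, x) / Var(x)
    return np.cov(y, x, bias=True)[0, 1] / np.var(x)

# Long regression coefficients on S and X
Z = np.column_stack([np.ones(n), S, X])
_, beta_long, gamma = np.linalg.lstsq(Z, Y, rcond=None)[0]

beta_short = slope(Y, S)     # short regression of Y on S only
delta_XS = slope(X, S)       # auxiliary regression of X on S

print(f"short          : {beta_short:.3f}")
print(f"long + gamma*d : {beta_long + gamma * delta_XS:.3f}")   # identical, by the OVB formula
```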
What about Ice Cream and Drowning?
[Figure from Pischke (LSE) slides]

Applications of OVB: Summary
- Which factors should be considered?
- If an important factor is not measurable, how will that affect our estimation?

The Regression Anatomy Formula
- It is often more convenient to work with bivariate regressions. We can always turn a multiple regression into bivariate regressions using the regression anatomy formula. Consider
  Y_i = α + β_1 X_1i + β_2 X_2i + e_i
- Then
  β_1 = Cov(Y_i, X̃_1i) / Var(X̃_1i)
  where X̃_1i is the residual from a regression of X_1i on X_2i.

The Regression Anatomy Formula Derived
  Cov(Y_i, X̃_1i) / Var(X̃_1i)
    = Cov(α + β_1 X_1i + β_2 X_2i + e_i, X̃_1i) / Var(X̃_1i)
    = [β_1 Cov(X_1i, X̃_1i) + β_2 Cov(X_2i, X̃_1i) + Cov(e_i, X̃_1i)] / Var(X̃_1i)
- Residuals are uncorrelated with included regressors, so Cov(e_i, X̃_1i) = 0 and Cov(X_2i, X̃_1i) = 0: X̃_1i is a function of X_1i and X_2i while e_i is the long regression residual, and X̃_1i is the residual from a regression on X_2i.

The Regression Anatomy Formula Derived
- Finally, Cov(X_1i, X̃_1i) = Var(X̃_1i) by the regression ANOVA theorem. Hence
  Cov(Y_i, X̃_1i) / Var(X̃_1i) = β_1 Var(X̃_1i) / Var(X̃_1i) = β_1

Alternative Versions of Regression Anatomy
- The process of first regressing X_1i on the other covariates and then running a bivariate regression is sometimes called partialling out the covariates.
- Regression anatomy can also be written as
  β_1 = Cov(Y_i, X̃_1i) / Var(X̃_1i) = Cov(Ỹ_i, X̃_1i) / Var(X̃_1i)
  i.e. also partialling X_2i out of the dependent variable.
- It is not enough to partial the covariate out of the dependent variable alone; you must partial it out of X_1i.
- Regression anatomy works for multiple covariates just as well.

Regression Anatomy is Useful to Plot Your Data
[Figure from Pischke (LSE) slides: test score against the student-teacher ratio, showing the raw data and fitted line together with the residuals after removing percent English learners and their fitted line; test scores roughly 600-700, student-teacher ratios roughly 15-25]

Adding District Composition Variables
  Regressor                   (1)      (2)      (3)      (4)      (5)
  Student-teacher ratio      -2.28    -1.10    -1.12    -2.17    -1.01
                             (0.52)   (0.43)   (0.27)   (0.38)   (0.27)
  Percent English learners            -0.65                      -0.13
                                      (0.03)                     (0.04)
  Percent free lunch                           -0.60             -0.53
                                               (0.02)            (0.04)
  Percent public asst.                                  -1.04    -0.05
                                                        (0.08)   (0.06)
  R²                          0.051    0.426    0.767    0.439    0.775

Multivariate Regression: Summary
- The coefficient on your regressor of interest has a causal interpretation if this regressor is as good as randomly assigned conditional on the other included controls.
- The omitted variables bias formula links the coefficients in short and long regressions. It gives us a guide to the possible bias if we can't include a necessary control.
  - Excluding uncorrelated controls from the regression is innocuous.
- The regression anatomy formula allows us to write multiple regression coefficients as bivariate regression coefficients. This is useful for plotting multivariate data.

This Lecture: Recap / Multivariate Regression / Inference in Multivariate Regression / Functional Form in Regression / Review

Today
- Multivariate Regression
- Inference in Multivariate Regression
- Functional Form in Regression

Standard Errors in Multivariate Regression
- So your regression is
  Y_i = α + Σ_{k=1}^{K} β_k X_ki + e_i
- The standard error for a bivariate regression is
  SE(β̂) = sqrt( (1/n) · Var(ê_i) / Var(X_i) )
- For the multivariate regression, from regression anatomy,
  SE(β̂_k) = sqrt( (1/n) · Var(ê_i) / Var(X̃_ki) )
  where Var(X̃_ki) is the variance of X̃_ki, the residual from a regression of X_ki on all the other regressors.

Making Things More Concrete
- How does the standard error change going from a short to a long regression?
  Y_i = α^s + β^s X_1i + e_i^s        (short)
  Y_i = α + β X_1i + γ X_2i + e_i     (long)
- The standard error for the short regression is
  SE(β̂^s) = sqrt( (1/n) · Var(ê_i^s) / Var(X_1i) )
- The standard error for the long regression is
  SE(β̂) = sqrt( (1/n) · Var(ê_i) / Var(X̃_1i) )

Comparing the Short and Long Regression Standard Errors
- Comparing
  SE(β̂^s) = sqrt( (1/n) · Var(ê_i^s) / Var(X_1i) )  and  SE(β̂) = sqrt( (1/n) · Var(ê_i) / Var(X̃_1i) )
  two things change:
  - the variance of the residual e_i
  - the variance of the regressor X_1i
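A sketch (simulated data, numpy assumed available) of the regression anatomy formula and of the multivariate standard-error expression above: the multivariate coefficient on X1 equals the bivariate coefficient on the residual X̃1 from regressing X1 on X2, and SE(β̂_1) = sqrt((1/n)·Var(ê)/Var(X̃1)).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
X2 = rng.normal(size=n)
X1 = 0.6 * X2 + rng.normal(size=n)                 # X1 correlated with X2
Y = 1.0 + 2.0 * X1 - 1.5 * X2 + rng.normal(size=n)

def ols(y, X):
    # OLS with a constant; returns coefficients and residuals
    Xc = np.column_stack([np.ones(len(y)), X])
    coef = np.linalg.lstsq(Xc, y, rcond=None)[0]
    return coef, y - Xc @ coef

# Multivariate regression of Y on X1 and X2
coef, resid = ols(Y, np.column_stack([X1, X2]))
beta1_multi = coef[1]

# Regression anatomy: partial X2 out of X1, then run a bivariate regression
_, X1_tilde = ols(X1, X2)
beta1_anatomy = np.cov(Y, X1_tilde, bias=True)[0, 1] / np.var(X1_tilde)
print(f"multivariate beta1 = {beta1_multi:.4f}, anatomy beta1 = {beta1_anatomy:.4f}")

# Standard error of beta1 from the slide's formula
se_beta1 = np.sqrt(np.var(resid) / np.var(X1_tilde) / n)
print(f"SE(beta1) = {se_beta1:.4f}")
```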
How Do Standard Errors Change in the Class Size Regressions?
(The table from "Adding District Composition Variables" above is repeated on this slide.)

Regression Output for Multivariate Regression
[Slide showing statistical-software regression output]

Inference and Testing in Multivariate Regression
- Standard errors and t-tests for a single coefficient are analogous to the bivariate regression case.
- New testing problems arise in multivariate regression:
  - testing single hypotheses involving multiple coefficients
  - testing multiple hypotheses at the same time

Tests Involving Multiple Coefficients
- Consider the regression
  test_score_i = α + β class_size_i + γ_1 pct_English_learners_i + γ_2 pct_free_lunch_i + e_i
- We may be interested in the hypothesis that having more English learners or more students on free lunches has the same impact on test scores, i.e.
  H0: γ_1 = γ_2  versus  H1: γ_1 ≠ γ_2
- Note that the null hypothesis can be written as γ_1 - γ_2 = 0, so that
  t = (γ̂_1 - γ̂_2) / SE(γ̂_1 - γ̂_2)

The Two-Coefficient t-test
- We need to find SE(γ̂_1 - γ̂_2); now
  Var(γ̂_1 - γ̂_2) = Var(γ̂_1) + Var(γ̂_2) - 2 Cov(γ̂_1, γ̂_2)
- You can find Cov(γ̂_1, γ̂_2) just like you can find the sampling variance of a single coefficient. The t-statistic is simply
  t = (γ̂_1 - γ̂_2) / sqrt( Var(γ̂_1) + Var(γ̂_2) - 2 Cov(γ̂_1, γ̂_2) )

An Alternative: Transforming the Regression
- We can obtain this t-statistic in another way. Define a new parameter θ = γ_1 - γ_2.
- This can be written as γ_1 = θ + γ_2. Use this to replace γ_1 in the regression equation:
  Y_i = α + β S_i + γ_1 X_1i + γ_2 X_2i + e_i
      = α + β S_i + (θ + γ_2) X_1i + γ_2 X_2i + e_i
      = α + β S_i + θ X_1i + γ_2 (X_1i + X_2i) + e_i
- The t-statistic we want is therefore
  t = θ̂ / SE(θ̂)
  applied to the transformed regression.
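A sketch (simulated data, numpy assumed available, homoskedastic formulas) of the two equivalent ways to test γ_1 = γ_2: the direct test based on Var(γ̂_1) + Var(γ̂_2) - 2Cov(γ̂_1, γ̂_2), and the t-test on θ in the transformed regression that includes X1 and X1 + X2. The two t-statistics coincide.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
S = rng.normal(size=n)                    # regressor of interest (e.g. class size)
X1 = rng.normal(size=n) + 0.3 * S         # e.g. pct. English learners
X2 = rng.normal(size=n) - 0.2 * S         # e.g. pct. free lunch
Y = 1.0 - 0.5 * S - 0.8 * X1 - 0.8 * X2 + rng.normal(size=n)   # gamma1 = gamma2 by construction

def ols(y, regressors):
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # homoskedastic covariance matrix of the coefficients: sigma^2 (X'X)^-1
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return coef, sigma2 * np.linalg.inv(X.T @ X)

# Direct test of H0: gamma1 = gamma2
coef, V = ols(Y, [S, X1, X2])
g1, g2 = coef[2], coef[3]
se_diff = np.sqrt(V[2, 2] + V[3, 3] - 2 * V[2, 3])
print("direct t      =", (g1 - g2) / se_diff)

# Transformed regression: Y on S, X1, and (X1 + X2); the coefficient on X1 is theta = gamma1 - gamma2
coef_t, V_t = ols(Y, [S, X1, X1 + X2])
print("transformed t =", coef_t[2] / np.sqrt(V_t[2, 2]))
```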
Your Turn
- Suppose you run
  Y_i = α + β_1 X_1i + β_2 X_2i + e_i
  and you want to test β_1 + β_2 = 0. What is your transformed regression?
  - A. Y_i = α + (β_1 + θ) X_1i + β_2 (X_2i - X_1i) + e_i
  - B. Y_i = α + θ (X_1i + X_2i) + β_2 X_2i + β_3 X_3i + e_i
  - C. Y_i = α + θ X_1i + β_2 (X_2i - X_1i) + e_i
  - D. The regression cannot be transformed

Multiple Hypotheses at Once
- In multivariate regression it is sometimes interesting to test multiple hypotheses at once. Consider again the regression
  test_score_i = α + β class_size_i + γ_1 pct_English_learners_i + γ_2 pct_free_lunch_i + e_i
- We may be interested in the hypothesis that neither the fraction of English learners nor the fraction of free-lunch students has any impact on test scores, i.e.
  H0: γ_1 = 0, γ_2 = 0  versus  H1: γ_1 ≠ 0 or γ_2 ≠ 0
- We could of course use two simple t-tests, one for the hypothesis γ_1 = 0 and one for γ_2 = 0. But we may want to know whether both are true at once.

Testing Joint Hypotheses
- To test a joint hypothesis, we cannot just combine the single t-statistics. There are two reasons for this:
  - As before, the estimated coefficients γ̂_1 and γ̂_2 will in general be correlated. We need to take this correlation into account.
  - Even if this correlation is zero, rejecting the joint hypothesis if either one of the two t-tests rejects would reject too often under the null hypothesis. Suppose t_1 and t_2 are your two t-statistics. You do not reject if
    Pr(|t_1| ≤ 1.96 and |t_2| ≤ 1.96) = Pr(|t_1| ≤ 1.96) · Pr(|t_2| ≤ 1.96) = 0.95² = 0.9025
  - This means we are rejecting 9.75% of the time, rather than 5% of the time, if the null hypothesis is true.

The F-test
- In order to test a joint hypothesis, we need to perform an F-test. The F-statistic for the hypothesis γ_1 = 0, γ_2 = 0 has the form
  F = (1/2) · (t_1² + t_2² - 2 ρ_{t1,t2} t_1 t_2) / (1 - ρ²_{t1,t2})
  where ρ_{t1,t2} is the correlation of the two t-statistics. Note, if ρ_{t1,t2}
  - is zero, we just add the two squared t-statistics;
  - is high, we want to deduct something, because if one t-test rejects under the null, the second test is more likely to reject as well.
- We compare the F-statistic to a χ²(2) distribution because our test involves 2 restrictions. Using the appropriate distribution adjusts the rejection region, so we do not reject too often under the null.

The F-test and the t-test
- Notice that the F-statistic for a single hypothesis is just F = t², and it has a χ²(1) distribution under the null.
- You can always do an F-test instead of a t-test (but not vice versa).

The Overall F-test
- A particular hypothesis about the regression
  Y_i = α + β_1 X_1i + β_2 X_2i + ... + β_K X_Ki + e_i
  is the one that none of the regressors explains any variation in Y_i, or
  H0: β_1 = 0, β_2 = 0, ..., β_K = 0
- The alternative hypothesis is that at least one of the β-coefficients is non-zero.
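A sketch of the joint test H0: γ_1 = 0, γ_2 = 0 using the t-statistic/correlation form of the F-statistic given on the slide (simulated data; numpy and scipy assumed available; the p-value here uses an F(2, n-k) reference distribution rather than χ²(2)).

```python
import numpy as np
from scipy import stats   # only used for the p-value at the end

rng = np.random.default_rng(4)
n = 2_000
S = rng.normal(size=n)
X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)
Y = 1.0 - 0.5 * S - 0.3 * X1 - 0.3 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), S, X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ coef
V = (resid @ resid / (n - X.shape[1])) * np.linalg.inv(X.T @ X)   # homoskedastic vcov

# Single-coefficient t-statistics for gamma1 and gamma2, and their correlation
t1 = coef[2] / np.sqrt(V[2, 2])
t2 = coef[3] / np.sqrt(V[3, 3])
rho = V[2, 3] / np.sqrt(V[2, 2] * V[3, 3])

# F-statistic in the form given on the slide
F = 0.5 * (t1**2 + t2**2 - 2 * rho * t1 * t2) / (1 - rho**2)
print(f"t1={t1:.2f}  t2={t2:.2f}  rho={rho:.2f}  F={F:.2f}")
print("p-value:", 1 - stats.f.cdf(F, 2, n - X.shape[1]))
```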
Application: California Test Score Data - Testing Equality of Two Coefficients
[Slide showing regression output for the direct test]

California Test Score Data: Testing Equality of Coefficients by Transforming the Data
- We can test this as
  θ = γ_1 - γ_2 = 0
  by running the regression
  Y_i = α + β S_i + θ X_1i + γ_2 (X_1i + X_2i) + e_i

California Test Score Data: Testing Equality of Coefficients by Transforming the Data
[Slide showing regression output for the transformed regression]

California Test Score Data: Both Ways of Testing Equality of Coefficients Are the Same
- The F-statistic from testing the restriction γ_1 = γ_2 directly was 75.18.
- The t-statistic for θ, the coefficient on pct. English learners, is 8.671, and 8.671² = 75.186.

California Test Score Data: Testing a Joint Hypothesis
[Slide showing the joint test output]

Testing the Equality of Coefficients in Subsamples: The Chow Test
- Sometimes we want to know whether the parameters of our equation are the same in two different subsamples (e.g. high- and low-income districts). Call the samples A and B, so the separate regressions in the two samples are
  Y_i^A = α^A + β_1^A X_1i^A + β_2^A X_2i^A + ... + β_K^A X_Ki^A + e_i^A
  Y_i^B = α^B + β_1^B X_1i^B + β_2^B X_2i^B + ... + β_K^B X_Ki^B + e_i^B
- We would like to test
  H0: α^A = α^B, β_1^A = β_1^B, β_2^A = β_2^B, ..., β_K^A = β_K^B

Testing the Equality of Coefficients in Subsamples: The Chow Test
- Transform the regression again using θ_k = β_k^B - β_k^A and θ_0 = α^B - α^A, and combine both samples to give
  Y_i = α^A + β_1^A X_1i + β_2^A X_2i + ... + β_K^A X_Ki
        + θ_0 B_i + θ_1 X_1i^B + θ_2 X_2i^B + ... + θ_K X_Ki^B + e_i
- Notice that the first row refers to all observations while the second row refers to the B sample only.
- If we define an indicator for the B sample as
  B_i = 1 if observation i is from subsample B, and 0 if from subsample A,
  this means you can run
  Y_i = α^A + β_1^A X_1i + β_2^A X_2i + ... + β_K^A X_Ki
        + θ_0 B_i + θ_1 (B_i · X_1i) + θ_2 (B_i · X_2i) + ... + θ_K (B_i · X_Ki) + e_i
- More on such interactions next week, in the session on dummy variables.

California Test Score Data: Testing Parameter Stability
[Slide showing the Chow test output]

Inference in Multivariate Regression: Summary
- In multivariate regressions we can perform:
  - simple t-tests for single coefficients
  - t- or F-tests for a single hypothesis involving multiple coefficients: F = t²
  - F-tests for joint hypotheses

This Lecture: Recap / Multivariate Regression / Inference in Multivariate Regression / Functional Form in Regression / Review

Linear Regression is an Approximation to the CEF in the CA School Data
[Figure from Pischke (LSE) slides]

The Regression of Test Scores on District Income
[Figure from Pischke (LSE) slides]

An Important Aside
- The big picture: we are interested in the causal effect of class size on test scores.
  - District income is a control variable in this regression; we are not particularly interested in the coefficient on income.
- In this subsection we will talk about the magnitude of the income coefficients,
  - to illustrate how to interpret coefficients from a non-linear regression function.
- Sometimes we will go back to our bigger-picture question: how should we control for income in the class size regression?

Other Forms of the Regression Line
- It is easy to augment the simple regression model to fit the income data better, for example by estimating
  test_score_i = α + β_1 income_i + β_2 income_i² + e_i

The Linear versus the Quadratic Specification for Income
[Figure from Pischke (LSE) slides]

What's Linear about Linear Regression?
- OLS regression is often called linear regression. So what's linear about linear regression?
  - The OLS estimator is a linear estimator (a linear function of the data). A lot of useful properties (e.g. uncorrelated residuals, the ANOVA theorem) result from this linearity.
  - The regression function is linear in the parameters α, β_1, β_2, ... We can't estimate a regression like Y_i = α K_i^β L_i^γ + e_i by OLS.
  - The regression function can be non-linear in the regressors. We can still estimate a nonlinear relationship between test scores and income, for example by including the square of income.

How to Interpret Non-linear Regression Functions?
- With a simple linear regression function, interpretation is easy:
  test_score_i = α + β income_i + e_i
  In this case β is the effect of a $1,000 increase in average income on test scores.
- In the quadratic specification it is a bit more tricky:
  test_score_i = α + β_1 income_i + β_2 income_i² + e_i
  The effect of a $1,000 increase in average income on test scores is now
  ∂test_score/∂income = β_1 + 2 β_2 income
  so the effect depends on the level of income you look at.
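A sketch of how the marginal effect varies with income in a quadratic specification (simulated data with coefficients loosely in the spirit of the California numbers, not the actual data set; numpy assumed available). It fits the quadratic and evaluates β_1 + 2β_2·income at several income levels.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
income = rng.uniform(5, 55, n)                    # district income in $1,000
test_score = 607 + 3.85 * income - 0.042 * income**2 + rng.normal(0, 9, n)

# Fit the quadratic specification: test_score = a + b1*income + b2*income^2 + e
X = np.column_stack([np.ones(n), income, income**2])
a, b1, b2 = np.linalg.lstsq(X, test_score, rcond=None)[0]

for level in (10, 15, 30, 45):
    # effect of a $1,000 increase in income, evaluated at this income level
    print(f"income = {level:2d}k: marginal effect = {b1 + 2 * b2 * level:.2f}")
```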
The Log Specification for Income
[Figure from Pischke (LSE) slides]

Interpreting the Log Specification
- The simple log specification for income seems to work extremely well in this example, and it often does for similar variables.
- The log specification
  test_score_i = α + β ln(income_i) + e_i
  implies
  ∂test_score/∂income = β / income, i.e. ∂test_score = β · (∂income / income)
  so β is the effect of a relative change in income.

Interpreting the Log Specification
- Proportional changes in income are often more reasonable to compare than additive changes:
  - A $1,000 change is pretty big for a district with income of $15,000 (in terms of economic impact).
  - A $1,000 change is much smaller for a district with income of $40,000.
  - Comparing a 10% change may be a more similar exercise. This is what the log specification does.

A Little Bit about Logs
- The log has a lot of useful properties:
  ln(ax) = ln(a) + ln(x)
  ln(x/a) = ln(x) - ln(a)
  ln(x^a) = a ln(x)

Your Turn
- In our class size regression we have been using income measured in $1,000:
  test_score_i = α + β ln(income_i / 1,000) + e_i
- Suppose someone else runs the regression with raw income instead:
  test_score_i = α* + β* ln(income_i) + e_i*
- How do the estimates of the slope coefficients compare (β versus β*)?
  - A. They are the same
  - B. β = β* · 1,000
  - C. β = β* / 1,000
  - D. β = β* + 1,000
  - E. β = β* - 1,000
- How about the intercept (α*)?

Don't Worry about Scaling Your Variables in Logs
- Compare two regressions using ln(X_i) and ln(A X_i):
  Y_i = α + β ln(X_i) + e_i
  Y_i = α* + β* ln(A X_i) + e_i* = [α* + β* ln(A)] + β* ln(X_i) + e_i*
  where the bracketed term plays the role of α and β* plays the role of β.
- The slope coefficient on X_i is the same.

Logs for the Production Function
- The properties of the log let us estimate the production function relationship conveniently:
  Y_i = α K_i^β L_i^γ
- Take logs of all variables to get the regression equation
  ln(Y_i) = ln(α) + β ln(K_i) + γ ln(L_i) + e_i

Log Derivatives
- Moreover, we just saw that
  ∂ln(x) = ∂x / x
- From this it follows that
  ∂ln(y)/∂ln(x) = (∂y/y) / (∂x/x) = (∂y/∂x) · (x/y)
  which is an elasticity.

The Log-Log Regression
- This means that the log-log regression gives you an elasticity:
  ln(quantity_i) = α + β ln(price_i) + e_i
- In this case β directly estimates an elasticity.
- Of course, the estimate of β in the above regression may be both an elasticity and entirely silly: silly because the estimated regression may be neither a supply nor a demand function, so β may not have any economic interpretation at all.

Controlling for Income in the Test Score Data
  Regressor                  (1)      (2)      (3)      (4)
  Student-teacher ratio     -2.28    -0.65    -0.91    -0.88
                            (0.52)   (0.35)   (0.35)   (0.34)
  Avg. income ($1,000)                1.84     3.88
                                     (0.11)   (0.27)
  Avg. income² ($1,000)                       -0.044
                                              (0.005)
  ln(avg. income)                                       35.62
                                                        (1.40)
  R²                         0.051    0.512    0.564    0.570

Making Sense of the Income Results
- In the simple linear regression we got a coefficient on income of 1.84.
- For the quadratic regression we get
  ∂test_score/∂income = β_1 + 2 β_2 income = 3.88 + 2 · (-0.044) · income = 2.53 (at mean income)
- The quadratic regression evaluated at the mean produces a higher derivative than the linear regression in this case.

Average Slopes of the Test Score - Income Relationship
[Figure from Pischke (LSE) slides]

Making Sense of the Income Results
- Coefficient in the simple linear regression: 1.84
- Derivative at the mean for the quadratic regression: 2.53
- For the log specification,
  ∂test_score/∂income = β / income = 35.62 · (1/15.32) = 2.33
  so we get a result similar to the quadratic one.

Interested in the Slope around a Particular Point? Use Data around That Point
[Figure from Pischke (LSE) slides]

When Functional Form Matters
- The relation between average income and test score reverses once district income reaches roughly $40,000.
- This suggests that fitting a non-linear functional form makes sense.

When Functional Form Matters (Cont'd)
- For y = ax² + bx + c, the turning point is at
  x_peak = -b / (2a),  y_peak = (4ac - b²) / (4a)
- Here: 3.88 / (2 · 0.044) · 1000 = 44,090.91, so the fitted quadratic peaks at a district income of about $44,000.
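A quick arithmetic check of the comparisons above, using only the coefficients from the "Controlling for Income in the Test Score Data" table and the mean district income of about $15,320 implied by the log calculation.

```python
# Coefficients from the "Controlling for Income in the Test Score Data" table
linear_slope = 1.84                 # column (2): avg. income in $1,000
b1, b2 = 3.88, -0.044               # column (3): income and income^2
log_coef = 35.62                    # column (4): ln(avg. income)
mean_income = 15.32                 # mean district income in $1,000

# Marginal effect of a $1,000 income increase, evaluated at the mean
quad_at_mean = b1 + 2 * b2 * mean_income
log_at_mean = log_coef / mean_income
print(f"linear: {linear_slope:.2f}, quadratic at mean: {quad_at_mean:.2f}, log at mean: {log_at_mean:.2f}")

# Income level at which the fitted quadratic peaks: x_peak = -b1 / (2*b2), in dollars
print(f"quadratic peaks at about ${-b1 / (2 * b2) * 1000:,.2f}")
```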
When Does Functional Form Matter?
- We are interested in the effect of the student-teacher ratio.
- (The table from "Controlling for Income in the Test Score Data" above is repeated on this slide.)

Functional Form in Regression: Summary
- Some specification issues in regression analysis are more important than others.
- You don't need to worry too much about the functional form of control variables.
- For our regressors of interest: linear approximations to non-linear regression functions have an interpretation,
  - but ask yourself in which part of the sample you want to interpret the results.
- Functional form matters for some other issues we haven't talked about so far:
  - prediction
  - it helps with standard errors

This Lecture: Recap / Multivariate Regression / Inference in Multivariate Regression / Functional Form in Regression / Review

To Review
- Multivariate Regression: SW Chapter 6, AP Appendix to Chapter 2
- Inference in Multivariate Regression: SW Chapter 7, AP Appendix to Chapter 2
- Functional Form in Regression: SW Chapters 8.1 and 8.2

Thank you! See you next week!