Multiple Regression Analysis (I) Notes PDF

Summary

These notes provide an overview of multiple regression analysis, including general linear regression models, different types of models (first-order, interaction, second-order, polynomial), matrix representations, estimation, inferences, ANOVA, and more.

Full Transcript


Multiple Regression Analysis (I)

Outline: General Multiple Linear Regression Model; Linear Regression Model in Matrix Terms; Estimation and Inferences in Matrix Terms; ANOVA Results; $R^2$ and Adjusted $R^2$; Indicator (Dummy) Variables; Partial F-test; Beta (Standardized) Coefficients; Coefficients of Partial Determination; Multicollinearity and Its Effects; Polynomial Regression Models

General Multiple Linear Regression Model

In most research problems where regression analysis is applied, more than one independent variable is needed in the regression model. When the model is linear in the coefficients, it is called a multiple linear regression model.

The fundamental principles of the simple linear regression model extend to the multiple regression model. The general linear regression model is

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i$,

where $X_{ij}$ ($j = 1, 2, \ldots, k$; $i = 1, 2, \ldots, n$) are the independent variables, $\beta_j$ ($j = 0, 1, 2, \ldots, k$) are parameters (partial coefficients), and $\varepsilon_i$ ($i = 1, 2, \ldots, n$) are independent $N(0, \sigma^2)$. This implies that the $Y_i$ are independent with $Y_i \sim N(\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik}, \sigma^2)$.

Various Examples of General Linear Regression Model

First-order Model: When $X_1, \ldots, X_k$ represent k different predictor variables, the general linear regression model is called a first-order model; it contains no interaction effects between the predictor variables.

Interaction Model: For instance, $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1}X_{i2} + \varepsilon_i$.

Second-order Model: A second-order model, e.g. for the case of two independent variables, is defined as $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1}X_{i2} + \beta_4 X_{i1}^2 + \beta_5 X_{i2}^2 + \varepsilon_i$.

Polynomial Model: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_m X_i^m + \varepsilon_i$.

General Linear Regression Model in Matrix Terms

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i$, $i = 1, 2, \ldots, n$.

Let $Y_{n\times 1} = (Y_1, Y_2, \ldots, Y_n)'$, $X_{n\times(k+1)} = [\mathbf{1}_{n\times 1}, X_1, X_2, \ldots, X_k]$ with $X_j = (X_{1j}, \ldots, X_{nj})'$, $\beta_{(k+1)\times 1} = (\beta_0, \beta_1, \ldots, \beta_k)'$ and $\varepsilon_{n\times 1} = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$.
Then the general linear regression model in matrix terms is $Y_{n\times 1} = X_{n\times(k+1)}\,\beta_{(k+1)\times 1} + \varepsilon_{n\times 1}$, or simply $Y = X\beta + \varepsilon$, where $\varepsilon$ is a vector of independent normal variables with $E(\varepsilon) = 0$ and $Var(\varepsilon) = \sigma^2 I$. Consequently, $Y$ is a vector of independent normal variables with $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I$.

Estimation in Matrix Terms

The Least Squares Normal Equations: $X'Xb = X'Y$, where $b = (b_0, b_1, b_2, \ldots, b_k)'$.

Estimated Regression Coefficients (LSE and MLE): $b = (X'X)^{-1}X'Y$.

Properties of the Estimators: They are minimum variance unbiased, consistent, and sufficient.

Fitted Values: $\hat{Y} = Xb = HY$, where $H = X(X'X)^{-1}X'$ is the hat matrix.

Residuals: $e = Y - \hat{Y} = (I - H)Y$.

Variance-Covariance Matrix: $Var(e) = \sigma^2(I - H)$, which is estimated by $s^2(e) = MSE\,(I - H)$.

Inferences in Matrix Terms

The variance-covariance matrix of $b$ is $Var(b) = \sigma^2(X'X)^{-1}$. Its estimate is $s^2(b) = MSE\,(X'X)^{-1} = s^2(X'X)^{-1}$.

Inferences: $b_i$ is a normally distributed random variable under the normal model.

The $(1 - \alpha)100\%$ confidence interval for $\beta_i$ is $b_i - t_{\alpha/2}\,s(b_i) < \beta_i < b_i + t_{\alpha/2}\,s(b_i)$, where $t_{\alpha/2}$ is a value of the t-distribution with df $= n - k - 1$.

Hypothesis test of $\beta_i$: To test the null hypothesis $H_0: \beta_i = 0$ against an alternative $H_a$, we may use the test statistic $t = b_i / s(b_i)$.

Inferences in Matrix Terms (cont.)

Mean Response: Let $X_h = (1, x_{h1}, x_{h2}, \ldots, x_{hk})'$ and $\hat{Y}_h = X_h'b$. Then $Var(\hat{Y}_h) = \sigma^2 X_h'(X'X)^{-1}X_h$, and the estimated variance of $\hat{Y}_h$ in matrix notation is $s^2(\hat{Y}_h) = MSE\,(X_h'(X'X)^{-1}X_h)$.

The $(1 - \alpha)100\%$ confidence interval for the mean response $E(Y_h)$ is $\hat{Y}_h - t_{\alpha/2,\,(n-k-1)}\,s(\hat{Y}_h) < E(Y_h) < \hat{Y}_h + t_{\alpha/2,\,(n-k-1)}\,s(\hat{Y}_h)$.

Prediction of New Observation: $s^2(pred) = MSE\,(1 + X_h'(X'X)^{-1}X_h)$.

ANOVA

The sums of squares in the ANOVA table are $SST = Y'(I - J/n)Y$, $SSE = Y'(I - H)Y$ and $SSR = Y'(H - J/n)Y$, where $J$ is the $n \times n$ matrix of ones.

$E(MSE) = \sigma^2$, and $E(MSR)$ is $\sigma^2$ plus a nonnegative quantity. For example, with $k = 2$,

$E(MSR) = \sigma^2 + \frac{1}{2}\left[\beta_1^2\sum(X_{i1} - \bar{X}_1)^2 + \beta_2^2\sum(X_{i2} - \bar{X}_2)^2 + 2\beta_1\beta_2\sum(X_{i2} - \bar{X}_2)(X_{i1} - \bar{X}_1)\right]$.

The F-test associated with the ANOVA table is a test of the null hypothesis $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ against $H_a$: one or more of the $\beta_i$ values are not equal to zero.
In other words, it is a test of whether there is a linear relationship between the dependent variable $Y$ and the entire set of independent variables $X_i$ ($i = 1, 2, \ldots, k$).

Dividend Example (cont.)

A random sample of 42 firms was chosen from the S&P500 firms listed in the Spring 2003 Special Issue of Business Week (The Business Week Fifty Best Performers). The dividend yield (DIVYIELD), the 2002 earnings per share (EPS), and the stock price (PRICE) were recorded for the 42 firms. These data are in a file named DIV4. Using dividend yield as the DV and EPS and PRICE as the IVs, run a regression using SPSS.

(a) What is the sample regression equation?
(b) What conclusion(s) can be drawn based on the outputs?
(c) Is it necessary to test each coefficient individually to see if either PRICE or EPS is related to DIVYIELD? Why or why not?

Dividend Example (cont.)

ANOVA (Dependent Variable: divyield; Predictors: (Constant), price, eps)

Source       Sum of Squares   df   Mean Square   F       Sig.
Regression           12.677    2         6.338   1.865   .168
Residual            132.532   39         3.398
Total               145.208   41

Coefficients (Dependent Variable: divyield)

Variable     B       Std. Error   Beta     t        Sig.
(Constant)   2.450   .653                  3.753    .001
EPS           .604   .314          .387    1.925    .062
PRICE        -.029   .026         -.227   -1.129    .266

Example 1

A human resources manager is interested in developing a multiple regression model to estimate the salary $Y$ (in thousands of dollars) for employees from experience $X_1$ (in years) with the firm and from performance $X_2$ (as measured by an index). Data were collected for 15 employees and are presented in the following table:

Example 1 (cont.)

(a) Find the estimated multiple regression equation for $Y$ regressed on $X_1$ and $X_2$.
(b) Find the predicted value for $Y$ given $X_1 = 10$ and $X_2 = 60$.
(c) Find $s$.
(d) Test the overall significance of the regression relationship.
(e) Test each independent variable separately to see whether it contributes explanatory power to the regression equation. Use $\alpha = 5\%$.

Example 1 - Solution

(a) $\hat{Y} = 8.49 + 2.778X_1 + 0.0656X_2$

(b) If $X_1 = 10$ and $X_2 = 60$, the predicted value for $Y$ is $\hat{Y} = 8.49 + 2.778(10) + 0.0656(60) = 40.21$ with the rounded coefficients shown (the slides report 40.25).

(c) $s = (15.163)^{1/2} = 3.894$

(d) $H_0: \beta_1 = \beta_2 = 0$ vs $H_a$: $H_0$ is not true. From the ANOVA table, the p-value of 0.008 is very small, so we reject $H_0$ and conclude that there is a linear relationship between salary and the set of predictors (experience and performance).

(e) $H_{0i}: \beta_i = 0$ vs $H_{ai}: \beta_i \neq 0$, $i = 1, 2$. From the coefficients table, the p-values for $\beta_1$ and $\beta_2$ are 0.103 and 0.788 respectively, so we do not reject either null hypothesis at the 5% level.

Coefficient of Determination - $R^2$

The definition of the coefficient of determination $R^2$ for multiple regression analysis is the same as for simple regression analysis. That is, $R^2$ is the proportion of the total variation of $Y$ that is explained by the relationship between $Y$ and the independent variables. It is an important summary statistic that helps evaluate how well the multiple regression model fits the data. The equation for $R^2$ is $R^2 = SSR/SST = 1 - SSE/SST$.

For a fixed number of observations, the $R^2$ value will generally increase as more independent variables are included in a multiple regression equation.

Why Do We Need to Consider $R_a^2$?

The reason $R^2$ increases is that as additional independent variables are included in a regression equation, the value of SST does not change, but SSR generally increases (equivalently, SSE decreases), and therefore $R^2$ generally increases.

The additional independent variables may not contribute significantly to the explanation of the dependent variable $Y$, but they do increase $R^2$. Adding more independent variables to the regression equation for the purpose of increasing $R^2$ often results in overfitting and produces worse models rather than better ones.
To help prevent overfitting in regression analysis, we use the so-called Adjusted R Square (written as $R_a^2$) value as the measure of how well the model fits the data.

Adjusted R Square - $R_a^2$

The $R_a^2$ value incorporates the effect of including additional independent variables in a multiple regression equation. This value is computed by the following formula:

$R_a^2 = 1 - \dfrac{SSE}{SST}\cdot\dfrac{n - 1}{n - k - 1}$,

where $k$ is the number of independent variables and $n$ is the size of the sample. $R_a^2$ is never larger than $R^2$, and it is strictly smaller whenever $k \geq 1$ and $R^2 < 1$.

Unlike $R^2$, $R_a^2$ takes into account ("adjusts for") both the sample size $n$ and the number of independent variables $k$ in the model. It may actually become smaller when another X variable is introduced into the model, because any decrease in SSE may be more than offset by the loss of a degree of freedom in the denominator $n - k - 1$.

Example 1 (cont.)

Find the $R_a^2$ and interpret its value.

Solution: $R^2 = SSR/SST = 228.447/410.4 = 0.557$ and $R_a^2 = 1 - (181.95/410.4)(14/12) = 0.483$.

The value of $R_a^2$ can be interpreted in the following way: $R_a^2 = 0.4828$ means that approximately 48.3% of the total variation in the values of $Y$ (salary) can be explained by a linear relationship with the independent variables (experience and performance), after adjusting for the number of independent variables.

Example 1 (cont.)

Model Summary for $\hat{Y} = 8.49 + 2.778X_1 + 0.0656X_2$ (Predictors: (Constant), performa, experien)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.746   .557       .483                3.89394

Model Summary for $\hat{Y} = 8.959 + 3.148X_1$ (Predictors: (Constant), experien)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.744   .554       .520                3.75291

Indicator (Dummy) Variables

There are many occasions in which qualitative (categorical) variables need to be considered as part of the model development. Examples of qualitative independent variables and some possible categories are sex (male or female), marital status (married or not married), and so on.
If qualitative variables are to be included in a regression model, they must be quantified, that is, they must be assigned numerical values. Quantification can be accomplished by using indicator (dummy) variables. Indicator variables are assigned the values 0 or 1 (for example, 1 for one category and 0 for the other).

Dummy Variables (Cont.)

When the qualitative variable has $c$ categories, we use $(c - 1)$ indicator variables. Say, in a study, there are 4 age groups, 0-10, 11-20, 21-40, and 40+, so we need $(c - 1) = (4 - 1) = 3$ indicator variables. In fact, we can take $X_1 = 1$ if the age group is 0-10 and 0 otherwise, $X_2 = 1$ if the age group is 11-20 and 0 otherwise, and $X_3 = 1$ if the age group is 21-40 and 0 otherwise.

The reason why we only need three dummy variables is that the category "40+" is treated as the "default group", or the "otherwise group".

Example 2

A female executive at a certain company claims that male executives earn higher salaries, on average, than female executives with the same education, experience, and responsibilities. To support her claim, she wants to model the salary $Y$ of an executive using a qualitative independent variable representing the gender of an executive (male or female).

(a) Write a model for mean executive salary, $E(Y)$, using a dummy variable for the gender of an executive.
(b) Interpret the $\beta$ parameters in the model.

Example 2 (cont.)

(a) The model for executive salary is $Y = \beta_0 + \beta_1 X + \varepsilon$, where $X = 1$ for a male executive and $X = 0$ for a female executive. The mean salary is $E(Y) = \beta_0 + \beta_1 X$.

(b) The advantage of using a 0-1 coding scheme is that the $\beta$ coefficients are easily interpreted. If $X = 1$ (male), then $\mu_M = E(Y) = \beta_0 + \beta_1(1) = \beta_0 + \beta_1$. If $X = 0$ (female), then $\mu_F = E(Y) = \beta_0 + \beta_1(0) = \beta_0$. Hence $\beta_1 = \mu_M - \mu_F$.

That is, $\beta_0$ represents the mean salary for females, and $\beta_1$ represents the difference between the mean salary for males and the mean salary for females. Therefore, when a 0-1 coding convention is used, $\beta_0$ will always represent the mean response associated with the level of the qualitative variable assigned the value 0 (called the base level), and $\beta_1$ will always represent the difference between the mean response for the level assigned the value 1 and the mean for the base level.
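The interpretation in Example 2 can be checked numerically. The sketch below is only an illustration: the salary figures and the helper name `fit_dummy_regression` are hypothetical and do not come from the notes. It fits Y = b0 + b1 X by least squares with a 0-1 dummy and compares the estimates with the two group means.

```python
# Hypothetical salaries (in thousands of dollars); not taken from the notes.
male_salaries = [92.0, 88.0, 101.0, 95.0]
female_salaries = [85.0, 90.0, 80.0]


def fit_dummy_regression(group1, group0):
    """Least squares fit of Y = b0 + b1*X, with X = 1 for group1 and X = 0 for group0."""
    x = [1.0] * len(group1) + [0.0] * len(group0)
    y = list(group1) + list(group0)
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope on the dummy variable
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = fit_dummy_regression(male_salaries, female_salaries)
mean_m = sum(male_salaries) / len(male_salaries)
mean_f = sum(female_salaries) / len(female_salaries)

# b0 should match the mean of the base level (X = 0),
# and b1 the difference between the two group means.
print(f"b0 = {b0:.3f}, mean for females      = {mean_f:.3f}")
print(f"b1 = {b1:.3f}, male minus female mean = {mean_m - mean_f:.3f}")
```

With a single dummy regressor the least squares estimates reproduce the group means exactly, which is the sample counterpart of the statements about the base level and the difference in means.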
Example - Employment Discrimination

Data for the following variables for 93 employees of Harris Bank Chicago in 1977 are available:

Y = beginning salaries in dollars (SALARY)
X1 = years of schooling at the time of hire (EDUCAT)
X2 = number of months of previous work experience (EXPER)
X3 = number of months after January 1, 1969, that the individual was hired (MONTHS)
X4 = indicator variable coded 1 for males and 0 for females (MALE)

(a) Is there evidence that Harris Bank discriminated against female employees?
(b) What salary would you forecast, on average, for males with 12 years of education, 10 months of previous experience (EXPER = 10), and MONTHS equal to 15? What salary would you forecast, on average, for females if all other factors are equal?

Employment Discrimination (cont.)

ANOVA (Dependent Variable: salary; Predictors: (Constant), males, exper, months, educat)

Source       Sum of Squares   df   Mean Square    F        Sig.
Regression   2.367E7           4   5916337.848    22.978   .000
Residual     2.266E7          88    257476.579
Total        4.632E7          92

Coefficients (Dependent Variable: salary)

Variable     B          Std. Error   Beta   t        Sig.
(Constant)   3526.422   327.725             10.760   .000
educat         90.020    24.694      .290    3.645   .000
exper           1.269      .588      .162    2.159   .034
months         23.406     5.201      .338    4.500   .000
males         722.461   117.822      .486    6.132   .000

Employment Discrimination (cont.)

$\hat{Y} = 3526.422 + 722.461\,\text{Males} + 90.02\,\text{Educat} + 1.269\,\text{Exper} + 23.406\,\text{Months}$

Yes. There is a difference in salaries, on average, for male and female workers after accounting for the effects of the EDUCAT, EXPER, and MONTHS variables. Males' salaries are, on average, $722 higher, a statistically significant difference (p-value reported as .000, i.e. less than 0.001).
Forecast of average salary for males with 12 years of education, EXPER equal to 10, and MONTHS equal to 15: $\hat{Y} = 3526.422 + 722.461 + 90.020(12) + 1.269(10) + 23.406(15) = 5692.903$.

Forecast of average salary for females with 12 years of education, EXPER equal to 10, and MONTHS equal to 15: $\hat{Y} = 3526.422 + 90.020(12) + 1.269(10) + 23.406(15) = 4970.442$.

Interaction Regression Models

We define the first-order linear model as $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$.

The assumption that a first-order model will adequately characterize the relationship between $E(Y)$ and the independent variables is equivalent to assuming that the independent variables do not "interact"; that is, we assume that the effect on $E(Y)$ of a change in $X_i$ (for a fixed value of $X_j$) is the same regardless of the value of $X_j$. Thus, "no interaction" is equivalent to saying that the effect of changes in one variable (say $X_i$) on $E(Y)$ is independent of the value of the second variable (say $X_j$).

However, if the relationship between $E(Y)$ and $X_i$ does, in fact, depend on the value at which $X_j$ is held fixed, then the first-order model is not appropriate for predicting $Y$. In this case, we need another model that takes this dependence into account: the interaction model.

Interaction Model with Two Independent Variables

$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$, where

$(\beta_1 + \beta_3 X_2)$ represents the change in $E(Y)$ for every 1-unit increase in $X_1$, holding $X_2$ fixed;
$(\beta_2 + \beta_3 X_1)$ represents the change in $E(Y)$ for every 1-unit increase in $X_2$, holding $X_1$ fixed;
$\beta_0$ is the intercept of the model, the value of $E(Y)$ when $X_1 = X_2 = 0$.

The cross-product term, $\beta_3 X_1 X_2$, is called an interaction term.

Example 1 (cont.)

Is there evidence that $X_1$ and $X_2$ interact? Test at $\alpha = 0.05$.

Solution: $H_0: \beta_3 = 0$ vs $H_a: \beta_3 \neq 0$. The p-value of 0.169 is greater than $\alpha = 0.05$, so $H_0$ is not rejected. There is insufficient evidence to indicate that $X_1$ (experience) and $X_2$ (performance) interact at the 5% level.
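To make the meaning of the interaction term concrete, here is a minimal sketch of the two-variable interaction model. The coefficient values are hypothetical and chosen only for illustration; the function names `mean_response` and `slope_in_x1` are likewise assumptions, not part of the notes.

```python
def mean_response(x1, x2, b0, b1, b2, b3):
    """E(Y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 for the interaction model."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2, b1, b3):
    """Change in E(Y) per 1-unit increase in x1 with x2 held fixed: b1 + b3*x2."""
    return b1 + b3 * x2


# Hypothetical coefficients, not estimated from any data in the notes.
coef = dict(b0=10.0, b1=2.0, b2=0.5, b3=0.25)

for x2 in (0.0, 4.0, 8.0):
    # Raise x1 from 5 to 6 while keeping x2 fixed.
    change = mean_response(6.0, x2, **coef) - mean_response(5.0, x2, **coef)
    formula = slope_in_x1(x2, coef["b1"], coef["b3"])
    print(f"x2 = {x2}: change in E(Y) = {change}, b1 + b3*x2 = {formula}")
```

The change in E(Y) for a 1-unit increase in x1 differs across the three values of x2, which is exactly what the first-order model rules out. Setting b3 to 0 makes the change the same for every x2.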
Comparing Nested Models

To be successful model builders, we require a statistical method that allows us to determine (with a high degree of confidence) which one among a set of candidate models best fits the data. One such technique, discussed here, applies to nested models.

Two models are nested if one model contains all the terms of the second model and at least one additional term. The more complex of the two models is called the complete model (or full model) and the simpler of the two is called the reduced model. For example:

(a) $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$ (complete model) and $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$ (reduced model).
(b) The first-order and second-order models are nested.

Partial F-Test for Comparing Nested Models

Reduced model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_g X_g + \varepsilon$
Complete model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_g X_g + \beta_{g+1} X_{g+1} + \cdots + \beta_k X_k + \varepsilon$

$H_0: \beta_{g+1} = \beta_{g+2} = \cdots = \beta_k = 0$ vs $H_a$: $H_0$ is not true.

The test statistic is

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{MSE_C} = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/(n - k - 1)}$,

where
$SSE_R$ = sum of squared errors for the reduced model,
$SSE_C$ = sum of squared errors for the complete model,
$MSE_C$ = mean square error for the complete model,
$k - g$ = number of $\beta$ parameters tested (in $H_0$),
$k$ = number of independent variables in the complete model.

For the partial F-test, $df_1 = k - g$ and $df_2 = n - k - 1$.

Example 1 (Cont.)

The complete model is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$ and the reduced model is $Y = \beta_0 + \beta_1 X_1 + \varepsilon$, where $X_1$ is experience and $X_2$ is performance. Test whether the complete model fits the data better at the 5% level.

Solution: $H_0: \beta_2 = \beta_3 = 0$ vs $H_a$: $H_0$ is not true.

$SSE_R = 183.096$, $SSE_C = 152.004$, $MSE_C = 13.819$, $k - g = 3 - 1 = 2$ ($= df_1$), $n - k - 1 = 15 - 3 - 1 = 11$ ($= df_2$).

The test statistic is $F = [(SSE_R - SSE_C)/2]/MSE_C = [(183.096 - 152.004)/2]/13.819 = 1.125$.

Since $1.125 < F_{0.05}(2, 11) = 3.98$, we do not reject $H_0$ and conclude that there is insufficient evidence to indicate the complete model is better than the reduced model at the 5% level.

Example 1 (Cont.)

[SPSS output for the reduced model and the full model]
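The partial F-test arithmetic for Example 1 can be reproduced with a few lines of code. This is a sketch under the definitions given in the notes; the function name `partial_f` is an assumption, and the critical value is simply the one quoted above rather than something the code computes.

```python
def partial_f(sse_reduced, sse_complete, n, k, g):
    """Partial F statistic for nested models.

    n: sample size, k: predictors in the complete model,
    g: predictors in the reduced model.
    """
    df1 = k - g                          # number of beta parameters tested
    df2 = n - k - 1                      # error df of the complete model
    mse_complete = sse_complete / df2    # MSE_C
    f_stat = ((sse_reduced - sse_complete) / df1) / mse_complete
    return f_stat, df1, df2


# Values from Example 1 in the notes.
f_stat, df1, df2 = partial_f(183.096, 152.004, n=15, k=3, g=1)
critical = 3.98  # F_{0.05}(2, 11), as quoted in the notes

print(f"F = {f_stat:.3f} with df = ({df1}, {df2})")
print("reject H0" if f_stat > critical else "do not reject H0")
```

The same function applies to any pair of nested models once the two error sums of squares are read off the regression output.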
Beta (Standardized) Coefficients

It is inappropriate to interpret the $b_j$'s as indicators of the relative importance of the independent variables, since the independent variables generally measure different concepts and so have different units of measurement.

For this reason, the regression output includes so-called Beta coefficients (standardized regression coefficients), which are the coefficients of the independent variables when all variables are expressed in standardized form. (Refer to section 7.5 on pages 273-276.)

Beta coefficients can be calculated directly from the regression coefficients using the following formula:

Beta Coefficients (cont.)

$Beta_j = b_j\,(s_{x_j}/s_y) = b_j\,(SS_{x_j x_j}/SS_{yy})^{1/2}$,

where $s_{x_j}$ is the standard deviation of the jth independent variable, $s_y$ is the standard deviation of the dependent variable, and $b_j$ is the unstandardized partial regression coefficient for the jth independent variable.

When two or more independent variables are entered, the beta coefficients can be used to directly compare the importance of each independent variable in relation to the dependent variable.

For example, if $Beta_1 = 0.85$ and $Beta_2 = 0.23$, then independent variable $x_1$ is more important to the dependent variable than $x_2$.

Example 3

Mrs. Goh, a real estate agent, wants to develop a multiple regression model to find the relationship between the sale price of houses and various characteristics of the houses. She collected data on six variables, recorded in the table, for 13 houses that were sold recently. The six variables are:

Price: Sale price of a house in thousands of dollars
Lot size: Size of the lot in acres
Living area: Living area in square feet
Age: Age of a house in years
Corner: Whether or not a house is on a corner lot
Garage: Whether or not a house has a garage

Example 3 (Cont.)

Discuss the following SPSS printouts for the model based on the above data.
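As a small illustration of the Beta formula quoted earlier, Beta_j = b_j (s_xj / s_y), the sketch below applies it to a one-predictor regression on hypothetical data (the data and the function names are assumptions made for this example). With a single predictor the Beta coefficient coincides with the Pearson correlation between x and y, which gives a convenient check.

```python
import math


def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def ls_slope(x, y):
    """Least squares slope b1 for the simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def beta_coefficient(b_j, x_j, y):
    """Beta_j = b_j * (s_xj / s_y), the formula quoted in the notes."""
    return b_j * sample_sd(x_j) / sample_sd(y)


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0, 4.0, 8.0, 9.0, 13.0]

b1 = ls_slope(x, y)
beta1 = beta_coefficient(b1, x, y)
print(f"b1 = {b1:.4f}, Beta1 = {beta1:.4f}")
```

Rescaling x (say, from years to months) changes b1 but leaves Beta1 unchanged, which is why Beta coefficients are used to compare predictors measured in different units.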
Example 3 - Discussion

The model is useful. The model fits the data well.

Example 3 - Discussion (Cont.)

From the beta coefficients in the above table, we can say that AREA is the most important independent variable, followed by CORNER. The less important variables are AGE and SIZE.

Example 3 - Discussion (Cont.)

After dropping SIZE and AGE, $R^2 = 0.976$ and $R_a^2 = 0.968$. (Printout annotation: significant.)

Example 3 - Discussion (Cont.)

Partial F-test. $H_0: \beta_s = \beta_a = 0$ vs $H_a$: $H_0$ is not true.

$SSE_R = 1062.059$, $SSE_C = 642.24$, $MSE_C = 91.75$, $k - g = 5 - 3 = 2$ ($= df_1$), $n - k - 1 = 13 - 5 - 1 = 7$ ($= df_2$).

The test statistic is $F = [(SSE_R - SSE_C)/2]/MSE_C = [(1062.059 - 642.24)/2]/91.75 = 2.288$.

Since $2.288 < F_{0.05}(2, 7) = 4.74$, we do not reject $H_0$ and conclude that there is insufficient evidence to indicate the complete model is better than the reduced model at the 5% level.

Extra Sums of Squares

In the textbook, the difference $(SSE_R - SSE_C)$ is called an extra sum of squares. An extra sum of squares measures the marginal reduction in the error sum of squares when one or several predictor variables are added to the regression model, given that other predictor variables are already in the model.

Equivalently, one can view an extra sum of squares as measuring the marginal increase in the regression sum of squares when one or several predictor variables are added to the regression model.

The reason for the equivalence of the marginal reduction in the error sum of squares and the marginal increase in the regression sum of squares is $SST = SSR + SSE$. Since SST does not depend on the regression model fitted, any reduction in SSE implies an identical increase in SSR.

Coefficients of Partial Determination

Extra sums of squares are not only useful for tests on the regression coefficients of a multiple regression model; they are also encountered in descriptive measures of relationship called coefficients of partial determination.
$R^2$ measures the proportionate reduction in the variation of $Y$ achieved by the introduction of the entire set of X variables considered in the model.

A coefficient of partial determination, in contrast, measures the marginal contribution of one X variable when all others are already included in the model.

Let us consider the following model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$.

Coefficients of Partial Determination (cont.)

$SSE(X_1)$ measures the variation in $Y$ when $X_1$ is included in the model, and $SSE(X_1, X_2)$ measures the variation in $Y$ when both $X_1$ and $X_2$ are included in the model.

Hence, the relative marginal reduction in the variation in $Y$ associated with $X_2$ when $X_1$ is already in the model is

$r^2_{Y2.1} = SSR(X_2|X_1)/SSE(X_1) = [SSE(X_1) - SSE(X_1, X_2)]/SSE(X_1)$.

The above is the coefficient of partial determination between $Y$ and $X_2$, given that $X_1$ is in the model. Similarly, we can define the coefficient of partial determination between $Y$ and $X_1$, given that $X_2$ is in the model, as

$r^2_{Y1.2} = SSR(X_1|X_2)/SSE(X_2) = [SSE(X_2) - SSE(X_1, X_2)]/SSE(X_2)$.

Coefficients of Partial Determination (cont.)

General Case

$r^2_{Y1.23} = SSR(X_1|X_2, X_3)/SSE(X_2, X_3)$
$r^2_{Y2.13} = SSR(X_2|X_1, X_3)/SSE(X_1, X_3)$
$r^2_{Y3.12} = SSR(X_3|X_1, X_2)/SSE(X_1, X_2)$
$r^2_{Y4.123} = SSR(X_4|X_1, X_2, X_3)/SSE(X_1, X_2, X_3)$

Comments (on page 270)

(a) A coefficient of partial determination lies between 0 and 1.
(b) A coefficient of partial determination can be interpreted as a coefficient of simple determination.

Multicollinearity

Often, two or more of the independent variables used in the multiple regression model will contribute redundant information. That is, the independent variables will be correlated with each other. When the independent variables are highly correlated, we say that multicollinearity exists. A few problems arise when serious multicollinearity is present in the regression analysis.
First, high correlation among the independent variables increases the likelihood of rounding errors in the calculations of the $\beta_i$ estimates, standard errors, and so forth.

Second, and more important, the regression results may be confusing and misleading.

Example

$b = (X'X)^{-1}X'Y$, $var(b) = \sigma^2(X'X)^{-1}$

Example (cont.)

Three important effects are illustrated in the sequence of matrices:

(a) The sampling variances of the estimated coefficients increase sharply with increasing collinearity between the independent variables.
(b) Greater covariances between the independent variables produce greater sampling covariances for the LS coefficients.
(c) Small variations in the data (say, dropping or adding a few observations) may produce substantial variations in the LS coefficients.

Detecting Multicollinearity

(1) Significant correlation between pairs of independent variables in the model.
(2) Nonsignificant t tests for all (or nearly all) of the individual $\beta$ parameters when the F-test for overall model adequacy, $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$, is significant.
(3) Signs opposite from what is expected in the estimated parameters.
(4) $VIF_i = (1 - R_i^2)^{-1}$ exceeds 10, or the mean of the VIFs, $(\sum VIF_i)/k$, is considerably larger than 1, $i = 1, 2, \ldots, k$ (pages 408-409). Here $R_i^2$ is the coefficient of determination from regressing $X_i$ on the other independent variables.
(5) A more sophisticated method is to use Principal Components Analysis.

One of the commonly used simple remedies for multicollinearity is to drop one or more of the highly correlated independent variables from the multiple regression model.

Example 1 (Cont.)

Printout annotations: the overall F-test is significant, while the individual t tests are nonsignificant.

Example 1 (Cont.)

$X_1$ (experience) and $X_2$ (performance) are highly correlated.

Example 1 (Cont.)

The mean VIF value, $(3.75 + 3.75)/2 = 3.75$, is considerably larger than 1.

Example 1 (Cont.)
After dropping the independent variable $X_2$ (performance). (Printout annotation: significant.)

Polynomial Regression Models

Polynomial regression models for quantitative predictor variables are among the most frequently used curvilinear response models in practice because they are easy to handle as a special case of the general linear regression model. Polynomial models have two basic types of uses:

(1) When the true curvilinear response function is indeed a polynomial function.
(2) When the true curvilinear response function is unknown (or complex) but a polynomial function is a good approximation.

The following are several commonly used polynomial regression models.

Commonly Used Polynomial Models

Second-Order model with one predictor variable: $Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $x_i = X_i - \bar{X}$. The regression coefficient $\beta_0$ represents the mean response of $Y$ when $x = 0$ (i.e. $X = \bar{X}$), $\beta_1$ is often called the linear effect coefficient, and $\beta_2$ is called the quadratic effect coefficient.

The reason for using a centered predictor variable in the polynomial model is that $X$ and $X^2$ often will be highly correlated. Centering the predictor variable often reduces the multicollinearity substantially.

Higher-Order model with one predictor variable: $Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_m x_i^m + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $x_i = X_i - \bar{X}$.

Polynomial Models (cont.)

Second-Order model with two predictor variables: $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1}^2 + \beta_4 x_{i2}^2 + \beta_5 x_{i1}x_{i2} + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $x_{i1} = X_{i1} - \bar{X}_1$, $x_{i2} = X_{i2} - \bar{X}_2$, and $\beta_5$ is called the interaction effect coefficient.

Second-Order model with three predictor variables: $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i1}^2 + \beta_5 x_{i2}^2 + \beta_6 x_{i3}^2 + \beta_7 x_{i1}x_{i2} + \beta_8 x_{i1}x_{i3} + \beta_9 x_{i2}x_{i3} + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $x_{i1} = X_{i1} - \bar{X}_1$, $x_{i2} = X_{i2} - \bar{X}_2$, $x_{i3} = X_{i3} - \bar{X}_3$, and $\beta_7$, $\beta_8$ and $\beta_9$ are called the interaction effect coefficients.
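The effect of centering on the correlation between a predictor and its square can be seen in a short sketch. The predictor values 1, 2, ..., 10 are hypothetical and chosen only for illustration, and `pearson_r` is a helper written for this example.

```python
def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


# Hypothetical predictor values 1, 2, ..., 10.
X = [float(v) for v in range(1, 11)]
x_bar = sum(X) / len(X)
x = [v - x_bar for v in X]  # centered predictor x_i = X_i - X_bar

r_raw = pearson_r(X, [v ** 2 for v in X])        # correlation of X with X^2
r_centered = pearson_r(x, [v ** 2 for v in x])   # correlation of x with x^2

print(f"corr(X, X^2) = {r_raw:.4f}")
print(f"corr(x, x^2) = {r_centered:.4f}")
```

For these values the uncentered correlation is above 0.97, while the centered one is zero because the values are symmetric about their mean. With asymmetric data the centered correlation is usually reduced rather than eliminated.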
Implementation of Polynomial Models

Fitting polynomial regression models presents no new problems since they are special cases of the general linear regression model. For example, the quadratic model technically includes only one independent variable $X$, but we can think of the model as a general linear model with two independent variables $X_1$ ($= X$) and $X_2$ ($= X^2$). Hence, all earlier results on fitting apply, as do the earlier results on making inferences.

Implementation of Polynomial Models (cont.)

How can you choose an appropriate linear model to fit to a set of data: first-order, second-order or higher-order? Since most relationships in the real world are curvilinear, a good choice would be a higher-order linear model.

If you are fairly certain, based on your experience, knowledge, or prior information (past research in this area), that the relationships between $E(Y)$ and the independent variables are approximately first-order and that the independent variables do not interact, you could select a first-order model for the data.

Keep in mind that you may be forced to use a first-order model rather than a second-order or higher-order model simply because you do not have sufficient data to estimate all parameters in a higher-order model.

Example

Refer to the case example in the textbook (pages 300-305).

The second-order polynomial model was used first: $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1}^2 + \beta_4 x_{i2}^2 + \beta_5 x_{i1}x_{i2} + \varepsilon_i$, $i = 1, 2, \ldots, 11$, where $x_{i1} = X_{i1} - 1$ and $x_{i2} = X_{i2} - 20$.

Example (cont.)

Partial F-test: $H_0: \beta_3 = \beta_4 = \beta_5 = 0$ vs $H_a$: $H_0$ is not true.

$SSE_R = 7700.33$, $SSE_C = 5240.44$, $MSE_C = 1048.1$, $df_1 = 3$ and $df_2 = n - k - 1 = 11 - 5 - 1 = 5$.

$F = [(SSE_R - SSE_C)/3]/MSE_C = [(7700.33 - 5240.44)/3]/1048.1 = 0.782$

Since $0.782 < F_{0.05}(3, 5) = 5.41$, we do not reject $H_0$ and conclude that there is insufficient evidence to indicate the second-order model is better than the first-order model at the 5% level.
Compare the correlation between $X_1$ and $X_1^2$ with the correlation between $x_1$ and $x_1^2$, and do the same for $X_2$ and $x_2$.

Discuss the P-P plot and the residual plot.
