Multiple Regression Analysis (II) Notes PDF

Summary

These notes provide an overview of multiple regression analysis, focusing on model-building techniques. They detail procedures like all-possible regressions, forward/backward stepwise regression, and model validation. The document also discusses various criteria for evaluating different regression models.

Full Transcript


Multiple Regression Analysis (II)

- Overview of Model-Building Process
- All-Possible-Regressions Procedure
- Forward Stepwise Regression and Other Automatic Procedures
- Model Validation

Introduction

- The user of multiple linear regression attempts to accomplish one of three objectives: (1) obtain estimates of the individual coefficients in a complete model; (2) screen variables to determine which have a significant effect on the response; (3) arrive at the most effective prediction equation.
- In this lecture, some standard sequential procedures for selecting independent variables are discussed.
- Overview of the model-building process (Section 9.1, pages 343~350).

All-Possible-Regressions Procedure

- The purpose of the all-possible-regressions procedure is to identify a small group of regression models that are "good" according to a specified criterion, so that a detailed examination can be made of these models, leading to the selection of the final regression model to be employed.
- Different criteria for comparing the regression models may be used with the all-possible-regressions selection procedure. We now discuss six commonly used criteria: Rp², Ra,p², Cp, AICp, SBCp and PRESSp.
- Note that we denote the number of potential X variables in the pool by P−1, and the number of X variables in a subset by p−1, where 1 < p ≤ P.

Rp² or SSEp Criterion

- The Rp² criterion calls for the use of the coefficient of multiple determination R² in order to identify several "good" subsets of X variables; in other words, subsets for which Rp² is high.
- The Rp² criterion is equivalent to using the error sum of squares SSEp as the criterion. With the SSEp criterion, subsets for which SSEp is small are considered "good". This can be seen from the equation
  Rp² = SSRp/SS(T) = 1 − SSEp/SS(T);
  since SS(T) is constant for all possible regression models, Rp² varies inversely with SSEp.
- SSEp can never increase as additional X variables are included in the model. The intent in using Rp² is to find the point where adding more X variables is not worthwhile because it leads to only a very small increase in Rp².

Ra² or MSEp Criterion

- Since Rp² does not take account of the number of parameters (independent variables) in the regression model, and since max(Rp²) can never decrease as p increases, the adjusted coefficient of multiple determination Ra² has been suggested as an alternative criterion:
  Ra² = 1 − [SSE/(n−p)] / [SS(T)/(n−1)] = 1 − MSE / [SS(T)/(n−1)]
- It can be seen from the above equation that Ra² increases if and only if MSE decreases, since SS(T)/(n−1) is fixed for the given Y observations. Hence, Ra² and MSE provide equivalent information.
- Users of the MSEp criterion (Ra,p² = 1 − MSEp/[SS(T)/(n−1)]) seek to find a few subsets for which MSEp is at the minimum, or so close to the minimum that adding more variables is not worthwhile.
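As an informal illustration of the Rp² and Ra,p² criteria (equivalently, the SSEp and MSEp criteria), the following Python sketch fits one candidate subset by ordinary least squares and evaluates both measures. The data, variable names, and chosen subsets below are invented for illustration; only the formulas above come from the notes.

```python
import numpy as np

def r2_and_adj_r2(X, y):
    """Fit OLS for one candidate subset and return (Rp^2, Ra,p^2).

    X : (n, p-1) matrix of predictors for the subset (no intercept column)
    y : (n,) response vector
    """
    n, k = X.shape
    p = k + 1                              # number of parameters incl. intercept
    Xd = np.column_stack([np.ones(n), X])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sse = resid @ resid                    # SSE_p
    sst = np.sum((y - y.mean()) ** 2)      # SS(T), the same for every subset
    r2 = 1 - sse / sst                     # Rp^2 = 1 - SSE_p / SS(T)
    adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))   # Ra,p^2
    return r2, adj_r2

# Hypothetical data purely for illustration
rng = np.random.default_rng(0)
X_full = rng.normal(size=(30, 3))
y = 2 + X_full @ np.array([1.5, 0.0, -0.8]) + rng.normal(scale=0.5, size=30)

for cols in [(0,), (0, 2), (0, 1, 2)]:
    r2, ar2 = r2_and_adj_r2(X_full[:, cols], y)
    print(cols, round(r2, 3), round(ar2, 3))
```

Comparing the printed values subset by subset mirrors the idea above: Rp² can only go up as variables are added, while Ra,p² may fall once the extra variable no longer reduces MSE.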
Mallows' Cp Criterion

- This criterion is concerned with the total mean squared error of the n fitted values for each subset regression model. The mean squared error concept involves the total error in each fitted value, Ŷi − μi, where μi is the true mean response when the levels of the predictor variables Xk are those for the ith case.
- Squared error: (Ŷi − μi)² = {(E[Ŷi] − μi) + (Ŷi − E[Ŷi])}²
  Mean squared error: E(Ŷi − μi)² = (E[Ŷi] − μi)² + var(Ŷi)
- That is, the mean squared error for the fitted value Ŷi is the sum of the squared bias and the variance of Ŷi. The total mean squared error for all n fitted values is
  Σ E(Ŷi − μi)² = Σ (E[Ŷi] − μi)² + Σ var(Ŷi).

Mallows' Cp Criterion (cont.)

- The criterion measure, denoted by Γp, is simply the total mean squared error divided by σ², the true error variance:
  Γp = [Σ (E[Ŷi] − μi)² + Σ var(Ŷi)] / σ² = [E{SSEp}/σ²] − (n − 2p)
  (Refer to (9.11) and (9.12) on page 359.)
- The model which includes all P−1 potential X variables is assumed to have been carefully chosen so that MSE(X1, …, XP−1) = SSE(X1, …, XP−1)/(n − P) = s² is an unbiased estimator of σ². Then it can be shown that an estimator of Γp is Cp:
  Cp = [SSEp/MSE(X1, …, XP−1)] − (n − 2p) = [SSEp/s²] − (n − 2p)
  where SSEp is the error sum of squares for the fitted subset regression model with p−1 independent variables.

Mallows' Cp Criterion (cont.)

- When there is no bias in the regression model with p−1 independent variables, so that E(Ŷi) = μi, then E(Cp) ≈ p.
- In using the Cp criterion, we seek to identify subsets of X variables for which (1) the Cp value is small and (2) the Cp value is near p.

AICp and SBCp Criteria

- Ra,p² and Cp are model selection criteria that penalize models having large numbers of predictors. Two popular alternatives that also provide penalties for adding predictors are Akaike's information criterion (AICp) and Schwarz' Bayesian criterion (SBCp).
- We search for models that have small values of AICp or SBCp, where these criteria are given by:
  AICp = n·ln(SSEp) − n·ln(n) + 2p
  SBCp = n·ln(SSEp) − n·ln(n) + [ln(n)]·p
- Models with small SSEp will do well by these criteria, as long as the penalties (2p for AICp and [ln(n)]p for SBCp) are not too large. If n ≥ 8, the penalty for SBCp is larger than that for AICp; hence the SBCp criterion tends to favor more parsimonious models.

PRESSp Criterion

- The PRESSp (prediction sum of squares) criterion is a measure of how well the fitted values for a subset model can predict the observed responses Yi.
- The PRESS measure differs from SSE in that each fitted value Ŷi for the PRESS criterion is obtained by deleting the ith case from the data set, estimating the regression function for the subset model from the remaining n−1 cases, and then using the fitted regression function to obtain the predicted value Ŷi(i) for the ith case.
- The PRESSp criterion is the sum of the squared prediction errors over all n cases, that is, PRESSp = Σ (Yi − Ŷi(i))².
- Models with small PRESSp values fit well in the sense of having small prediction errors.

Some Comments

- Discuss Tables 9.2 and 9.3 in the textbook ("Surgical Unit Example", pages 353 and 363).
- The all-possible-regressions procedure leads to the identification of a small number of subsets that are good according to a specified criterion. Different criteria may lead to substantially different subset identifications. Consequently, one may wish at times to consider more than one criterion in evaluating possible subsets of X variables.
- Once the investigator has identified a few "good" subsets for intensive examination, a final choice of the model variables must be made. This choice is aided by residual analyses, examination of influential observations, and other diagnostics for each of the competing models, and by the investigator's knowledge of the subject under study, and is finally confirmed by model validation.
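To make the mechanics of these criteria concrete, here is a Python sketch of the all-possible-regressions search: it enumerates every subset of a candidate predictor pool and computes SSEp, Cp, AICp, SBCp and PRESSp for each. The data and names are invented; the only assumptions carried over from the notes are that the full model supplies an unbiased s² for Cp, and that PRESS for OLS can be computed with the standard leave-one-out identity ei/(1 − hii) rather than by literally refitting n times.

```python
import numpy as np
from itertools import combinations

def subset_criteria(X_pool, y):
    """Enumerate all subsets of the predictor pool and report
    SSE_p, C_p, AIC_p, SBC_p and PRESS_p for each subset."""
    n, K = X_pool.shape                     # K = P - 1 potential predictors

    def fit(cols):
        Xd = np.column_stack([np.ones(n)] + [X_pool[:, j] for j in cols])
        H = Xd @ np.linalg.pinv(Xd)         # hat matrix of the subset model
        resid = y - H @ y
        sse = resid @ resid
        press = np.sum((resid / (1 - np.diag(H))) ** 2)   # leave-one-out identity
        return sse, press, Xd.shape[1]

    sse_full, _, P = fit(tuple(range(K)))
    s2 = sse_full / (n - P)                 # MSE of the full model, estimates sigma^2

    results = []
    for k in range(1, K + 1):
        for cols in combinations(range(K), k):
            sse, press, p = fit(cols)
            cp = sse / s2 - (n - 2 * p)
            aic = n * np.log(sse) - n * np.log(n) + 2 * p
            sbc = n * np.log(sse) - n * np.log(n) + np.log(n) * p
            results.append((cols, sse, cp, aic, sbc, press))
    return results

# Hypothetical data purely for illustration
rng = np.random.default_rng(1)
X_pool = rng.normal(size=(40, 4))
y = 1 + 2 * X_pool[:, 0] - X_pool[:, 2] + rng.normal(scale=0.7, size=40)

for cols, sse, cp, aic, sbc, press in subset_criteria(X_pool, y):
    print(cols, round(cp, 2), round(aic, 2), round(sbc, 2), round(press, 2))
```

Good subsets are then the ones with small Cp (and Cp near p), small AICp or SBCp, and small PRESSp; as noted above, the different criteria need not single out the same subset.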
Forward, Backward and Stepwise Methods

- In those occasional cases when the pool of potential X variables contains, say, 50 variables, use of the all-possible-regressions procedure may not be feasible. An automatic search procedure that develops the "best" subset of X variables sequentially may then be helpful.
- Forward, backward and stepwise selection are probably the most widely used of the automatic search procedures for selecting independent variables. These procedures are based on the notion that a single variable, or a collection of variables, should not appear in the estimating equation unless it produces a significant increase in the regression sum of squares or, equivalently, a significant increase in R², the coefficient of multiple determination.

Forward Method

Forward Selection:
- Independent variables are inserted one at a time until a satisfactory regression equation is found.
- The procedure starts with the independent variable that has the highest correlation with the response variable y.
- Once an independent variable is in the model, it stays.
- At the second step, the remaining k−1 variables are examined, and the variable for which the partial F-statistic is a maximum is added to the equation.
- The procedure continues until there are no remaining independent variables that significantly increase R².

Backward and Stepwise Methods

Backward Selection:
- Start with all the independent variables in the model and then delete variables one at a time using a partial F-test, until all remaining variables produce a significant F-statistic.

Stepwise Selection:
- Similar to the forward selection method, except that the variables entered do not necessarily stay in the model in subsequent steps.
- After a variable is entered, the stepwise method looks at all the variables already in the model and deletes any variable that does not produce a significant partial F-statistic value.
- Therefore, the stepwise method is a combination of the forward and backward procedures and is the most widely used variable selection technique.

Example

Consider the data in the following table, in which measurements were taken on 9 infants. The purpose of the experiment was to arrive at a suitable estimating equation relating the length of an infant to all or a subset of the independent variables.
(1) Use all independent variables to find the estimated multiple regression equation.
(2) Use the forward, backward and stepwise methods, respectively, to choose a suitable estimating equation for the data.
(3) Comment on your results.

Example (cont.)

Infant length y (cm)   Age x1 (days)   Length at birth x2 (cm)   Weight at birth x3 (kg)   Chest size at birth x4 (cm)
57.5                   78              48.2                      2.75                      29.5
52.8                   69              45.5                      2.15                      26.3
61.3                   77              46.3                      4.41                      32.2
67.0                   88              49.0                      5.52                      36.5
53.5                   67              43.0                      3.21                      27.2
62.7                   80              48.0                      4.32                      27.7
56.2                   74              48.0                      2.31                      28.3
68.5                   94              53.0                      4.30                      30.3
69.2                   102             58.0                      3.71                      28.7

Example (cont.) - Full Model
Example - Stepwise Model
Example - Backward Model
Example - Forward Model
(The fitted regression output for each of these four models was shown on the corresponding slides.)

Example (cont.)

- Now we have four estimated models. Which one should we select as our final model? Before we answer the question, see the following summary table.

  Model      R²      Ra²     No. of Xi
  Full       0.991   0.982   4
  Forward    0.988   0.984   2
  Backward   0.991   0.987   2
  Stepwise   0.988   0.984   2

- From the above table we can see that the backward model is the best among these four models, since it has the largest Ra² and fewer independent variables than the full model.
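As a rough illustration of how forward selection proceeds, the following Python sketch applies forward selection with a partial F-test to the infant data above. This is a sketch, not the procedure that produced the slide output; in particular, the F-to-enter threshold of 4.0 is an arbitrary illustrative choice, not a value from the notes.

```python
import numpy as np

# Infant data from the example (y, x1, x2, x3, x4)
y = np.array([57.5, 52.8, 61.3, 67.0, 53.5, 62.7, 56.2, 68.5, 69.2])
X = np.array([
    [ 78, 48.2, 2.75, 29.5],
    [ 69, 45.5, 2.15, 26.3],
    [ 77, 46.3, 4.41, 32.2],
    [ 88, 49.0, 5.52, 36.5],
    [ 67, 43.0, 3.21, 27.2],
    [ 80, 48.0, 4.32, 27.7],
    [ 74, 48.0, 2.31, 28.3],
    [ 94, 53.0, 4.30, 30.3],
    [102, 58.0, 3.71, 28.7],
])
names = ["x1", "x2", "x3", "x4"]
n = len(y)

def sse(cols):
    """Error sum of squares for the model with the listed predictors."""
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ beta
    return r @ r

F_TO_ENTER = 4.0                      # illustrative threshold, an assumption
selected, remaining = [], list(range(X.shape[1]))

while remaining:
    sse_cur = sse(selected)
    # Partial F for adding each remaining variable to the current model
    best_j, best_F = None, -np.inf
    for j in remaining:
        sse_new = sse(selected + [j])
        df_err = n - (len(selected) + 2)   # intercept + current vars + candidate
        F = (sse_cur - sse_new) / (sse_new / df_err)
        if F > best_F:
            best_j, best_F = j, F
    if best_F < F_TO_ENTER:
        break                              # no candidate enters; stop
    selected.append(best_j)
    remaining.remove(best_j)
    print(f"enter {names[best_j]} (partial F = {best_F:.2f})")

print("selected predictors:", [names[j] for j in selected])
```

Backward selection would run the same partial-F machinery in reverse (start from all four predictors and drop the weakest), and stepwise selection would re-test the entered variables for removal after each entry step.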
Some Final Comments on Model Selection

- Judgment needs to play an important role in model building for exploratory studies.
- Some explanatory variables may be known to be more fundamental than others and therefore should be retained in the regression model if the primary purpose is to develop a good explanatory model.
- When a qualitative predictor variable is represented in the pool of potential X variables by a number of indicator variables, it is often appropriate to keep these indicator variables together as a group to represent the qualitative variable, even if a subset containing only some of the indicator variables is "better" according to the criterion employed.
- If second-order terms Xk² or interaction terms XiXj need to be present in a regression model, one would ordinarily wish to have the corresponding first-order terms in the model to represent the main effects.

Model Validation

- The final step in the model-building process is the validation of the selected regression models. Model validation usually involves checking a candidate model against independent data. The following are the three basic ways of validating a regression model:
  - Collection of new data to check the model and its predictive ability.
  - Comparison of results with theoretical expectations, earlier empirical results, or simulation results.
  - Use of a holdout sample to check the model and its predictive ability (data splitting, page 372).
- Discuss the example on pages 373~375.
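The data-splitting idea can be sketched in a few lines of Python: fit the candidate model on a training portion, then compare the mean squared prediction error (MSPR) on the holdout portion with the training MSE; an MSPR much larger than the MSE suggests the model does not validate well. The split fraction and the data below are illustrative assumptions, not values from the notes.

```python
import numpy as np

def mspr_holdout(X, y, train_frac=0.6, seed=0):
    """Data splitting: fit OLS on a training set, report the training MSE
    and the mean squared prediction error (MSPR) on the holdout set."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    n_train = int(train_frac * n)
    tr, ho = idx[:n_train], idx[n_train:]

    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd[tr], y[tr], rcond=None)

    resid_tr = y[tr] - Xd[tr] @ beta
    mse_train = resid_tr @ resid_tr / (n_train - Xd.shape[1])

    resid_ho = y[ho] - Xd[ho] @ beta
    mspr = np.mean(resid_ho ** 2)            # prediction error on unseen cases
    return mse_train, mspr

# Hypothetical data purely for illustration
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
y = 4 + 1.2 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.8, size=60)

mse_train, mspr = mspr_holdout(X, y)
print(f"training MSE = {mse_train:.3f}, holdout MSPR = {mspr:.3f}")
```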
