Nonlinear Regression Models Notes PDF

Summary

These notes cover nonlinear regression models, including the exponential and logistic regression models, and contrast them with linear regression. The notes also discuss modeling techniques and theoretical considerations such as large sample theory, model building, goodness of fit tests, and neural network modeling. The document is likely part of a course on statistical modeling, probably at the undergraduate level.

Full Transcript

Nonlinear Regression Models  Linear and Nonlinear Regression Models  Logistic Regression 1 Linear and Nonlinear Regression Models Linear Regression Models Yi = f(Xi, ) + i = 0 + 1 Xi1 + 2 Xi2 +... + k Xik + i , i = 1, 2, …, n Or in matrix terms Yn1 = Xn(k+1) (k+1)1 + n1 (Y = X +) wh...

Nonlinear Regression Models  Linear and Nonlinear Regression Models  Logistic Regression 1 Linear and Nonlinear Regression Models Linear Regression Models Yi = f(Xi, ) + i = 0 + 1 Xi1 + 2 Xi2 +... + k Xik + i , i = 1, 2, …, n Or in matrix terms Yn1 = Xn(k+1) (k+1)1 + n1 (Y = X +) where  is a vector of independent normal variables with E( ) = 0 and Var() = 2 I. Nonlinear Regression Models Yi = f(Xi, )+i or in matrix terms Y= f(X, )+ where E(Y) = f(X, ) is nonlinear in parameters, E( ) = 0 and Var() = 2 I. There are two widely used nonlinear regression models in practice: Exponential and Logistic Models. 2 Two Nonlinear Regression Models Exponential Regression Model Yi = 0 + 1 exp(2Xi)+i , where the error terms i are independent ~ N(0, 2). Note that the response function = 0 + 1 exp(2Xi) is not linear in the parameters 0, 1 and 2. The above model is commonly used in growth studies where the rate of growth at a given time X is proportional to the amount of growth remaining as time increases, with 0 representing the maximum growth value. 3 Two Nonlinear Regression Models (cont.) Logistic Regression Model Yi = [0 /(1+ 1 exp(2Xi))]+i , where the error terms i are independent ~ N(0, 2). Note again that the response function = [0 /(1+ 1exp(2Xi))] is not linear in the parameters 0, 1 and 2. Logistic regression model is widely used when the response variable is qualitative. An example of this use of the logistic regression model is for predicting whether a household will purchase a new car this year (Yes, No; or will, will not) based on the predictor variables age of presently owned car, household income, and size of household. 4 Estimation of Regression Parameters Estimation of the parameters of a nonlinear regression model is usually carried out by the method of least squares or the method of maximum likelihood, just as for linear regression models. Also like linear regression, both of theses methods of estimation yield the same parameter estimates when the error terms in nonlinear regression model are independent normal with constant variance. Unlike in linear regression, it is usually not possible to find analytical expressions for the least squares and maximum likelihood estimators for nonlinear regression models. Instead, iterative numerical search procedures usually have to be used with both of these estimation procedures, requiring intensive computations. The analysis of nonlinear regression models is therefore usually carried out by utilizing standard computer software programs. 5 Model Building and Diagnostics The model-building process for nonlinear regression models often differs somewhat from that for linear models. The reason is that the functional form of many nonlinear models is less suitable for adding or deleting predictor variables and curvature and interaction effects in the direct fashion that is feasible for linear models. Whether a model is linear or nonlinear, the appropriateness of the model must always be considered. For examples, correlated error terms, unequal error variances. Again, the analysis of residuals will be helpful in diagnosing departures from the assumed model, just as for linear regression models. Form lack of fit test could be performed if the sample size is reasonable large. 
Model Building and Diagnostics

The model-building process for nonlinear regression models often differs somewhat from that for linear models. The reason is that the functional form of many nonlinear models is less suitable for adding or deleting predictor variables, curvature effects, and interaction effects in the direct fashion that is feasible for linear models. Whether a model is linear or nonlinear, its appropriateness must always be considered; for example, the error terms may be correlated or have unequal variances. Again, the analysis of residuals is helpful for diagnosing departures from the assumed model, just as for linear regression models. A formal lack of fit test can be performed if the sample size is reasonably large.

Large Sample Theory

Large sample theory (LST) tells us that, when the sample size is large, the least squares and maximum likelihood estimators for nonlinear regression models (whether or not the error terms are normal) are approximately normally distributed, almost unbiased, and of almost minimum variance. As a result of the LST, inferences (confidence intervals and tests) for nonlinear regression parameters are carried out in the same fashion as for linear regression when the sample size is reasonably large. These inference procedures are only approximate, of course, but the approximation is often very good. When is large-sample theory applicable, that is, what sample size can be considered reasonably large? See the textbook on pages 528-529.

Neural Network Modeling

In recent years there has been an explosion in the amount of available data, made possible in part by the widespread availability of low-cost computer memory and automated data collection systems. This exponential growth in available data has motivated researchers in statistics, artificial intelligence, and data mining to develop simple, flexible, powerful procedures for data modeling that can be applied to very large data sets. Neural network modeling is one such technique. The basic idea behind the neural network approach is to model the response as a nonlinear function of various linear combinations of the predictors. Thus, the neural network model is simply a nonlinear statistical model that contains many more parameters than the corresponding linear statistical model (pages 537-547).

Neural networks have found widespread application in many fields, have become one of the standard tools of data mining, and their use continues to grow.

Advantages of the neural network modeling approach:
- The model is extremely flexible and can be used to represent a wide range of response surface shapes.
- Standard regression assumptions are not required.

Disadvantages of the neural network modeling approach:
- The model is over-parameterized.
- Model parameters are generally uninterpretable.
- Diagnostics and significance testing are currently not generally available.

A minimal numerical sketch of such a network fit follows.
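The following sketch illustrates the idea that the response is modeled as a nonlinear function of linear combinations of the predictors; the synthetic data, layer size, and solver settings are illustrative assumptions, and the use of scikit-learn is not part of the notes:

```python
# A minimal sketch: a single-hidden-layer network, where each hidden unit
# applies a nonlinear activation to its own linear combination of the
# predictors. Data and settings are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
# A curved response surface that a linear model could not represent directly.
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.1, size=200)

net = MLPRegressor(hidden_layer_sizes=(5,), activation="logistic",
                   solver="lbfgs", max_iter=5000, random_state=0)
net.fit(X, y)
# The parameter count far exceeds that of the corresponding linear model.
n_params = sum(w.size for w in net.coefs_) + sum(b.size for b in net.intercepts_)
print("number of fitted parameters:", n_params)
```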
Logistic Regression Model

In many studies the outcome variable of interest is the presence or absence of some condition. We cannot use ordinary multiple regression for such data; instead we can use a similar approach known as multiple linear logistic regression, or simply logistic regression. The basic principle of logistic regression is much the same as for ordinary multiple regression. The main difference is that instead of developing a model that uses a combination of the values of a group of independent (explanatory) variables to predict the value of a dependent variable directly, we predict a transformation of the dependent variable.

Before explaining the method, it is useful to recall that if we have a binary variable and give the categories the numerical values 0 and 1, usually representing 'No' and 'Yes' respectively, then the mean of these values in a sample of individuals is the same as the proportion of individuals with the characteristic. We might expect, therefore, that the appropriate regression model would predict the proportion of subjects with the feature of interest (or, equivalently, the probability of an individual having that characteristic) for any combination of the explanatory variables in the model. In practice, a statistically preferable method is to use a transformation of this proportion, as described below. One reason is that otherwise we might predict impossible probabilities outside the range 0 to 1.

Meaning of Response Function

Consider the simple linear regression model Yi = β0 + β1Xi + εi, i = 1, 2, ..., n, where the outcome Yi is binary, taking on the value 0 or 1. The expected response E(Yi) has a special meaning in this case. Since E(εi) = 0, we have E(Yi) = β0 + β1Xi. Consider Yi to be a Bernoulli random variable with probability distribution P(Yi = 1) = πi and P(Yi = 0) = 1 − πi. Thus πi is the probability that Yi = 1 and 1 − πi is the probability that Yi = 0. By the definition of the expected value of a random variable, E(Yi) = 1(πi) + 0(1 − πi) = πi = β0 + β1Xi. The mean response E(Yi) given by the response function is therefore simply the probability that Yi = 1 when the level of the predictor variable is Xi. This interpretation applies whether the response function is a simple linear one or a multiple regression one.

Special Problems When Y Is Binary

Nonnormal error terms. For a binary 0, 1 response variable, each error term can take on only two values: εi = 1 − β0 − β1Xi when Yi = 1, and εi = −β0 − β1Xi when Yi = 0. Clearly the εi are not normally distributed.

Nonconstant error variance. Var(Yi) = E[(Yi − E(Yi))²] = (1 − πi)²πi + (0 − πi)²(1 − πi) = πi(1 − πi), so Var(εi) = Var(Yi) = πi(1 − πi) = (β0 + β1Xi)(1 − β0 − β1Xi).

Constraints on the response function. Since the response function represents probabilities when the outcome variable is a 0, 1 indicator variable, 0 ≤ E(Y) = π ≤ 1. The difficulties created by the need for this restriction on the response function are the most serious.

Simple Logistic Response Function

Both theoretical and empirical considerations suggest that when the response variable is binary, the response function will frequently be shaped either as a tilted S or as a reverse tilted S, and it is approximately linear except at the ends. See Figure 14.2 on page 558. The response functions plotted in Figure 14.2(c) and (d) are called logistic response functions and are of the form E(Y) = exp(β0 + β1X)/[1 + exp(β0 + β1X)] = 1/[1 + exp(−β0 − β1X)].

Properties of Logistic Response Function

A logistic response function is either monotonic increasing or monotonic decreasing, depending on the sign of β1. It can be linearized easily: if we make the transformation π* = loge[π/(1 − π)], we obtain π* = β0 + β1X. This is called the logit transformation of the probability π. The ratio π/(1 − π) in the logit transformation is called the odds, and π* is called the logit mean response. Logistic response functions, like the other response functions discussed, are used for describing the nature of the relationship between the mean response and one or more predictor variables (a descriptive purpose). They are also used for making predictions (a prediction purpose). A small numerical sketch of these properties follows.
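The sketch below checks numerically that the logit transformation linearizes the logistic response function; the coefficient values and X grid are assumed for illustration, not taken from the notes:

```python
# A minimal numerical sketch (values are illustrative assumptions): the logit
# transformation of the logistic response function is linear in X.
import numpy as np

beta0, beta1 = -2.0, 0.5          # assumed coefficients for illustration
x = np.array([0.0, 2.0, 4.0, 6.0, 8.0])

# Logistic response function: pi = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)).
pi = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

# Logit (log odds) recovers the linear predictor b0 + b1*x exactly.
logit = np.log(pi / (1.0 - pi))
print(np.allclose(logit, beta0 + beta1 * x))   # True: the logit is linear in x

# The odds pi/(1 - pi) are multiplied by exp(beta1 * step) per step in x.
odds = pi / (1.0 - pi)
print(np.allclose(odds[1:] / odds[:-1], np.exp(beta1 * 2.0)))  # x steps of 2
```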
Simple Logistic Regression

The simple logistic regression model is Yi = E(Yi) + εi, where the Yi are independent Bernoulli random variables with expected values E(Yi) = πi = exp(β0 + β1Xi)/[1 + exp(β0 + β1Xi)]. The X observations are assumed to be known constants; alternatively, if the X observations are random, E(Yi) is viewed as a conditional mean, given the value of Xi. We use the maximum likelihood (ML) method to estimate the parameters β0 and β1 in the logistic regression model. Once the maximum likelihood estimates b0 and b1 are found, we substitute them into the response function to obtain the fitted response function: π̂* = b0 + b1X = loge[π̂/(1 − π̂)], where π̂ = exp(b0 + b1X)/[1 + exp(b0 + b1X)].

Example 1

A systems analyst studied the effect of computer programming experience on the ability to complete a complex programming task, including debugging, within a specified time. Twenty-five people with varying amounts of programming experience (in months) were selected for the study. All persons were given the same programming task, and the results were coded in binary fashion: Y = 1 if the task was completed successfully in the allotted time, and Y = 0 otherwise. (Refer to page 565.)

Interpretation of b1

Consider the values of the fitted logistic response function at X = Xj and X = Xj + 1, that is, b0 + b1Xj and b0 + b1(Xj + 1). The difference between the two fitted values is simply b1. Since b0 + b1Xj and b0 + b1(Xj + 1) are the logarithms of the estimated odds when X = Xj and X = Xj + 1 respectively, b1 = loge(odds2) − loge(odds1) = loge(odds2/odds1). Taking antilogs of each side, we see that the estimated ratio of the odds, called the odds ratio (OR), equals exp(b1), that is, exp(b1) = odds2/odds1.

Example 1 (cont.) The odds ratio is exp(b1) = 1.175, so the odds increase by 17.5 percent with each additional month of experience.

Confidence Interval for OR

For any binary variable the OR can be estimated from the regression coefficient bi as OR = exp(bi). We can use the standard error of bi to get a confidence interval for βi and thus for exp(βi).

Example 1 (cont.) The standard error of the regression coefficient for experience is 0.065, and a confidence interval is obtained by taking b1 to have an approximately normal sampling distribution. A 95% confidence interval for β1 is thus [0.161 − 1.96(0.065), 0.161 + 1.96(0.065)], that is, from 0.034 to 0.288. The 95% confidence interval for the OR is thus e^0.034 to e^0.288, that is, from 1.035 to 1.334. This calculation is sketched in code below.
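The interval arithmetic above can be reproduced directly; the only inputs are b1 = 0.161 and its standard error 0.065, both taken from the notes:

```python
# A short sketch reproducing the Example 1 interval arithmetic from the notes.
import math

b1, se_b1 = 0.161, 0.065
z = 1.96  # approximate 97.5th percentile of the standard normal

# 95% CI for beta1, then exponentiate the endpoints to get the CI for the OR.
lower, upper = b1 - z * se_b1, b1 + z * se_b1
print(f"OR = {math.exp(b1):.3f}")                        # about 1.175
print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f})")   # about (0.034, 0.288)
print(f"95% CI for OR: ({math.exp(lower):.3f}, {math.exp(upper):.3f})")
# about (1.035, 1.334)
```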
Tests for Goodness of Fit

The appropriateness of the fitted logistic model needs to be examined before it is accepted for use, as is the case for all regression models. Goodness of fit tests provide an overall measure of the fit of the model and are usually not sensitive when the fit is poor for just a few cases. Three such tests are (a) the Pearson chi-square goodness of fit test (page 586), (b) the deviance goodness of fit test (page 588), and (c) the Hosmer-Lemeshow goodness of fit test (page 589). Each tests H0: E(Yi) = exp(β0 + β1Xi)/[1 + exp(β0 + β1Xi)] against Ha: H0 is not true.

Example 1 (cont.) Hosmer and Lemeshow test: Step 1, Chi-square = 5.145, df = 6, Sig. = .525.

Multiple Logistic Regression Model

To extend the simple logistic regression model, we simply replace β0 + β1X by β0 + β1X1 + β2X2 + ... + βkXk. In matrix notation, the multiple logistic response function can be written E(Y) = exp(Xᵀβ)/[1 + exp(Xᵀβ)], and the logit response function is π* = Xᵀβ. Like the simple logistic response function, the multiple logistic response function is monotonic and sigmoidal in shape with respect to Xᵀβ and is almost linear when π is between 0.2 and 0.8. The X variables may be different predictor variables, or some may represent curvature and/or interaction effects. Also, the predictor variables may be quantitative or qualitative. This flexibility makes the multiple logistic regression model very attractive. When the logistic regression model contains only qualitative variables, it is often referred to as a log-linear model.

Example 2

Example on page 573. Hosmer and Lemeshow test: Step 1, Chi-square = 9.188, df = 8, Sig. = .327.

Polytomous Logistic Regression

Logistic regression is most frequently used to model the relationship between a dichotomous response variable and a set of predictor variables. On occasion, however, the response variable may have more than two levels. Logistic regression can still be employed by means of a polytomous logistic regression model, which extends logistic regression analysis beyond the case where the response variable has only two possible outcomes. An approximate way of carrying out a polytomous logistic regression analysis is to fit several individual binary logistic regression models, as sketched below.
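A minimal sketch of this approximate approach, fitting one binary logistic regression per response level versus the rest; the synthetic data and the use of scikit-learn are assumptions for illustration, not part of the notes:

```python
# A minimal sketch of polytomous logistic regression via several individual
# binary logistic fits (one response level versus all others). Data are
# synthetic, constructed only for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
# A three-level response constructed for illustration.
y = np.where(X[:, 0] + X[:, 1] > 1.0, 2, np.where(X[:, 0] > 0.0, 1, 0))

# One binary logistic model per response level (level k versus the rest).
model = OneVsRestClassifier(LogisticRegression())
model.fit(X, y)
for level, fit in zip(model.classes_, model.estimators_):
    print(f"level {level}: intercept = {fit.intercept_[0]:.3f}, "
          f"coefficients = {np.round(fit.coef_[0], 3)}")
```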
