Simple Linear Regression Analysis Notes PDF
Summary
These notes provide an introduction to simple linear regression analysis, a statistical method for examining the relationship between two variables. The material covers basic concepts, models, and estimation techniques. The content also discusses different situations involving dependent and independent variables.
Full Transcript
Simple Linear Regression Analysis

Introduction to Regression Analysis
Simple Linear Regression Model
Inferences in Regression Analysis
Diagnostics and Remedial Measures
Matrix Approach to Linear Regression Analysis

Introduction to Regression Analysis
Regression analysis is one of the most important and widely used statistical techniques in business and economic analysis for examining the functional relationships between two or more variables. One variable is specified to be the dependent/response variable (DV), denoted by $Y$, and the other one or more variables are called the independent/predictor/explanatory variables (IV), denoted by $X_i$, $i = 1, 2, \dots, k$. There are two different situations:
(a) $Y$ is a random variable and the $X_i$ are fixed, non-random variables; e.g., to predict the sales for a company, the year is the fixed $X_i$ variable.
(b) Both the $X_i$ and $Y$ are random variables; e.g., all survey data are of this type. In this situation, cases are selected randomly from the population, and both $X_i$ and $Y$ are measured.

Main Purposes
Regression analysis can be used for either of two main purposes:
(1) Descriptive: The kind of relationship and its strength are examined. This examination can be done graphically or by the use of descriptive equations. Tests of hypotheses and confidence intervals can serve to draw inferences regarding the relationship.
(2) Predictive: The equation relating $Y$ and $X_i$ can be used to predict the value of $Y$ for a given value of $X_i$. Prediction intervals can also be used to indicate a likely range of the predicted value of $Y$.

Description of Methods of Regression
The general form of a probabilistic model is
Y = deterministic component + random error.
As you will see, the random error plays an important role in testing hypotheses and finding confidence intervals for the parameters in the model. Simple regression analysis means that the value of the dependent variable $Y$ is estimated on the basis of only one independent variable: $Y = f(X) + \varepsilon$.
On the other hand, multiple regression is concerned with estimating the value of the dependent variable $Y$ on the basis of two or more independent variables: $Y = f(X_1, X_2, \dots, X_k) + \varepsilon$, where $k \ge 2$.

Simple Linear Regression Model
We begin with the simplest of probabilistic models, the simple linear regression model. That is, $f(X)$ is a simple linear function of $X$, $f(X) = \beta_0 + \beta_1 X$. The model can be stated as follows:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, 2, \dots, n$
where
$Y_i$ is the value of the response variable in the $i$th trial;
$X_i$ is a non-random variable, the value of the predictor variable in the $i$th trial;
$\varepsilon_i$ is a random error with $E(\varepsilon_i) = 0$, $\mathrm{var}(\varepsilon_i) = \sigma^2$, and $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \ne j$;
$\beta_0$ and $\beta_1$ are parameters.

Important Features of the Model
(1) The response $Y_i$ in the $i$th trial is the sum of two components: (i) the constant term $\beta_0 + \beta_1 X_i$ and (ii) the random term $\varepsilon_i$. Hence, $Y_i$ is a random variable.
(2) $E(Y_i) = \beta_0 + \beta_1 X_i$.
(3) $Y_i$ in the $i$th trial exceeds or falls short of the value of the regression function by the error term amount $\varepsilon_i$.
(4) $\mathrm{var}(Y_i) = \mathrm{var}(\varepsilon_i) = \sigma^2$. Thus, the regression model assumes that the probability distributions of $Y$ have the same variance $\sigma^2$, regardless of the level of the predictor variable $X$.
(5) Since the error terms $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated, so are $Y_i$ and $Y_j$.
(6) In summary, the regression model implies that the responses $Y_i$ come from probability distributions whose means are $E(Y_i) = \beta_0 + \beta_1 X_i$ and whose variances are $\mathrm{var}(Y_i) = \sigma^2$, the same for all levels of $X$. Further, any two responses $Y_i$ and $Y_j$ are uncorrelated.

Estimating the Model Parameters
$E(\varepsilon) = 0$ is equivalent to saying that $E(Y)$ equals the deterministic component of the model. That is, $E(Y) = \beta_0 + \beta_1 X$, where the constants $\beta_0$ and $\beta_1$ are the population parameters. This is called the population regression equation (line). Denoting the estimates of $\beta_0$ and $\beta_1$ by $\hat{\beta}_0 = b_0$ and $\hat{\beta}_1 = b_1$ respectively, we can then estimate $E(Y)$ by $\hat{Y}$ from the sample regression equation (or the fitted regression line) $\hat{Y} = b_0 + b_1 X$.
The problem of fitting a line to a sample of points is essentially the problem of efficiently estimating the parameters $\beta_0$ and $\beta_1$ by $b_0$ and $b_1$ respectively. The best known method for doing this is called the least squares method (LSM).

The Least Squares Method
The principle of least squares is illustrated in the following figure.
[Figure: actual values $Y$ scattered around the estimated line $\hat{Y} = b_0 + b_1 x$, with the residuals $e_1, \dots, e_4$ marked; axes $X$ and $Y$.]

The Least Squares Method (Cont.)
For every observed $Y_i$ in a sample of points, there is a corresponding predicted value $\hat{Y}_i$, equal to $b_0 + b_1 X_i$. The sample deviation of the observed value $Y_i$ from the predicted $\hat{Y}_i$ is $e_i = Y_i - \hat{Y}_i$, called a residual; that is, $e_i = Y_i - b_0 - b_1 X_i$. We shall find $b_0$ and $b_1$ so that the sum of the squares of the errors (residuals)
$SSE = \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - b_0 - b_1 X_i)^2$
is a minimum. This minimization procedure for estimating the parameters is called the method of least squares.

The Least Squares Method (Cont.)
Differentiating SSE with respect to $b_0$ and $b_1$, we have
$\partial(SSE)/\partial b_0 = -2 \sum (Y_i - b_0 - b_1 X_i)$
$\partial(SSE)/\partial b_1 = -2 \sum (Y_i - b_0 - b_1 X_i) X_i$.
Setting the partial derivatives equal to zero and rearranging the terms, we obtain the equations (called the normal equations)
$n b_0 + b_1 \sum X_i = \sum Y_i$ and $b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i$,
which may be solved simultaneously to yield computing formulas for $b_0$ and $b_1$ as follows:
$b_1 = SS_{xy}/SS_{xx}$ $(= r \times s_y/s_x)$, $b_0 = \bar{Y} - b_1 \bar{X}$,
where
$SS_{xy} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - (\sum X_i)(\sum Y_i)/n$
$SS_{xx} = \sum (X_i - \bar{X})^2 = \sum X_i^2 - (\sum X_i)^2/n$.

Properties of Least Squares Estimators
(1) Gauss-Markov Theorem: Under the conditions of the regression model, the least squares estimators $b_0$ and $b_1$ are unbiased estimators (i.e., $E(b_0) = \beta_0$ and $E(b_1) = \beta_1$) and have minimum variance among all unbiased linear estimators.
(2) The estimated value of $Y$ (i.e., $\hat{Y} = b_0 + b_1 X$) is an unbiased estimator of $E(Y) = \beta_0 + \beta_1 X$, with minimum variance in the class of unbiased linear estimators.
Note that the common variance $\sigma^2$ cannot be estimated by LSM.
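As a minimal sketch of the computing formulas for $b_0$ and $b_1$ given above, the following Python function evaluates $SS_{xx}$, $SS_{xy}$, the slope and the intercept for paired data. The function name `fit_line` and the four data points are illustrative assumptions and do not come from the notes.

```python
def fit_line(x, y):
    """Return (b0, b1) using b1 = SSxy / SSxx and b0 = ybar - b1 * xbar."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if ss_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, chosen only to illustrate the calculation.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
b0, b1 = fit_line(x, y)

# Residuals e_i = Y_i - b0 - b1 * X_i; their sum should be numerically zero.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print("b0 =", b0, "b1 =", b1, "sum of residuals =", sum(residuals))
```

The centered sums are used here rather than the shortcut forms ($\sum X_i Y_i - (\sum X_i)(\sum Y_i)/n$) because they tend to lose less precision in floating-point arithmetic; both give the same result in exact arithmetic.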
We can prove that the following statistic is an unbiased point estimator of $\sigma^2$ (you should try to prove it):
$s^2 = SSE/(n-2) = (SS_{yy} - b_1 SS_{xy})/(n-2)$.

Properties of Fitted Regression Line
(1) The sum of the residuals is zero: $\sum e_i = 0$.
(2) The sum of the squared residuals, $\sum e_i^2$, is a minimum.
(3) $\sum \hat{Y}_i = \sum Y_i$.
(4) $\sum X_i e_i = 0$.
(5) $\sum \hat{Y}_i e_i = 0$.
(6) The regression line always goes through the point $(\bar{X}, \bar{Y})$.
All properties can be proved directly by using the normal equations, $\sum (Y_i - b_0 - b_1 X_i) = 0$ and $\sum (Y_i - b_0 - b_1 X_i) X_i = 0$, or equivalently $n b_0 + b_1 \sum X_i = \sum Y_i$ and $b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i$.

Example 1
A random sample of 42 firms was chosen from the S&P500 firms listed in the Spring 2003 Special Issue of Business Week (The Business Week Fifty Best Performers). The dividend yield (DIVYIELD) and the 2002 earnings per share (EPS) were recorded for the 42 firms. These data are in a file named DIV3. Using dividend yield as the DV and EPS as the IV, plot the scatter diagram and run a regression using SPSS.
(a) Find the estimated regression line $\hat{Y}$.
(b) Find the predicted values of the DV given EPS = 1 and EPS = 2.

Example 1: Solution
Coefficients (a)
| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 2.034 | .541 | | 3.762 | .001 |
| EPS | .374 | .239 | .240 | 1.562 | .126 |
a. Dependent Variable: Divyield
$\hat{Y} = 2.034 + 0.374x$

Example 1: Scatter Diagram
[Figure: scatter diagram with the fitted line $\hat{Y} = 2.034 + 0.374x$.]

Normal Error Regression Model
No matter what the form of the distribution of the error terms $\varepsilon_i$ (and hence of the $Y_i$) may be, the LSM provides unbiased point estimators of $\beta_0$ and $\beta_1$ that have minimum variance among all unbiased linear estimators. To set up interval estimates and make tests, however, we need to make an assumption about the form of the distribution of the $\varepsilon_i$. The standard assumption is that the error terms $\varepsilon_i$ are normally distributed, and we will adopt it here.
Since the functional form of the probability distribution of the error terms is now specified, we can use the maximum likelihood method to obtain estimators of the parameters $\beta_0$, $\beta_1$ and $\sigma^2$. In fact, the MLE and LSE for $\beta_0$ and $\beta_1$ are the same. The MLE for $\sigma^2$ is biased: $\hat{\sigma}^2 = \sum e_i^2/n = SSE/n = s^2 (n-2)/n$. A normal error term greatly simplifies the theory of regression analysis (see the comments on page 32).

Normality & Constant Variance Assumptions
[Figure: error densities $f(\varepsilon)$ drawn at $X_1$ and $X_2$ along the line $E(Y) = \beta_0 + \beta_1 X$; axes $X$ and $Y$.]

Inferences Concerning the Regression Coefficients
Aside from merely estimating the linear relationship between $X$ and $Y$ for purposes of prediction, we may also be interested in drawing certain inferences about the population parameters, say $\beta_0$ and $\beta_1$. To make inferences or test hypotheses concerning these parameters, we must know the sampling distributions of $b_0$ and $b_1$. (Note that $b_0$ and $b_1$ are statistics, i.e., functions of the random sample; therefore, they are random variables.)

Inferences Concerning $\beta_1$
(a) $b_1$ is a normal random variable for the normal error model.
(b) $E(b_1) = \beta_1$. That is, $b_1$ is an unbiased estimator of $\beta_1$.
(c) $\mathrm{Var}(b_1) = \sigma^2/SS_{xx}$, which is estimated by $s^2(b_1) = s^2/SS_{xx}$, where $s^2$ is the unbiased estimator of $\sigma^2$.
(d) The $(1 - \alpha)100\%$ confidence interval for $\beta_1$ ($\sigma^2$ unknown) is
$b_1 - t_{\alpha/2}\, s(b_1) < \beta_1 < b_1 + t_{\alpha/2}\, s(b_1)$,
where $t_{\alpha/2}$ is a value of the t-distribution with $(n - 2)$ degrees of freedom, and $s(b_1)$ is the standard error of $b_1$, i.e., $s(b_1) = s/(SS_{xx})^{1/2}$.
(e) Hypothesis test of $\beta_1$: To test the null hypothesis $H_0: \beta_1 = 0$ against a suitable alternative, we can use the t-distribution with $n - 2$ degrees of freedom to establish a critical region and then base our decision on the value of $t = b_1/s(b_1)$.

Inferences Concerning $\beta_0$
(a) $b_0$ is a normal random variable for the normal error model.
(b) $E(b_0) = \beta_0$. That is, $b_0$ is an unbiased estimator of $\beta_0$.
(c) $\mathrm{Var}(b_0) = \sigma^2 \sum X_i^2/(n\, SS_{xx})$, which is estimated by $s^2(b_0) = s^2 \sum X_i^2/(n\, SS_{xx})$, where $s^2$ is the unbiased estimator of $\sigma^2$.
(d) The $(1 - \alpha)100\%$ confidence interval for $\beta_0$ ($\sigma^2$ unknown) is
$b_0 - t_{\alpha/2}\, s(b_0) < \beta_0 < b_0 + t_{\alpha/2}\, s(b_0)$,
where $t_{\alpha/2}$ is a value of the t-distribution with $(n - 2)$ degrees of freedom, and $s(b_0) = s\,(\sum X_i^2/(n\, SS_{xx}))^{1/2}$.
(e) Hypothesis test of $\beta_0$: To test the null hypothesis $H_0: \beta_0 = 0$ against a suitable alternative, we can use the t-distribution with $n - 2$ degrees of freedom to establish a critical region and then base our decision on the value of $t = b_0/s(b_0)$.

Some Considerations
Effects of Departures From Normality
If the probability distributions of $Y$ are not exactly normal but do not depart seriously from normality, the sampling distributions of $b_0$ and $b_1$ will be approximately normal. Even if the distributions of $Y$ are far from normal, the estimators $b_0$ and $b_1$ generally have the property of asymptotic normality as the sample size increases. Thus, with sufficiently large samples, the confidence intervals and decision rules given earlier still apply even if the probability distributions of $Y$ depart far from normality.

Inferences Concerning E(Y)
(1) The sampling distribution of $\hat{Y}_i$ is normal for the normal error model.
(2) $\hat{Y}_i$ is an unbiased estimator of $E(Y_i)$, because $E(Y_i) = \beta_0 + \beta_1 X_i$ and $E(\hat{Y}_i) = E(b_0 + b_1 X_i) = \beta_0 + \beta_1 X_i = E(Y_i)$.
(3) The variance of $\hat{Y}_i$ is $\mathrm{var}(\hat{Y}_i) = \sigma^2 [(1/n) + (X_i - \bar{X})^2/SS_{xx}]$, and the estimated variance of $\hat{Y}_i$ is $s^2(\hat{Y}_i) = s^2 [(1/n) + (X_i - \bar{X})^2/SS_{xx}]$.
(4) The $(1 - \alpha)100\%$ confidence interval for the mean response $E(Y_i)$ is
$\hat{Y}_i - t_{\alpha/2,\,(n-2)}\, s(\hat{Y}_i) < E(Y_i) < \hat{Y}_i + t_{\alpha/2,\,(n-2)}\, s(\hat{Y}_i)$.
Note that the confidence limits for $E(Y_i)$ are not sensitive to moderate departures from the assumption that the error terms are normally distributed. Indeed, the limits are not sensitive to substantial departures from normality if the sample size is large.

Prediction of New Observation
The distinction between estimation of the mean response $E(Y_i)$, discussed in the preceding section, and prediction of a new response $Y_{i(new)}$, discussed now, is basic.
In the former case, we estimate the mean of the distribution of $Y$. In the present case, we predict an individual outcome drawn from the distribution of $Y$.
Prediction Interval for $Y_{i(new)}$
When the regression parameters are unknown, they must be estimated. The mean of the distribution of $Y$ is estimated by $\hat{Y}$, as usual, and the variance of the distribution of $Y$ is estimated by MSE (i.e., $s^2$). From the figure on the next page, we can see that there are two probability distributions of $Y$, corresponding to the upper and lower limits of a confidence interval for $E(Y)$.

Prediction Interval
[Figure: confidence limits for $E(Y_i)$, with the prediction limits that would apply if $E(Y_i)$ were located at the lower confidence limit and at the upper confidence limit.]

Prediction Interval (Cont.)
Since we cannot be certain of the location of the distribution of $Y$, prediction limits for $Y_{i(new)}$ clearly must take account of two elements: (a) variation in the possible location of the distribution of $Y$; and (b) variation within the probability distribution of $Y$. That is,
$\mathrm{var}(pred_i) = \mathrm{var}(Y_{i(new)} - \hat{Y}_i) = \mathrm{var}(Y_{i(new)}) + \mathrm{var}(\hat{Y}_i) = \sigma^2 + \mathrm{var}(\hat{Y}_i)$.
An unbiased estimator of $\mathrm{var}(pred_i)$ is
$s^2(pred_i) = s^2 + s^2(\hat{Y}_i) = s^2 [1 + (1/n) + (X_i - \bar{X})^2/SS_{xx}]$.
The $(1 - \alpha)100\%$ prediction interval for $Y_{i(new)}$ is
$\hat{Y}_i - t_{\alpha/2,\,(n-2)}\, s(pred_i) < Y_{i(new)} < \hat{Y}_i + t_{\alpha/2,\,(n-2)}\, s(pred_i)$.

Comments on Prediction Interval
The prediction limits, unlike the confidence limits for a mean response $E(Y_i)$, are sensitive to departures from normality of the error term distribution. Prediction intervals resemble confidence intervals. However, they differ conceptually. A confidence interval represents an inference on a parameter and is an interval that is intended to cover the value of the parameter. A prediction interval, on the other hand, is a statement about the value to be taken by a random variable, the new observation $Y_{i(new)}$.

Hyperbolic Interval Bands
[Figure: hyperbolic interval bands around the fitted line; axes $X$ and $Y$, with $\bar{X}$ and $X_{given}$ marked on the $X$ axis.]

Example 2
The vice-president of marketing for a large firm is concerned about the effect of advertising on sales of the firm's major product.
To investigate the relationship between advertising and sales, data on the two variables were gathered from a random sample of 20 sales districts. These data are available in a file named SALESAD3. Sales (DV) and advertising (IV) are both expressed in hundreds of dollars.
(a) What is the sample regression equation relating sales to advertising?
(b) Is there a linear relationship between sales and advertising?
(c) What conclusion can be drawn from the test result?
(d) Find the 95% confidence interval estimate for the mean value of the DV given that IV = 410.
(e) Find the 95% prediction interval for the individual value of the DV given that IV = 410.
(f) Construct a 95% confidence interval estimate of $\beta_1$.

Example 2: SPSS Outputs
Model Summary (b)
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .930 (a) | .864 | .857 | 594.80820 |
a. Predictors: (Constant), adv
b. Dependent Variable: sales

Coefficients (a)
| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | -57.281 | 509.750 | | -.112 | .912 | -1128.2 | 1013.7 |
| Adv | 17.570 | 1.642 | .930 | 10.702 | .000 | 14.121 | 21.019 |
a. Dependent Variable: sales
$\hat{Y} = -57.281 + 17.57x$

Example 2: Scatter Plot
[Figure: scatter plot with the fitted line $\hat{Y} = -57.281 + 17.57x$.]

The Coefficient of Determination
In many regression problems, the major reason for constructing the regression equation is to obtain a tool that is useful in predicting the value of the dependent variable $Y$ from some known value of the independent variable $X$. Thus, we often wish to assess the accuracy of the regression line in predicting the $Y$ values. $R^2$, called the coefficient of determination, provides a summary measure of how well the regression line fits the sample. It has a proportional reduction in error interpretation.
That is, $R^2$ is the proportion of the variability in the dependent variable that is explained by the independent variable (see the figure), namely,
$R^2$ = (sum of squares due to regression) / (total sum of squares).

Partitioning Variation
[Figure: partitioning of the total variation.]

Partitioning Variation (Cont.)
The variation in the dependent variable $Y$ can be partitioned into two parts: variation explained by the regression and unexplained variation.
Total Variation = Explained Variation + Unexplained Variation
The total sum of squares is $SST = \sum (Y_i - \bar{Y})^2$. SS(Total) can be subdivided into two components:
SSR = the sum of squares due to regression (explained variation)
SSE = the sum of squares due to error (unexplained variation).
That is, SST = SSR + SSE, namely,
$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$.

Computing Formulas
The various sums of squares may be found more simply by using the following formulas:
$SST = SS_{yy} = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - (\sum Y_i)^2/n$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2 = b_1\, SS_{xy}$
$SSE = \sum (Y_i - \hat{Y}_i)^2 = SST - SSR$.
Now we can calculate $R^2$ by using the following equation:
$R^2 = SSR/SST = 1 - SSE/SST = b_1 SS_{xy}/SS_{yy}$, and $0 \le R^2 \le 1$.
The computations are usually summarized in tabular form (ANOVA table).

ANOVA Table for Simple Regression
While the t-test is used to test the significance of individual independent variables, the ANOVA table provides an overall test of the significance of the whole set of independent variables. The test is an F-test with d.f. $(k, n-k-1)$, where $k$ is the number of independent variables in the model:
$F = MSR/MSE = [R^2/k]/[(1-R^2)/(n-k-1)]$,
which for simple linear regression ($k = 1$) reduces to $F = (n-2)R^2/(1-R^2)$.
For the simple linear regression model, the F-test is equivalent to the t-test for the parameter $\beta_1$. This is not the case for the multiple regression model.

Example 2 (Cont.)
(a) Find SST, SSR, SSE, and $R^2$.
(b) Present an ANOVA summary table.
(c) Test the hypothesis $H_0: \beta_1 = 0$ against $H_a: \beta_1 \ne 0$ by using an F-statistic. Let $\alpha = 0.05$.
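The computing formulas for SST, SSR, SSE, $R^2$ and the F statistic described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `anova_simple` and the four data points are hypothetical, and the SALESAD3 data used in Example 2 are not reproduced here.

```python
def anova_simple(x, y):
    """Sums of squares, R^2 and F for a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = ss_xy / ss_xx
    sst = ss_yy               # total sum of squares
    ssr = b1 * ss_xy          # sum of squares due to regression
    sse = sst - ssr           # sum of squares due to error
    r2 = ssr / sst
    msr = ssr / 1             # k = 1 independent variable
    mse = sse / (n - 2)       # n - k - 1 degrees of freedom
    f_stat = msr / mse        # undefined if the fit is perfect (SSE = 0)
    return {"SST": sst, "SSR": ssr, "SSE": sse, "R2": r2,
            "MSR": msr, "MSE": mse, "F": f_stat}


# Hypothetical data, for illustration only.
table = anova_simple([1, 2, 3, 4], [2, 4, 5, 8])
n = 4
# For k = 1 the F statistic can also be written as (n - 2) R^2 / (1 - R^2).
f_from_r2 = (n - 2) * table["R2"] / (1 - table["R2"])
print(table, f_from_r2)
```

The two routes to F agree up to rounding, which is a convenient check when reading an ANOVA table by hand.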
Solution:
ANOVA (b)
| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4.052E7 | 1 | 4.052E7 | 114.539 | .000 (a) |
| Residual | 6368342.383 | 18 | 353796.799 | | |
| Total | 4.689E7 | 19 | | | |
a. Predictors: (Constant), adv
b. Dependent Variable: sales

Description of Methods of Regression: Case When X is Random
In the variable-X case, both $X$ and $Y$ are random variables measured on cases that are randomly selected from a population. The fixed-X regression model applies in this case when we treat the $X$ values as if they were pre-selected. This technique is justifiable theoretically by conditioning on the $X$ values that happened to be obtained in the sample (Textbook page 83). Therefore all the previous discussion and formulas are precisely the same for this case as for the fixed-X case. Since both $X$ and $Y$ are considered random variables, other parameters can be useful for describing the model, such as the covariance of $X$ and $Y$, denoted by $\sigma_{XY}$ (or $\mathrm{Cov}(X, Y)$), and the correlation coefficient, denoted by $\rho$, which are measures of how the two variables vary together.

Correlation Coefficient
The correlation coefficient $\rho = \sigma_{XY}/(\sigma_X \sigma_Y)$ is a measure of the direction and the strength of linear association between two variables. It is dimensionless, and it may take any value between -1 and 1, inclusive. A positive correlation (i.e., $\rho > 0$) means that as one variable increases, the other likewise increases. A negative correlation (i.e., $\rho < 0$) means that as one variable increases, the other decreases. If $\rho = 0$ for two variables, then we say that the variables are uncorrelated and that there is no linear association between them. Note that $\rho$ measures only linear relationship. The variables may be perfectly related in a curvilinear way even when $\rho = 0$.

Correlation Coefficient and $R^2$
The sample correlation coefficient $r$ is an estimator for $\rho$. The equation for the sample correlation coefficient is
$r = SS_{xy}/[(SS_{xx})(SS_{yy})]^{1/2}$.
Simple regression techniques and correlation methods are related.
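As a small illustration of the sample correlation formula above, the sketch below computes $r$ from $SS_{xy}$, $SS_{xx}$ and $SS_{yy}$ and checks it against the identity $b_1 = r \times s_y/s_x$ stated with the least squares formulas. The function name `sample_correlation` and the data are assumptions made for the example.

```python
import math


def sample_correlation(x, y):
    """Return r = SSxy / sqrt(SSxx * SSyy) for paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if ss_xx == 0 or ss_yy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
r = sample_correlation(x, y)

# Slope recovered from r: b1 = r * s_y / s_x, where s_y / s_x = sqrt(SSyy / SSxx).
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
slope_from_r = r * math.sqrt(ss_yy / ss_xx)
print(r, slope_from_r)
```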
In correlation, $r$ is an estimator for the population correlation coefficient $\rho$. In regression, $r^2 = R^2$ is simply a measure of closeness of fit. Thus the sample correlation coefficient $r$ is used to estimate the direction and the strength of the linear relationship between two variables, whereas the coefficient of determination $r^2 = R^2$ is the proportion of the squared error that the regression equation can explain when we use the regression equation rather than the sample mean as a predictor.

Test of Coefficient of Correlation
Note that tests of hypotheses and confidence intervals for the variable-X case require that $X$ and $Y$ be jointly normally distributed. That is, $X$ and $Y$ follow a bivariate normal distribution. Under this assumption, we can test whether there is a linear relationship between the $X$ and $Y$ variables (i.e., whether $\rho = 0$) by using the following t-test. The same conclusion will be drawn as from testing the population slope $\beta_1$.
(1) $H_0: \rho = 0$ against $H_a: \rho \ne 0$ (or $\rho > 0$ or $\rho < 0$).
(2) The test statistic is $t = r\sqrt{n-2}/\sqrt{1-r^2}$. Under $H_0$ the statistic $t$ has the t-distribution with $(n-2)$ degrees of freedom.

Example 2 (Cont.)
Use the data in the example to test whether there is a significant linear relationship between sales and advertising expense (both in hundreds of dollars). Use $\alpha = 0.05$.
Solution:
(1) $H_0: \rho = 0$ against $H_a: \rho \ne 0$.
(2) $\alpha = 0.05$, $n = 20$, df $= n - 2 = 18$ and $t_{0.025,\,18} = 2.101$.
(3) The rejection rule: if $|t| > 2.101$, then reject $H_0$.
(4) Computations: $r = SS_{xy}/(SS_{xx} SS_{yy})^{1/2} = 0.9296$. The test statistic is $t = r\sqrt{n-2}/\sqrt{1-r^2} = 0.9296\sqrt{18}/\sqrt{1 - 0.9296^2} = 10.701$.
(5) We reject $H_0$ at $\alpha = 0.05$ since $t = 10.701 > 2.101$ and conclude that there is a significant linear relationship between sales and advertising expense.

Further Examination of Computer Output
Standardized Regression Coefficient
The standardized regression coefficient is the slope in the regression equation if $X$ and $Y$ are standardized.
After standardization the intercept in the regression equation will be zero, and for simple linear regression the standardized slope will be equal to the correlation coefficient $r$. In multiple regression, the standardized regression coefficients help quantify the relative contribution of each $X$ variable.

Coefficients (a)
| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | -57.281 | 509.750 | | -.112 | .912 | -1128.227 | 1013.665 |
| Adv | 17.570 | 1.642 | .930 | 10.702 | .000 | 14.121 | 21.019 |
a. Dependent Variable: sales
Here the standardized coefficient for Adv (Beta = .930) equals $r$.

Checking for Violations of Assumptions
We usually do not know in advance whether a linear regression model is appropriate for our data set. Therefore, it is necessary to check whether the necessary assumptions are violated. The analysis of the residuals is frequently a helpful and useful tool for this purpose. The basic principles apply to all statistical models discussed in this course.
Residuals: In model building, a residual is what is left after the model is fit. It is the difference between an observed value of $Y$ and the predicted value of $Y$, i.e., $\text{Residual}_i = e_i = Y_i - \hat{Y}_i$. In regression analysis, the true errors $\varepsilon_i$ are assumed to be independent normal variables with a mean of 0 and a constant variance of $\sigma^2$. If the model is appropriate for the data, the residuals $e_i$, which are estimates of the true errors, should have similar characteristics. (Refer to pages 102-103.)

Checking for Violations of Assumptions
Identification of equality of variance
Scatter plots can also be used to detect whether the assumption of constant variance of $y$ for all values of $x$ is being violated. If the spread of the residuals increases or decreases with the values of the independent variable or with the predicted values, then the assumption of homogeneity of variance is being violated.
Identification of independence
Usually this assumption is relatively easy to meet since observations appear in a random position, and hence successive error terms are also likely to be random. However, in time series data or repeated measures data, this problem of dependence between successive error terms often occurs.

Checking for Violations of Assumptions (Cont.)
Identification of normality
A critical assumption of the simple linear regression model is that the error terms associated with each $x_i$ have a normal distribution. Note that it is unreasonable to expect the observed residuals to be exactly normal; some deviation is expected because of sampling variation. Even if the errors are normally distributed in the population, sample residuals are only approximately normal. Another way to compare the observed distribution of residuals to that expected under the assumption of normality is to plot the two cumulative distributions against each other for a series of points. If the two distributions are identical, a straight line results. This is called a P-P plot (a cumulative probability plot).

Checking for Violations of Assumptions (Cont.)
Identification of linearity
For simple regression, a scatter plot gives a good indication of how well a straight line fits the data. Another convenient method is to plot the residuals against the predicted values. If the assumptions of linearity and homogeneity of variance are met, there should be no relationship between the predicted and residual values, i.e., the residuals should be randomly distributed around the horizontal line through zero. You should be suspicious of any observable pattern.
Identification of outliers
In combination with a scatter plot of the observed dependent and independent variables, the plot of residuals can be used to identify observations that appear to fall a long way from the main cluster of observations (a residual that is larger than $3s$ in absolute value is an outlier).
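The outlier rule just described (a residual larger than $3s$) can be sketched as a short Python check. The function name `flag_outliers`, the default threshold argument and the simulated data are assumptions for illustration; $s$ is taken as $\sqrt{SSE/(n-2)}$, the estimator used earlier in these notes.

```python
import math


def flag_outliers(x, y, threshold=3.0):
    """Fit y on x by least squares and flag residuals with |e_i| > threshold * s."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    flagged = [i for i, e in enumerate(residuals) if abs(e) > threshold * s]
    return residuals, s, flagged


# Hypothetical data: an exact line y = 2x with one observation shifted upward.
x = list(range(1, 21))
y = [2.0 * xi for xi in x]
y[9] += 50.0  # contaminate the observation at x = 10

residuals, s, flagged = flag_outliers(x, y)
print("s =", s, "flagged indices =", flagged)
```

Note that a single extreme point also inflates $s$, so in small samples even a gross outlier may fail to exceed $3s$; the rule is a screening aid to use alongside the residual plots rather than a formal test.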
Overview of Tests Involving Residuals
Tests for Randomness in the Residuals: Runs Test
Tests for Autocorrelation in the Residuals in Time Order: Durbin-Watson Test
Tests for Normality: Correlation Test (Shapiro-Wilk Test), Chi-Square Test, Kolmogorov Test
Tests for Constancy of Error Variance: Brown-Forsythe (Modified Levene) Test*, Cook-Weisberg (Breusch-Pagan) Test*
F-test for Lack of Fit: tests whether a linear regression function is a good fit for the data*.
(Note that the tests marked with * are valid only for large samples or under strong assumptions.)

Overview of Remedial Measures
If the linear regression normal error model is not appropriate for a data set, there are two basic choices:
(1) Abandon the model and develop and use a more appropriate model (non-normal or nonlinear models).
(2) Employ some transformation(s) on the data.
Transformations
Transformations for nonlinear relation
Transformations for nonnormality and unequal variances
Box-Cox transformations

What to Watch Out For
In the development of the theory for linear regression, the sample is assumed to be obtained randomly in such a way that it represents the whole population you are studying. Often, convenience samples, which are samples of easily available cases, are taken for economic or other reasons. Such samples are likely to lead to an underestimate of the variance and possibly to bias in the regression line. The lack of randomness in the sample can seriously invalidate our inferences. Confidence intervals are often optimistically narrow because the sample is not truly a random one from the whole population to which we wish to generalize.

What to Watch Out For (Cont.)
Association versus Causality: A common mistake made when using regression analysis is to assume that a strong fit (high $R^2$) of a regression of $Y$ on $X$ automatically means that "X causes Y".
(1) The reverse could be true: $Y$ causes $X$.
(2) There may be a third variable related to both $X$ and $Y$.
Forecasting outside the range of the explanatory variables.
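One simple way to guard against the last point, forecasting outside the range of the explanatory variable, is to check a new $X$ value against the observed range before using the fitted line. The helper below is a hypothetical sketch (the name `predict_with_range_check`, the coefficients and the observed values are all assumed for illustration); it returns the prediction together with a flag rather than refusing to extrapolate.

```python
def predict_with_range_check(b0, b1, x_new, x_observed):
    """Return (prediction, inside_range) for the fitted line b0 + b1 * x."""
    if not x_observed:
        raise ValueError("x_observed must contain the sample X values")
    lowest, highest = min(x_observed), max(x_observed)
    inside_range = lowest <= x_new <= highest
    return b0 + b1 * x_new, inside_range


# Hypothetical fitted line and sample X values.
x_observed = [1, 2, 3, 4]
pred_in, ok_in = predict_with_range_check(0.0, 1.9, 2.5, x_observed)
pred_out, ok_out = predict_with_range_check(0.0, 1.9, 10, x_observed)

for x_new, pred, ok in [(2.5, pred_in, ok_in), (10, pred_out, ok_out)]:
    note = "" if ok else " (extrapolation: outside the observed X range)"
    print(f"x = {x_new}: predicted y = {pred:.3f}{note}")
```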
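Matrix Approach to Simple Linear Regression Analysis
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, 2, \dots, n$
This implies
$y_1 = \beta_0 + \beta_1 x_1 + \varepsilon_1$,
$y_2 = \beta_0 + \beta_1 x_2 + \varepsilon_2$,
...
$y_n = \beta_0 + \beta_1 x_n + \varepsilon_n$.
Let $Y_{n \times 1} = (y_1, y_2, \dots, y_n)'$, $X_{n \times 2} = [\mathbf{1}_{n \times 1}, (x_1, x_2, \dots, x_n)']$, $\beta_{2 \times 1} = (\beta_0, \beta_1)'$ and $\varepsilon_{n \times 1} = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)'$. Then the normal error model in matrix terms is
$Y_{n \times 1} = X_{n \times 2}\, \beta_{2 \times 1} + \varepsilon_{n \times 1}$, or simply $Y = X\beta + \varepsilon$,
where $\varepsilon$ is a vector of independent normal variables with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \mathrm{Var}(Y) = \sigma^2 I$.

LS Estimation in Matrix Terms
Normal Equations
The normal equations $n b_0 + b_1 \sum X_i = \sum Y_i$ and $b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i$ in matrix terms are
$X'Xb = X'Y$, where $b = (b_0, b_1)'$.
Estimated Regression Coefficients
$(X'X)^{-1} X'Xb = (X'X)^{-1} X'Y$, so $b = (X'X)^{-1} X'Y$.
LSM in Matrix Notation
$Q = \sum [Y_i - (\beta_0 + \beta_1 X_i)]^2 = (Y - X\beta)'(Y - X\beta)$
$= Y'Y - \beta'X'Y - Y'X\beta + \beta'X'X\beta$
$= Y'Y - 2\beta'X'Y + \beta'X'X\beta$
$\partial Q/\partial \beta = -2X'Y + 2X'X\beta = [\partial Q/\partial \beta_0, \partial Q/\partial \beta_1]'$
Equating to the zero vector, dividing by 2, and substituting $b$ for $\beta$ gives $b = (X'X)^{-1} X'Y$.

Fitted Values and Residuals in Matrix Terms
Fitted Values
$\hat{Y} = Xb = X(X'X)^{-1}X'Y = HY$, where $H = X(X'X)^{-1}X'$.
Residuals
$e = Y - \hat{Y} = (I - H)Y$.
Variance-Covariance Matrix
$\mathrm{Var}(e) = \mathrm{Var}[(I - H)Y] = (I - H)\,\mathrm{Var}(Y)\,(I - H)' = (I - H)\,\sigma^2 I\,(I - H)' = \sigma^2 (I - H)$,
which is estimated by $s^2(e) = MSE\,(I - H)$.

ANOVA in Matrix Terms
$SS(Total) = \sum Y_i^2 - (\sum Y_i)^2/n = Y'Y - Y'JY/n$, where $J$ is the $n \times n$ matrix of ones
$SSE = e'e = (Y - Xb)'(Y - Xb) = Y'Y - b'X'Y$
$SSR = b'X'Y - Y'JY/n$
Note that $Xb = HY$ and $b'X' = (Xb)' = (HY)' = Y'H$, so
$SS(T) = Y'(I - J/n)Y = Y'A_1Y$
$SSE = Y'(I - H)Y = Y'A_2Y$
$SSR = Y'(H - J/n)Y = Y'A_3Y$.
Since $A_1$, $A_2$ and $A_3$ are symmetric, SS(T), SSE and SSR are quadratic forms of the $Y_i$. Quadratic forms play an important role in statistics because all sums of squares in the ANOVA for linear statistical models can be expressed as quadratic forms.

Inferences in Matrix Terms
The variance-covariance matrix of $b$ is $\mathrm{Var}(b) = \sigma^2 (X'X)^{-1}$.
The estimated variance-covariance matrix of $b$ is $s^2(b) = MSE\,(X'X)^{-1}$.
Mean Response
Let $X_h = (1, x_h)'$. Then $\mathrm{Var}(\hat{Y}_h) = \sigma^2 X_h'(X'X)^{-1} X_h$.
The estimated variance of $\hat{Y}_h$ in matrix notation is $s^2(\hat{Y}_h) = MSE\,(X_h'(X'X)^{-1} X_h)$.
Prediction of New Observation
$s^2(pred) = MSE\,(1 + X_h'(X'X)^{-1} X_h)$.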
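The matrix formulas above can be traced on a small example. The following sketch uses plain Python lists instead of a matrix library, so the $2 \times 2$ inverse of $X'X$ is written out explicitly; the function name `matrix_regression`, the data and the value $x_h = 2.5$ are assumptions made for illustration.

```python
def matrix_regression(x, y, x_h):
    """Simple linear regression via b = (X'X)^-1 X'Y with X = [1, x]."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    sum_x = sum(x)
    sum_xx = sum(xi * xi for xi in x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    # X'X = [[n, sum_x], [sum_x, sum_xx]] and X'Y = [sum_y, sum_xy].
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("X'X is singular (all x values equal)")
    xtx_inv = [[sum_xx / det, -sum_x / det],
               [-sum_x / det, n / det]]

    # b = (X'X)^-1 X'Y
    b0 = xtx_inv[0][0] * sum_y + xtx_inv[0][1] * sum_xy
    b1 = xtx_inv[1][0] * sum_y + xtx_inv[1][1] * sum_xy

    # SSE = Y'Y - b'X'Y and MSE = SSE / (n - 2)
    sse = sum(yi * yi for yi in y) - (b0 * sum_y + b1 * sum_xy)
    mse = sse / (n - 2)

    # s^2(b) = MSE (X'X)^-1
    s2_b = [[mse * v for v in row] for row in xtx_inv]

    # Quadratic form Xh'(X'X)^-1 Xh with Xh = (1, x_h)'
    q = xtx_inv[0][0] + 2 * xtx_inv[0][1] * x_h + xtx_inv[1][1] * x_h * x_h
    return {"b": (b0, b1), "MSE": mse, "s2_b": s2_b,
            "s2_mean": mse * q, "s2_pred": mse * (1 + q)}


# Hypothetical data, for illustration only.
result = matrix_regression([1, 2, 3, 4], [2, 4, 5, 8], x_h=2.5)
print(result)
```

The diagonal entries of `s2_b` reproduce the scalar formulas given earlier, $s^2(b_0) = s^2 \sum X_i^2/(n\,SS_{xx})$ and $s^2(b_1) = s^2/SS_{xx}$, and `s2_pred` exceeds `s2_mean` by exactly MSE.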