Linear Models in Statistics (2nd Edition) PDF
Document Details
2008
Alvin C. Rencher and G. Bruce Schaalje
Summary
This textbook, "Linear Models in Statistics", second edition, by Alvin C. Rencher and G. Bruce Schaalje, develops the theory of linear models in statistics. It covers simple linear regression, multiple linear regression, analysis-of-variance models, analysis-of-covariance, Bayesian inference for linear models, and linear mixed models. According to the preface, it is designed primarily for a one-semester course for advanced undergraduates or MS students.
Full Transcript
LINEAR MODELS IN STATISTICS LINEAR MODELS IN STATISTICS Second Edition Alvin C. Rencher and G. Bruce Schaalje Department of Statistics, Brigham Young University, Provo, Utah Copyright # 2008 by John Wiley & Sons, Inc. All rights reserved Published by John Wiley & Sons, Inc., Hoboken, New Jersey Published simultaneously in Canada No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley. com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. Wiley also publishes its books in variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com. Wiley Bicentennial Logo: Richard J. Pacifico Library of Congress Cataloging-in-Publication Data: Rencher, Alvin C., 1934- Linear models in statistics/Alvin C. Rencher, G. Bruce Schaalje. – 2nd ed. p. cm. Includes bibliographical references. ISBN 978-0-471-75498-5 (cloth) 1. Linear models (Statistics) I. Schaalje, G. Bruce. II. Title. 
QA276.R425 2007 519.50 35–dc22 2007024268 Printed in the United States of America 10 9 8 7 6 5 4 3 2 1 CONTENTS Preface xiii 1 Introduction 1 1.1 Simple Linear Regression Model 1 1.2 Multiple Linear Regression Model 2 1.3 Analysis-of-Variance Models 3 2 Matrix Algebra 5 2.1 Matrix and Vector Notation 5 2.1.1 Matrices, Vectors, and Scalars 5 2.1.2 Matrix Equality 6 2.1.3 Transpose 7 2.1.4 Matrices of Special Form 7 2.2 Operations 9 2.2.1 Sum of Two Matrices or Two Vectors 9 2.2.2 Product of a Scalar and a Matrix 10 2.2.3 Product of Two Matrices or Two Vectors 10 2.2.4 Hadamard Product of Two Matrices or Two Vectors 16 2.3 Partitioned Matrices 16 2.4 Rank 19 2.5 Inverse 21 2.6 Positive Definite Matrices 24 2.7 Systems of Equations 28 2.8 Generalized Inverse 32 2.8.1 Definition and Properties 33 2.8.2 Generalized Inverses and Systems of Equations 36 2.9 Determinants 37 2.10 Orthogonal Vectors and Matrices 41 2.11 Trace 44 2.12 Eigenvalues and Eigenvectors 46 2.12.1 Definition 46 2.12.2 Functions of a Matrix 49 v vi CONTENTS 2.12.3 Products 50 2.12.4 Symmetric Matrices 51 2.12.5 Positive Definite and Semidefinite Matrices 53 2.13 Idempotent Matrices 54 2.14 Vector and Matrix Calculus 56 2.14.1 Derivatives of Functions of Vectors and Matrices 56 2.14.2 Derivatives Involving Inverse Matrices and Determinants 58 2.14.3 Maximization or Minimization of a Function of a Vector 60 3 Random Vectors and Matrices 69 3.1 Introduction 69 3.2 Means, Variances, Covariances, and Correlations 70 3.3 Mean Vectors and Covariance Matrices for Random Vectors 75 3.3.1 Mean Vectors 75 3.3.2 Covariance Matrix 75 3.3.3 Generalized Variance 77 3.3.4 Standardized Distance 77 3.4 Correlation Matrices 77 3.5 Mean Vectors and Covariance Matrices for Partitioned Random Vectors 78 3.6 Linear Functions of Random Vectors 79 3.6.1 Means 80 3.6.2 Variances and Covariances 81 4 Multivariate Normal Distribution 87 4.1 Univariate Normal Density Function 87 4.2 Multivariate Normal Density Function 88 4.3 Moment Generating Functions 90 4.4 Properties of the Multivariate Normal Distribution 92 4.5 Partial Correlation 100 5 Distribution of Quadratic Forms in y 105 5.1 Sums of Squares 105 5.2 Mean and Variance of Quadratic Forms 107 5.3 Noncentral Chi-Square Distribution 112 5.4 Noncentral F and t Distributions 114 5.4.1 Noncentral F Distribution 114 5.4.2 Noncentral t Distribution 116 5.5 Distribution of Quadratic Forms 117 5.6 Independence of Linear Forms and Quadratic Forms 119 CONTENTS vii 6 Simple Linear Regression 127 6.1 The Model 127 6.2 Estimation of b0, b1, and s 2 128 6.3 Hypothesis Test and Confidence Interval for b1 132 6.4 Coefficient of Determination 133 7 Multiple Regression: Estimation 137 7.1 Introduction 137 7.2 The Model 137 7.3 Estimation of b and s 2 141 7.3.1 Least-Squares Estimator for b 145 7.3.2 Properties of the Least-Squares Estimator b̂ 141 7.3.3 An Estimator for s 2 149 7.4 Geometry of Least-Squares 151 7.4.1 Parameter Space, Data Space, and Prediction Space 152 7.4.2 Geometric Interpretation of the Multiple Linear Regression Model 153 7.5 The Model in Centered Form 154 7.6 Normal Model 157 7.6.1 Assumptions 157 7.6.2 Maximum Likelihood Estimators for b and s 2 158 7.6.3 Properties of b̂ and ŝ2 159 7.7 R 2 in Fixed-x Regression 161 7.8 Generalized Least-Squares: cov(y) ¼ s 2V 164 7.8.1 Estimation of b and s 2 when cov(y) ¼ s 2V 164 7.8.2 Misspecification of the Error Structure 167 7.9 Model Misspecification 169 7.10 Orthogonalization 174 8 Multiple Regression: Tests of Hypotheses and Confidence Intervals 
185 8.1 Test of Overall Regression 185 8.2 Test on a Subset of the b Values 189 8.3 F Test in Terms of R 2 196 8.4 The General Linear Hypothesis Tests for H0: Cb ¼ 0 and H0: Cb ¼ t 198 8.4.1 The Test for H0: Cb ¼ 0 198 8.4.2 The Test for H0: Cb ¼ t 203 8.5 Tests on bj and a0 b 204 8.5.1 Testing One bj or One a0 b 204 8.5.2 Testing Several bj or a0 ib Values 205 viii CONTENTS 8.6 Confidence Intervals and Prediction Intervals 209 8.6.1 Confidence Region for b 209 8.6.2 Confidence Interval for bj 210 8.6.3 Confidence Interval for a0 b 211 8.6.4 Confidence Interval for E(y) 211 8.6.5 Prediction Interval for a Future Observation 213 8.6.6 Confidence Interval for s 2 215 8.6.7 Simultaneous Intervals 215 8.7 Likelihood Ratio Tests 217 9 Multiple Regression: Model Validation and Diagnostics 227 9.1 Residuals 227 9.2 The Hat Matrix 230 9.3 Outliers 232 9.4 Influential Observations and Leverage 235 10 Multiple Regression: Random x’s 243 10.1 Multivariate Normal Regression Model 244 10.2 Estimation and Testing in Multivariate Normal Regression 245 10.3 Standardized Regression Coefficents 249 10.4 R 2 in Multivariate Normal Regression 254 10.5 Tests and Confidence Intervals for R 2 258 10.6 Effect of Each Variable on R 2 262 10.7 Prediction for Multivariate Normal or Nonnormal Data 265 10.8 Sample Partial Correlations 266 11 Multiple Regression: Bayesian Inference 277 11.1 Elements of Bayesian Statistical Inference 277 11.2 A Bayesian Multiple Linear Regression Model 279 11.2.1 A Bayesian Multiple Regression Model with a Conjugate Prior 280 11.2.2 Marginal Posterior Density of b 282 11.2.3 Marginal Posterior Densities of t and s 2 284 11.3 Inference in Bayesian Multiple Linear Regression 285 11.3.1 Bayesian Point and Interval Estimates of Regression Coefficients 285 11.3.2 Hypothesis Tests for Regression Coefficients in Bayesian Inference 286 11.3.3 Special Cases of Inference in Bayesian Multiple Regression Models 286 11.3.4 Bayesian Point and Interval Estimation of s 2 287 CONTENTS ix 11.4 Bayesian Inference through Markov Chain Monte Carlo Simulation 288 11.5 Posterior Predictive Inference 290 12 Analysis-of-Variance Models 295 12.1 Non-Full-Rank Models 295 12.1.1 One-Way Model 295 12.1.2 Two-Way Model 299 12.2 Estimation 301 12.2.1 Estimation of b 302 12.2.2 Estimable Functions of b 305 12.3 Estimators 309 12.3.1 Estimators of l0 b 309 12.3.2 Estimation of s 2 313 12.3.3 Normal Model 314 12.4 Geometry of Least-Squares in the Overparameterized Model 316 12.5 Reparameterization 318 12.6 Side Conditions 320 12.7 Testing Hypotheses 323 12.7.1 Testable Hypotheses 323 12.7.2 Full-Reduced-Model Approach 324 12.7.3 General Linear Hypothesis 326 12.8 An Illustration of Estimation and Testing 329 12.8.1 Estimable Functions 330 12.8.2 Testing a Hypothesis 331 12.8.3 Orthogonality of Columns of X 333 13 One-Way Analysis-of-Variance: Balanced Case 339 13.1 The One-Way Model 339 13.2 Estimable Functions 340 13.3 Estimation of Parameters 341 13.3.1 Solving the Normal Equations 341 13.3.2 An Estimator for s 2 343 13.4 Testing the Hypothesis H0: m1 ¼ m2 ¼... 
¼ mk 344 13.4.1 Full – Reduced-Model Approach 344 13.4.2 General Linear Hypothesis 348 13.5 Expected Mean Squares 351 13.5.1 Full-Reduced-Model Approach 352 13.5.2 General Linear Hypothesis 354 x CONTENTS 13.6 Contrasts 357 13.6.1 Hypothesis Test for a Contrast 357 13.6.2 Orthogonal Contrasts 358 13.6.3 Orthogonal Polynomial Contrasts 363 14 Two-Way Analysis-of-Variance: Balanced Case 377 14.1 The Two-Way Model 377 14.2 Estimable Functions 378 14.3 Estimators of l0 b and s 2 382 14.3.1 Solving the Normal Equations and Estimating l0 b 382 14.3.2 An Estimator for s 2 384 14.4 Testing Hypotheses 385 14.4.1 Test for Interaction 385 14.4.2 Tests for Main Effects 395 14.5 Expected Mean Squares 403 14.5.1 Sums-of-Squares Approach 403 14.5.2 Quadratic Form Approach 405 15 Analysis-of-Variance: The Cell Means Model for Unbalanced Data 413 15.1 Introduction 413 15.2 One-Way Model 415 15.2.1 Estimation and Testing 415 15.2.2 Contrasts 417 15.3 Two-Way Model 421 15.3.1 Unconstrained Model 421 15.3.2 Constrained Model 428 15.4 Two-Way Model with Empty Cells 432 16 Analysis-of-Covariance 443 16.1 Introduction 443 16.2 Estimation and Testing 444 16.2.1 The Analysis-of-Covariance Model 444 16.2.2 Estimation 446 16.2.3 Testing Hypotheses 448 16.3 One-Way Model with One Covariate 449 16.3.1 The Model 449 16.3.2 Estimation 449 16.3.3 Testing Hypotheses 450 CONTENTS xi 16.4 Two-Way Model with One Covariate 457 16.4.1 Tests for Main Effects and Interactions 458 16.4.2 Test for Slope 462 16.4.3 Test for Homogeneity of Slopes 463 16.5 One-Way Model with Multiple Covariates 464 16.5.1 The Model 464 16.5.2 Estimation 465 16.5.3 Testing Hypotheses 468 16.6 Analysis-of-Covariance with Unbalanced Models 473 17 Linear Mixed Models 479 17.1 Introduction 479 17.2 The Linear Mixed Model 479 17.3 Examples 481 17.4 Estimation of Variance Components 486 17.5 Inference for b 490 17.5.1 An Estimator for b 490 17.5.2 Large-Sample Inference for Estimable Functions of b 491 17.5.3 Small-Sample Inference for Estimable Functions of b 491 17.6 Inference for the ai Terms 497 17.7 Residual Diagnostics 501 18 Additional Models 507 18.1 Nonlinear Regression 507 18.2 Logistic Regression 508 18.3 Loglinear Models 511 18.4 Poisson Regression 512 18.5 Generalized Linear Models 513 Appendix A Answers and Hints to the Problems 517 References 653 Index 663 PREFACE In the second edition, we have added chapters on Bayesian inference in linear models (Chapter 11) and linear mixed models (Chapter 17), and have upgraded the material in all other chapters. Our continuing objective has been to introduce the theory of linear models in a clear but rigorous format. In spite of the availability of highly innovative tools in statistics, the main tool of the applied statistician remains the linear model. The linear model involves the sim- plest and seemingly most restrictive statistical properties: independence, normality, constancy of variance, and linearity. However, the model and the statistical methods associated with it are surprisingly versatile and robust. More importantly, mastery of the linear model is a prerequisite to work with advanced statistical tools because most advanced tools are generalizations of the linear model. The linear model is thus central to the training of any statistician, applied or theoretical. This book develops the basic theory of linear models for regression, analysis-of- variance, analysis–of–covariance, and linear mixed models. 
Chapter 18 briefly intro- duces logistic regression, generalized linear models, and nonlinear models. Applications are illustrated by examples and problems using real data. This combination of theory and applications will prepare the reader to further explore the literature and to more correctly interpret the output from a linear models computer package. This introductory linear models book is designed primarily for a one-semester course for advanced undergraduates or MS students. It includes more material than can be covered in one semester so as to give an instructor a choice of topics and to serve as a reference book for researchers who wish to gain a better understanding of regression and analysis-of-variance. The book would also serve well as a text for PhD classes in which the instructor is looking for a one-semester introduction, and it would be a good supplementary text or reference for a more advanced PhD class for which the students need to review the basics on their own. Our overriding objective in the preparation of this book has been clarity of expo- sition. We hope that students, instructors, researchers, and practitioners will find this linear models text more comfortable than most. In the final stages of development, we asked students for written comments as they read each day’s assignment. They made many suggestions that led to improvements in readability of the book. We are grateful to readers who have notified us of errors and other suggestions for improvements of the text, and we will continue to be very grateful to readers who take the time to do so for this second edition. xiii xiv PREFACE Another objective of the book is to tie up loose ends. There are many approaches to teaching regression, for example. Some books present estimation of regression coefficients for fixed x’s only, other books use random x’s, some use centered models, and others define estimated regression coefficients in terms of variances and covariances or in terms of correlations. Theory for linear models has been pre- sented using both an algebraic and a geometric approach. Many books present clas- sical (frequentist) inference for linear models, while increasingly the Bayesian approach is presented. We have tried to cover all these approaches carefully and to show how they relate to each other. We have attempted to do something similar for various approaches to analysis-of-variance. We believe that this will make the book useful as a reference as well as a textbook. An instructor can choose the approach he or she prefers, and a student or researcher has access to other methods as well. The book includes a large number of theoretical problems and a smaller number of applied problems using real datasets. The problems, along with the extensive set of answers in Appendix A, extend the book in two significant ways: (1) the theoretical problems and answers fill in nearly all gaps in derivations and proofs and also extend the coverage of material in the text, and (2) the applied problems and answers become additional examples illustrating the theory. As instructors, we find that having answers available for the students saves a great deal of class time and enables us to cover more material and cover it better. The answers would be especially useful to a reader who is engaging this material outside the formal classroom setting. The mathematical prerequisites for this book are multivariable calculus and matrix algebra. 
The review of matrix algebra in Chapter 2 is intended to be sufficiently com- plete so that the reader with no previous experience can master matrix manipulation up to the level required in this book. Statistical prerequisites include some exposure to statistical theory, with coverage of topics such as distributions of random variables, expected values, moment generating functions, and an introduction to estimation and testing hypotheses. These topics are briefly reviewed as each is introduced. One or two statistical methods courses would also be helpful, with coverage of topics such as t tests, regression, and analysis-of-variance. We have made considerable effort to maintain consistency of notation throughout the book. We have also attempted to employ standard notation as far as possible and to avoid exotic characters that cannot be readily reproduced on the chalkboard. With a few exceptions, we have refrained from the use of abbreviations and mnemonic devices. We often find these annoying in a book or journal article. Equations are numbered sequentially throughout each chapter; for example, (3.29) indicates the twenty-ninth numbered equation in Chapter 3. Tables and figures are also numbered sequentially throughout each chapter in the form “Table 7.4” or “Figure 3.2.” On the other hand, examples and theorems are numbered sequentially within a section, for example, Theorems 2.2a and 2.2b. The solution of most of the problems with real datasets requires the use of the com- puter. We have not discussed command files or output of any particular program, because there are so many good packages available. Computations for the numerical examples and numerical problems were done with SAS. The datasets and SAS PREFACE xv command files for all the numerical examples and problems in the text are available on the Internet; see Appendix B. The references list is not intended to be an exhaustive survey of the literature. We have provided original references for some of the basic results in linear models and have also referred the reader to many up-to-date texts and reference books useful for further reading. When citing references in the text, we have used the standard format involving the year of publication. For journal articles, the year alone suffices, for example, Fisher (1921). But for a specific reference in a book, we have included a page number or section, as in Hocking (1996, p. 216). Our selection of topics is intended to prepare the reader for a better understanding of applications and for further reading in topics such as mixed models, generalized linear models, and Bayesian models. Following a brief introduction in Chapter 1, Chapter 2 contains a careful review of all aspects of matrix algebra needed to read the book. Chapters 3, 4, and 5 cover properties of random vectors, matrices, and quadratic forms. Chapters 6, 7, and 8 cover simple and multiple linear regression, including estimation and testing hypotheses and consequences of misspecification of the model. Chapter 9 provides diagnostics for model validation and detection of influential observations. Chapter 10 treats multiple regression with random x’s. Chapter 11 covers Bayesian multiple linear regression models along with Bayesian inferences based on those models. Chapter 12 covers the basic theory of analysis- of-variance models, including estimability and testability for the overparameterized model, reparameterization, and the imposition of side conditions. 
Chapters 13 and 14 cover balanced one-way and two-way analysis-of-variance models using an over- parameterized model. Chapter 15 covers unbalanced analysis-of-variance models using a cell means model, including a section on dealing with empty cells in two- way analysis-of-variance. Chapter 16 covers analysis of covariance models. Chapter 17 covers the basic theory of linear mixed models, including residual maximum likelihood estimation of variance components, approximate small- sample inferences for fixed effects, best linear unbiased prediction of random effects, and residual analysis. Chapter 18 introduces additional topics such as nonlinear regression, logistic regression, loglinear models, Poisson regression, and generalized linear models. In our class for first-year master’s-level students, we cover most of the material in Chapters 2 – 5, 7 – 8, 10 – 12, and 17. Many other sequences are possible. For example, a thorough one-semester regression and analysis-of-variance course could cover Chapters 1 – 10, and 12 – 15. Al’s introduction to linear models came in classes taught by Dale Richards and Rolf Bargmann. He also learned much from the books by Graybill, Scheffé, and Rao. Al expresses thanks to the following for reading the first edition manuscript and making many valuable suggestions: David Turner, John Walker, Joel Reynolds, and Gale Rex Bryce. Al thanks the following students at Brigham Young University (BYU) who helped with computations, graphics, and typing of the first edition: David Fillmore, Candace Baker, Scott Curtis, Douglas Burton, David Dahl, Brenda Price, Eric Hintze, James Liechty, and Joy Willbur. The students xvi PREFACE in Al’s Linear Models class went through the manuscript carefully and spotted many typographical errors and passages that needed additional clarification. Bruce’s education in linear models came in classes taught by Mel Carter, Del Scott, Doug Martin, Peter Bloomfield, and Francis Giesbrecht, and influential short courses taught by John Nelder and Russ Wolfinger. We thank Bruce’s Linear Models classes of 2006 and 2007 for going through the book and new chapters. They made valuable suggestions for improvement of the text. We thank Paul Martin and James Hattaway for invaluable help with LaTex. The Department of Statistics, Brigham Young University provided financial support and encouragement throughout the project. Second Edition For the second edition we added Chapter 11 on Bayesian inference in linear models (including Gibbs sampling) and Chapter 17 on linear mixed models. We also added a section in Chapter 2 on vector and matrix calculus, adding several new theorems and covering the Lagrange multiplier method. In Chapter 4, we pre- sented a new proof of the conditional distribution of a subvector of a multivariate normal vector. In Chapter 5, we provided proofs of the moment generating function and variance of a quadratic form of a multivariate normal vector. The section on the geometry of least squares was completely rewritten in Chapter 7, and a section on the geometry of least squares in the overparameterized linear model was added to Chapter 12. Chapter 8 was revised to provide more motivation for hypothesis testing and simultaneous inference. A new section was added to Chapter 15 dealing with two-way analysis-of-variance when there are empty cells. This material is not available in any other textbook that we are aware of. 
This book would not have been possible without the patience, support, and encouragement of Al's wife LaRue and Bruce's wife Lois. Both have helped and supported us in more ways than they know. This book is dedicated to them.

ALVIN C. RENCHER AND G. BRUCE SCHAALJE
Department of Statistics, Brigham Young University, Provo, Utah

1 Introduction

The scientific method is frequently used as a guided approach to learning. Linear statistical methods are widely used as part of this learning process. In the biological, physical, and social sciences, as well as in business and engineering, linear models are useful in both the planning stages of research and analysis of the resulting data. In Sections 1.1-1.3, we give a brief introduction to simple and multiple linear regression models, and analysis-of-variance (ANOVA) models.

1.1 SIMPLE LINEAR REGRESSION MODEL

In simple linear regression, we attempt to model the relationship between two variables, for example, income and number of years of education, height and weight of people, length and width of envelopes, temperature and output of an industrial process, altitude and boiling point of water, or dose of a drug and response. For a linear relationship, we can use a model of the form

    y = β0 + β1 x + ε,    (1.1)

where y is the dependent or response variable and x is the independent or predictor variable. The random variable ε is the error term in the model. In this context, error does not mean mistake but is a statistical term representing random fluctuations, measurement errors, or the effect of factors outside of our control.

The linearity of the model in (1.1) is an assumption. We typically add other assumptions about the distribution of the error terms, independence of the observed values of y, and so on. Using observed values of x and y, we estimate β0 and β1 and make inferences such as confidence intervals and tests of hypotheses for β0 and β1. We may also use the estimated model to forecast or predict the value of y for a particular value of x, in which case a measure of predictive accuracy may also be of interest. Estimation and inferential procedures for the simple linear regression model are developed and illustrated in Chapter 6.

1.2 MULTIPLE LINEAR REGRESSION MODEL

The response y is often influenced by more than one predictor variable. For example, the yield of a crop may depend on the amount of nitrogen, potash, and phosphate fertilizers used. These variables are controlled by the experimenter, but the yield may also depend on uncontrollable variables such as those associated with weather. A linear model relating the response y to several predictors has the form

    y = β0 + β1 x1 + β2 x2 + ... + βk xk + ε.    (1.2)

The parameters β0, β1, ..., βk are called regression coefficients. As in (1.1), ε provides for random variation in y not explained by the x variables. This random variation may be due partly to other variables that affect y but are not known or not observed.

The model in (1.2) is linear in the β parameters; it is not necessarily linear in the x variables. Thus models such as

    y = β0 + β1 x1 + β2 x1² + β3 x2 + β4 sin x2 + ε

are included in the designation linear model.

A model provides a theoretical framework for better understanding of a phenomenon of interest. Thus a model is a mathematical construct that we believe may represent the mechanism that generated the observations at hand.
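As a concrete illustration of the estimation problem just described (the details are developed in Chapters 6 and 7), the following minimal sketch fits a model of the form (1.1) by least squares, and then refits it as a multiple regression (1.2) with a single predictor. The data values are invented purely for illustration, and NumPy is used here even though the numerical examples in the book were computed with SAS.

```python
import numpy as np

# Hypothetical data: dose of a drug (x) and measured response (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])

# Least-squares estimates of beta0 and beta1 in y = beta0 + beta1*x + eps
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)

# The same fit viewed as a multiple regression (1.2) with one predictor:
# form the matrix of x values with a leading column of 1s and solve for beta
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)        # agrees with (b0, b1) above
```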
The postulated model may be an idealized oversimplification of the complex real-world situation, but in many such cases, empirical models provide useful approximations of the relationships among variables. These relationships may be either associative or causative.

Regression models such as (1.2) are used for various purposes, including the following:

1. Prediction. Estimates of the individual parameters β0, β1, ..., βk are of less importance for prediction than the overall influence of the x variables on y. However, good estimates are needed to achieve good prediction performance.
2. Data Description or Explanation. The scientist or engineer uses the estimated model to summarize or describe the observed data.
3. Parameter Estimation. The values of the estimated parameters may have theoretical implications for a postulated model.
4. Variable Selection or Screening. The emphasis is on determining the importance of each predictor variable in modeling the variation in y. The predictors that are associated with an important amount of variation in y are retained; those that contribute little are deleted.
5. Control of Output. A cause-and-effect relationship between y and the x variables is assumed. The estimated model might then be used to control the output of a process by varying the inputs. By systematic experimentation, it may be possible to achieve the optimal output.

There is a fundamental difference between purposes 1 and 5. For prediction, we need only assume that the same correlations that prevailed when the data were collected also continue in place when the predictions are to be made. Showing that there is a significant relationship between y and the x variables in (1.2) does not necessarily prove that the relationship is causal. To establish causality in order to control output, the researcher must choose the values of the x variables in the model and use randomization to avoid the effects of other possible variables unaccounted for. In other words, to ascertain the effect of the x variables on y when the x variables are changed, it is necessary to change them. Estimation and inferential procedures that contribute to the five purposes listed above are discussed in Chapters 7-11.

1.3 ANALYSIS-OF-VARIANCE MODELS

In analysis-of-variance (ANOVA) models, we are interested in comparing several populations or several conditions in a study. Analysis-of-variance models can be expressed as linear models with restrictions on the x values. Typically the x's are 0s or 1s. For example, suppose that a researcher wishes to compare the mean yield for four types of catalyst in an industrial process. If n observations are to be obtained for each catalyst, one model for the 4n observations can be expressed as

    y_ij = μ_i + ε_ij,    i = 1, 2, 3, 4,  j = 1, 2, ..., n,    (1.3)

where μ_i is the mean corresponding to the ith catalyst. A hypothesis of interest is H0: μ1 = μ2 = μ3 = μ4. The model in (1.3) can be expressed in the alternative form

    y_ij = μ + α_i + ε_ij,    i = 1, 2, 3, 4,  j = 1, 2, ..., n.    (1.4)

In this form, α_i is the effect of the ith catalyst, and the hypothesis can be expressed as H0: α1 = α2 = α3 = α4.

Suppose that the researcher also wishes to compare the effects of three levels of temperature and that n observations are taken at each of the 12 catalyst-temperature combinations. Then the model can be expressed as

    y_ijk = μ_ij + ε_ijk = μ + α_i + β_j + γ_ij + ε_ijk,    i = 1, 2, 3, 4;  j = 1, 2, 3;  k = 1, 2, ..., n,    (1.5)

where μ_ij is the mean for the ijth catalyst-temperature combination, α_i is the effect of the ith catalyst, β_j is the effect of the jth level of temperature, and γ_ij is the interaction or joint effect of the ith catalyst and jth level of temperature.

In the examples leading to models (1.3)-(1.5), the researcher chooses the type of catalyst or level of temperature and thus applies different treatments to the objects or experimental units under study. In other settings, we compare the means of variables measured on natural groupings of units, for example, males and females or various geographic areas.

Analysis-of-variance models can be treated as a special case of regression models, but it is more convenient to analyze them separately. This is done in Chapters 12-15. Related topics, such as analysis-of-covariance and mixed models, are covered in Chapters 16-17.

2 Matrix Algebra

If we write a linear model such as (1.2) for each of n observations in a dataset, the n resulting models can be expressed in a single compact matrix expression. Then the estimation and testing results can be more easily obtained using matrix theory. In the present chapter, we review the elements of matrix theory needed in the remainder of the book. Proofs that seem instructive are included or called for in the problems. For other proofs, see Graybill (1969), Searle (1982), Harville (1997), Schott (1997), or any general text on matrix theory. We begin with some basic definitions in Section 2.1.

2.1 MATRIX AND VECTOR NOTATION

2.1.1 Matrices, Vectors, and Scalars

A matrix is a rectangular or square array of numbers or variables. We use uppercase boldface letters to represent matrices. In this book, all elements of matrices will be real numbers or variables representing real numbers. For example, the height (in inches) and weight (in pounds) for three students are listed in the following matrix:

    A = [ 65  154 ]
        [ 73  182 ]    (2.1)
        [ 68  167 ]

To represent the elements of A as variables, we use

    A = (a_ij) = [ a11  a12 ]
                 [ a21  a22 ]    (2.2)
                 [ a31  a32 ]

The first subscript in a_ij indicates the row; the second identifies the column. The notation A = (a_ij) represents a matrix by means of a typical element.

The matrix A in (2.1) or (2.2) has three rows and two columns, and we say that A is 3 × 2, or that the size of A is 3 × 2. A vector is a matrix with a single row or column. Elements in a vector are often identified by a single subscript; for example,

    x = [ x1 ]
        [ x2 ]
        [ x3 ]

As a convention, we use lowercase boldface letters for column vectors and lowercase boldface letters followed by the prime symbol (′) for row vectors; for example,

    x′ = (x1, x2, x3) = (x1  x2  x3).

(Row vectors are regarded as transposes of column vectors. The transpose is defined in Section 2.1.3 below.) We use either commas or spaces to separate elements of a row vector.

Geometrically, a row or column vector with p elements can be associated with a point in a p-dimensional space. The elements in the vector are the coordinates of the point. Sometimes we are interested in the distance from the origin to the point (vector), the distance between two points (vectors), or the angle between the arrows drawn from the origin to the two points.

In the context of matrices and vectors, a single real number is called a scalar. Thus 2.5, 29, and 7.26 are scalars.
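The notation of this section maps directly onto array objects in numerical software. The short sketch below (NumPy for illustration; the book's own computations use SAS) builds the 3 × 2 matrix in (2.1), a column vector with made-up entries, and a scalar, checks their sizes, and computes the distance from the origin to the point represented by the vector.

```python
import numpy as np

# The 3 x 2 matrix of heights and weights in (2.1)
A = np.array([[65, 154],
              [73, 182],
              [68, 167]])

# A 3 x 1 column vector (illustrative values) and its transpose, a 1 x 3 row vector
x = np.array([[1.5], [2.0], [3.5]])
x_row = x.T

c = 2.5                                   # a scalar

print(A.shape, x.shape, x_row.shape)      # (3, 2) (3, 1) (1, 3)
print(float(np.sqrt((x ** 2).sum())))     # distance from the origin to the point x
```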
A variable representing a scalar will be denoted by a lightface letter (usually lowercase), such as c. A scalar is technically distinct from a 1 1 matrix in terms of its uses and properties in matrix algebra. The same notation is often used to represent a scalar and a 1 1 matrix, but the meaning is usually obvious from the context. 2.1.2 Matrix Equality Two matrices or two vectors are equal if they are of the same size and if the elements in corresponding positions are equal; for example 3 2 4 3 2 4 ¼ , 1 3 7 1 3 7 but 5 2 9 5 3 9 = : 8 4 6 8 4 6 2.1 MATRIX AND VECTOR NOTATION 7 2.1.3 Transpose If we interchange the rows and columns of a matrix A, the resulting matrix is known as the transpose of A and is denoted by A0 ; for example 0 1 6 2 6 4 1 A ¼ @4 7 A, 0 A ¼ : 2 7 3 1 3 Formally, if A is denoted by A ¼ (aij ), then A0 is defined as A0 ¼ (aij )0 ¼ (a ji ): (2:3) This notation indicates that the element in the ith row and jth column of A is found in the jth row and ith column of A0. If the matrix A is n p, then A0 is p n. If a matrix is transposed twice, the result is the original matrix. Theorem 2.1. If A is any matrix, then (A0 )0 ¼ A: (2:4) PROOF. By (2.3), A0 ¼ (aij )0 ¼ (a ji ): Then (A0 )0 ¼ (a ji )0 ¼ (aij ) ¼ A. A (The notation A is used to indicate the end of a theorem proof, corollary proof or example.) 2.1.4 Matrices of Special Form If the transpose of a matrix A is the same as the original matrix, that is, if A0 ¼ A or equivalently (a ji ) ¼ (aij ), then the matrix A is said to be symmetric. For example 0 1 3 2 6 A ¼ @ 2 10 7 A 6 7 9 is symmetric. Clearly, all symmetric matrices are square. The diagonal of a p p square matrix A ¼ (aij ) consists of the elements a11 , a22 ,... , a pp. If a matrix contains zeros in all off-diagonal positions, it is said 8 MATRIX ALGEBRA to be a diagonal matrix; for example, consider the matrix 0 1 8 0 0 0 B0 3 0 0C D¼B @0 C, 0 0 0A 0 0 0 4 which can also be denoted as D ¼ diag(8, 3, 0, 4): We also use the notation diag(A) to indicate a diagonal matrix with the same diagonal elements as A; for example 0 1 0 1 3 2 6 3 0 0 A ¼ @2 10 7 A, diag(A) ¼ @ 0 10 0 A: 6 7 9 0 0 9 A diagonal matrix with a 1 in each diagonal position is called an identity matrix, and is denoted by I; for example 0 1 1 0 0 I ¼ @0 1 0 A: (2:5) 0 0 1 An upper triangular matrix is a square matrix with zeros below the diagonal; for example, 0 1 7 2 3 5 B0 0 2 6 C T¼B @0 C: 0 4 1A 0 0 0 8 A lower triangular matrix is defined similarly. A vector of 1s is denoted by j: 0 1 1 B1C B C j ¼ B.. C: (2:6) @.A 1 2.2 OPERATIONS 9 A square matrix of 1s is denoted by J; for example 0 1 1 1 1 J ¼ @1 1 1 A: (2:7) 1 1 1 We denote a vector of zeros by 0 and a matrix of zeros by O; for example 0 1 0 1 0 0 0 0 0 0 ¼ @ 0 A, O ¼ @0 0 0 0 A: (2:8) 0 0 0 0 0 2.2 OPERATIONS We now define sums and products of matrices and vectors and consider some pro- perties of these sums and products. 2.2.1 Sum of Two Matrices or Two Vectors If two matrices or two vectors are the same size, they are said to be conformal for addition. Their sum is found by adding corresponding elements. Thus, if A is n p and B is n p, then C ¼ A þ B is also n p and is found as C ¼ (cij ) ¼ (aij þ bij ); for example 7 3 4 11 5 6 18 2 2 þ ¼ : 2 8 5 3 4 2 5 12 3 The difference D ¼ A B between two conformal matrices A and B is defined simi- larly: D ¼ (dij ) ¼ (aij bij ). Two properties of matrix addition are given in the following theorem. Theorem 2.2a. 
If A and B are both n m, then (i) A þ B ¼ B þ A: (2.9) (ii) (A þ B)0 ¼ A0 þ B0 : (2.10) A 10 MATRIX ALGEBRA 2.2.2 Product of a Scalar and a Matrix Any scalar can be multiplied by any matrix. The product of a scalar and a matrix is defined as the product of each element of the matrix and the scalar: 0 1 ca11 ca12 ca1m B ca21 ca22 ca2m C B C cA ¼ (caij ) ¼ B..... C : (2:11) @.... A can1 can2 canm Since caij ¼ aij c, the product of a scalar and a matrix is commutative: cA ¼ Ac: (2:12) 2.2.3 Product of Two Matrices or Two Vectors In order for the product AB to be defined, the number of columns in A must equal the number of rows in B, in which case A and B are said to be conformal for multipli- cation. Then the (ij)th element of the product C ¼ AB is defined as X cij ¼ aik bkj , (2:13) k which is the sum of products of the elements in the ith row of A and the elements in the jth column of B. Thus we multiply every row of A by every column of B. If A is n m and B is m p, then C ¼ AB is n p. We illustrate matrix multiplication in the following example. Example 2.2.3. Let 0 1 1 4 2 1 3 A¼ and B ¼ @ 2 6 A: 4 6 5 3 8 Then 21þ12þ33 24þ16þ38 13 38 AB ¼ ¼ , 41þ62þ53 44þ66þ58 31 92 0 1 18 25 23 B C BA ¼ @ 28 38 36 A: 38 51 49 A Note that a 1 1 matrix A can only be multiplied on the right by a 1 n matrix B or on the left by an n 1 matrix C, whereas a scalar can be multiplied on the right or left by a matrix of any size. 2.2 OPERATIONS 11 If A is n m and B is m p, where n = p, then AB is defined, but BA is not defined. If A is n p and B is p n, then AB is n n and BA is p p. In this case, of course, AB = BA, as illustrated in Example 2.2.3. If A and B are both n n, then AB and BA are the same size, but, in general AB = BA: (2:14) [There are a few exceptions to (2.14), for example, two diagonal matrices or a square matrix and an identity.] Thus matrix multiplication is not commutative, and certain familiar manipulations with real numbers cannot be done with matrices. However, matrix multiplication is distributive over addition or subtraction: A(B + C) ¼ AB + AC, (2:15) (A + B)C ¼ AC + BC: (2:16) Using (2.15) and (2.16), we can expand products such as (A B)(C D): (A B)(C D) ¼ (A B)C (A B)D [by (2:15)] ¼ AC BC AD þ BD [by (2:16)]: (2:17) Multiplication involving vectors follows the same rules as for matrices. Suppose that A is n p, b is p 1, c is p 1, and d is n 1. Then Ab is a column vector of size n 1, d0 A is a row vector of size 1 p, b0 c is a sum of products (1 1), bc0 is a p p matrix, and cd0 is a p n matrix. Since b0 c is a 1 1 sum of products, it is equal to c0 b: b0 c ¼ b1 c1 þ b2 c2 þ þ bp cp , c0 b ¼ c1 b1 þ c2 b2 þ þ cp bp , b0 c ¼ c0 b: (2:18) The matrix cd0 is given by 0 1 c1 d1 c1 d2 c1 dn B c2 d1 c2 d2 c2 dn C B C cd0 ¼ B..... C : (2:19) @.... A cp d1 cp d2 cp dn 12 MATRIX ALGEBRA Similarly b0 b ¼ b21 þ b22 þ þ b2p , (2:20) 0 1 b21 b1 b2 b1 bp B C B b2 b1 b22 b2 bp C B C bb0 ¼ B..... C : (2:21) B.... C @ A bp b1 bp b2 b2p Thus, b0 b is a sum of squares and bb0 is a (symmetric) square matrix. The square root of the sum of squares of the elements of a p 1 vector b is the distance from the origin to the point b and is also referred to as the length of b: sffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffi X p Length of b ¼ b0 b ¼ b2i : (2:22) i¼1 If j is an n 1 vector of 1s as defined in (2.6), then by (2.20) and (2.21), we have 0 1 1 1 1 B1 1 1C B C j0 j ¼ n, jj0 ¼ B...... C ¼ J, (2:23) @...A 1 1 1 where J is an n n square matrix of 1s as illustrated in (2.7). 
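These operations are easy to check numerically. The sketch below reproduces the products of Example 2.2.3 and verifies b′c = c′b, the length formula (2.22), and the identities j′j = n and jj′ = J in (2.23); the vectors b and c are arbitrary illustrative values, and NumPy stands in for the SAS code used elsewhere in the book.

```python
import numpy as np

# The matrices of Example 2.2.3
A = np.array([[2., 1., 3.],
              [4., 6., 5.]])
B = np.array([[1., 4.],
              [2., 6.],
              [3., 8.]])
print(A @ B)                 # 2 x 2 product, [[13, 38], [31, 92]] as in the example
print(B @ A)                 # 3 x 3: BA is not even the same size as AB

# Arbitrary vectors to illustrate b'c = c'b and the length in (2.22)
b = np.array([1., 2., 3.])
c = np.array([4., 0., 2.])
print(b @ c, c @ b)          # equal sums of products
print(np.sqrt(b @ b))        # length of b

# The vector of 1s in (2.6) and the identities in (2.23)
j = np.ones(4)
print(j @ j)                 # j'j = n = 4
print(np.outer(j, j))        # jj' = J, the 4 x 4 matrix of 1s
```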
If a is n 1 and A is n p, then X n a0 j ¼ j0 a ¼ ai , (2:24) i¼1 0P 1 j a1j X BP C X X B j a2j C B C j0 A ¼ ai1 , ai2 ,... , aip , Aj ¼ B. C: (2:25) B.. C i i i @ A P j anj Thus a0 j is the sum of the elements in a, j0 A contains the column sums of A, and Aj contains the row sums of A. Note that in a0 j, the vector j is n 1; in j0 A, the vector j is n 1; and in Aj, the vector j is p 1. 2.2 OPERATIONS 13 The transpose of the product of two matrices is the product of the transposes in reverse order. Theorem 2.2b. If A is n p and B is p m, then (AB)0 ¼ B0 A0 : (2:26) PROOF. Let C ¼ AB: Then by (2.13) ! X p C ¼ (cij ) ¼ aik bkj : k¼1 By (2.3), the transpose of C ¼ AB becomes (AB)0 ¼ C0 ¼ (cij )0 ¼ (c ji ) ! ! X p X p ¼ a jk bki ¼ bki a jk ¼ B0 A0 : A k¼1 k¼1 We illustrate the steps in the proof of Theorem 2.2b using a 2 3 matrix A and a 3 2 matrix B: 0 1 b11 b12 a11 a12 a13 B C AB ¼ @ b21 b22 A a21 a22 a23 b31 b32 a11 b11 þ a12 b21 þ a13 b31 a11 b12 þ a12 b22 þ a13 b32 ¼ , a21 b11 þ a22 b21 þ a23 b31 a21 b12 þ a22 b22 þ a23 b32 0 a11 b11 þ a12 b21 þ a13 b31 a21 b11 þ a22 b21 þ a23 b31 (AB) ¼ a11 b12 þ a12 b22 þ a13 b32 a21 b12 þ a22 b22 þ a23 b32 b11 a11 þ b21 a12 þ b31 a13 b11 a21 þ b21 a22 þ b31 a23 ¼ b12 a11 þ b22 a12 þ b32 a13 b12 a21 þ b22 a22 þ b32 a23 0 1 a11 a21 b11 b21 b31 B C ¼ @ a12 a22 A b12 b22 b32 a13 a23 ¼ B 0 A0 : 14 MATRIX ALGEBRA The following corollary to Theorem 2.2b gives the transpose of the product of three matrices. Corollary 1. If A, B, and C are conformal so that ABC is defined, then (ABC)0 ¼ C0 B0 A0. A Suppose that A is n m and B is m p. Let a0i be the ith row of A and bj be the jth column of B, so that 0 1 a01 B a02 C B C A ¼ B. C, B ¼ (b1 , b2 ,... , bp ): @.. A a0n Then, by definition, the (ij)th element of AB is a0i bj : 0 1 a01 b1 a01 b2 a01 bp B a02 b1 a02 b2 a02 bp C B C AB ¼ B..... C : @.... A a0n b1 a0n b2 a0n bp This product can be written in terms of the rows of A: 0 1 0 0 1 0 01 a01 (b1 , b2 ,... , bp ) a1 B a1 B a02 (b1 , b2 ,... , bp ) C B a02 B C B a02 C B C B C B C AB ¼ B.. C ¼ B. C ¼ B. CB: (2:27) @. A @. A @.. A. 0 0 an (b1 , b2 ,... , bp ) an B a0n The first column of AB can be expressed in terms of A as 0 1 0 01 a01 b1 a1 B a02 b1 C B a02 C B C B C B. C ¼ B. Cb1 ¼ Ab1 : @.. A @.. A a0n b1 a0n Likewise, the second column is Ab2, and so on. Thus AB can be written in terms of the columns of B: AB ¼ A(b1 , b2 ,... , bp ) ¼ (Ab1 , Ab2 ,... , Abp ): (2:28) 2.2 OPERATIONS 15 Any matrix A can be multiplied by its transpose to form A0 A or AA0. Some pro- perties of these two products are given in the following theorem. Theorem 2.2c. Let A be any n p matrix. Then A0 A and AA0 have the following properties. (i) A0 A is p p and its elements are products of the columns of A. (ii) AA0 is n n and its elements are products of the rows of A. (iii) Both A0 A and AA0 are symmetric. (iv) If A0 A ¼ O, then A ¼ O. A Let A be an n n matrix and let D ¼ diag(d1 , d2 ,... , dn ). In the product DA, the ith row of A is multiplied by di, and in AD, the jth column of A is multiplied by dj. 
For example, if n ¼ 3, we have 0 10 1 d1 0 0 a11 a12 a13 B CB C B CB C DA ¼ B 0 d2 0 CB a21 a22 a23 C @ A@ A 0 0 d3 a31 a32 a33 0 1 d1 a11 d1 a12 d1 a13 B C B C ¼ B d2 a21 d2 a22 d2 a23 C, (2:29) @ A d3 a31 d3 a32 d3 a33 0 10 1 a11 a12 a13 d1 0 0 B CB C B CB C AD ¼ B a21 a22 a23 CB 0 d2 0C @ A@ A a31 a32 a33 0 0 d3 0 1 d1 a11 d2 a12 d3 a13 B C B C ¼ B d1 a21 d2 a22 d3 a23 C, (2:30) @ A d1 a31 d2 a32 d3 a33 0 1 d12 a11 d1 d2 a12 d1 d3 a13 B C B C DAD ¼ B d2 d1 a21 d22 a22 d2 d3 a23 C: (2:31) @ A 2 d3 d1 a31 d3 d2 a32 d3 a33 16 MATRIX ALGEBRA Note that DA = AD. However, in the special case where the diagonal matrix is the identity, (2.29) and (2.30) become IA ¼ AI ¼ A: (2:32) If A is rectangular, (2.32) still holds, but the two identities are of different sizes. If A is a symmetric matrix and y is a vector, the product X X y0 Ay ¼ aii y2i þ aij yi yj (2:33) i i=j is called a quadratic form. If x is n 1, y is p 1, and A is n p, the product X x0 Ay ¼ aij xi yj (2:34) ij is called a bilinear form. 2.2.4 Hadamard Product of Two Matrices or Two Vectors Sometimes a third type of product, called the elementwise or Hadamard product, is useful. If two matrices or two vectors are of the same size (conformal for addition), the Hadamard product is found by simply multiplying corresponding elements: 0 1 a11 b11 a12 b12 a1p b1p B a21 b21 a22 b22 a2p b2p C B C (aij bij ) ¼ B..... C : @.... A an1 bn1 an2 bn2 anp bnp 2.3 PARTITIONED MATRICES It is sometimes convenient to partition a matrix into submatrices. For example, a par- titioning of a matrix A into four (square or rectangular) submatrices of appropriate sizes can be indicated symbolically as follows: A11 A12 A¼ : A21 A22 2.3 PARTITIONED MATRICES 17 To illustrate, let the 4 5 matrix A be partitioned as 0 1 B 7 2 5 8 4 C B C B 3 B 4 0 2 7 CC A11 A12 A¼B C¼ , B C A21 A22 B 9 3 6 5 2 C @ A 3 1 2 1 6 where 7 2 5 8 4 A11 ¼ , A12 ¼ , 3 4 0 2 7 9 3 6 5 2 A21 ¼ , A22 ¼ : 3 1 2 1 6 If two matrices A and B are conformal for multiplication, and if A and B are parti- tioned so that the submatrices are appropriately conformal, then the product AB can be found using the usual pattern of row by column multiplication with the subma- trices as if they were single elements; for example A11 A12 B11 B12 AB ¼ A21 A22 B21 B22 A11 B11 þ A12 B21 A11 B12 þ A12 B22 ¼ : (2:35) A21 B11 þ A22 B21 A21 B12 þ A22 B22 If B is replaced by a vector b partitioned into two sets of elements, and if A is correspondingly partitioned into two sets of columns, then (2.35) becomes b Ab ¼ (A1 , A2 ) 1 ¼ A1 b 1 þ A 2 b 2 , (2:36) b2 where the number of columns of A1 is equal to the number of elements of b1, and A2 and b2 are similarly conformal. Note that the partitioning in A ¼ (A1 , A2 ) is indicated by a comma. The partitioned multiplication in (2.36) can be extended to individual columns of A and individual elements of b: 0 1 b1 B b2 C B C Ab ¼ (a1 , a2 ,... , ap )B. C ¼ b1 a1 þ b2 a2 þ þ bp ap : (2:37) @.. A bp 18 MATRIX ALGEBRA Thus Ab is expressible as a linear combination of the columns of A, in which the coefficients are elements of b. We illustrate (2.37) in the following example. Example 2.3. 
Let 0 1 0 1 6 2 3 4 A ¼ @2 1 0 A, b ¼ @ 2 A: 4 3 2 1 Then 0 1 17 Ab ¼ @ 10 A: 20 Using a linear combination of columns of A as in (2.37), we obtain Ab ¼ b1 a1 þ b2 a2 þ b2 a3 0 1 0 1 0 1 6 2 3 B C B C B C ¼ 4@ 2 A þ 2@ 1 A @ 0 A 4 3 2 0 1 0 1 0 1 0 1 24 4 3 17 B C B C B C B C ¼ @ 8 A þ @ 2 A @ 0 A ¼ @ 10 A: 16 6 2 20 A By (2.28) and (2.37), the columns of the product AB are linear combinations of the columns of A. The coefficients for the jth column of AB are the elements of the jth column of B. The product of a row vector and a matrix, a0 B, can be expressed as a linear com- bination of the rows of B, in which the coefficients are elements of a0 : 0 1 b01 B b02 C B C a0 B ¼ (a1 , a2 ,... , an )B. C ¼ a1 b01 þ a2 b02 þ þ an b0n : (2:38) @.. A b0n By (2.27) and (2.38), the rows of the matrix product AB are linear combinations of the rows of B. The coefficients for the ith row of AB are the elements of the ith row of A. 2.4 RANK 19 Finally, we note that if a matrix A is partitioned as A ¼ (A1 , A2 ), then A01 A0 ¼ (A1 , A2 )0 ¼ : (2:39) A02 2.4 RANK Before defining the rank of a matrix, we first introduce the notion of linear indepen- dence and dependence. A set of vectors a1 , a2 ,... , an is said to be linearly dependent if scalars c1 , c2 ,... , cn (not all zero) can be found such that c1 a1 þ c2 a2 þ þ cn an ¼ 0: (2:40) If no coefficients c1 , c2 ,... , cn can be found that satisfy (2.40), the set of vectors a1 , a2 ,... , an is said to be linearly independent. By (2.37) this can be restated as follows. The columns of A are linearly independent if Ac ¼ 0 implies c ¼ 0. (If a set of vectors includes 0, the set is linearly dependent.) If (2.40) holds, then at least one of the vectors a i can be expressed as a linear combination of the other vectors in the set. Among linearly independent vectors there is no redundancy of this type. The rank of any square or rectangular matrix A is defined as rank(A) ¼ number of linearly independent columns of A ¼ number of linearly independent rows of A: It can be shown that the number of linearly independent columns of any matrix is always equal to the number of linearly independent rows. If a matrix A has a single nonzero element, with all other elements equal to 0, then rank(A) ¼ 1. The vector 0 and the matrix O have rank 0. Suppose that a rectangular matrix A is n p of rank p, where p , n. (We typically shorten this statement to “A is n p of rank p , n.”) Then A has maximum possible rank and is said to be of full rank. In general, the maximum possible rank of an n p matrix A is min(n, p). Thus, in a rectangular matrix, the rows or columns (or both) are linearly dependent. We illustrate this in the following example. Example 2.4a. The rank of 1 2 3 A¼ 5 2 4 20 MATRIX ALGEBRA is 2 because the two rows are linearly independent (neither row is a multiple of the other). Hence, by the definition of rank, the number of linearly independent columns is also 2. Therefore, the columns are linearly dependent, and by (2.40) there exist constants c1 , c2 , and c3 such that 1 2 3 0 c1 þ c2 þ c3 ¼ : (2:41) 5 2 4 0 By (2.37), we can write (2.41) in the form 0 1 c 1 2 3 @ 1 A 0 c2 ¼ or Ac ¼ 0: (2:42) 5 2 4 0 c3 The solution to (2.42) is given by any multiple of c ¼ (14, 11, 12)0. In this case, the product Ac is equal to 0, even though A = O and c = 0. This is possible because of the linear dependence of the column vectors of A. A We can extend (2.42) to products of matrices. 
It is possible to find A = O and B = O such that AB ¼ O; (2:43) for example 1 2 2 6 0 0 ¼ : 2 4 1 3 0 0 We can also exploit the linear dependence of rows or columns of a matrix to create expressions such as AB ¼ CB, where A = C. Thus in a matrix equation, we cannot, in general, cancel a matrix from both sides of the equation. There are two exceptions to this rule: (1) if B is a full-rank square matrix, then AB ¼ CB implies A ¼ C; (2) the other special case occurs when the expression holds for all possible values of the matrix common to both sides of the equation; for example if Ax ¼ Bx for all possible values of x, (2:44) then A ¼ B. To see this, let x ¼ (1, 0,... , 0)0. Then, by (2.37) the first column of A equals the first column of B. Now let x ¼ (0, 1, 0,... , 0)0 , and the second column of A equals the second column of B. Continuing in this fashion, we obtain A ¼ B. 2.5 INVERSE 21 Example 2.4b. We illustrate the existence of matrices A, B, and C such that AB ¼ CB, where A = C. Let 0 1 1 2 1 3 2 2 1 1 A¼ , B ¼ @0 1 A, C¼ : 2 0 1 5 6 4 1 0 Then 3 5 AB ¼ CB ¼ : 1 4 A The following theorem gives a general case and two special cases for the rank of a product of two matrices. Theorem 2.4 (i) If the matrices A and B are conformal for multiplication, then rank(AB) rank(A) and rank(AB) rank(B). (ii) Multiplication by a full – rank square matrix does not change the rank; that is, if B and C are full– rank square matrices, rank(AB) ¼ rank(CA) ¼ rank(A). (iii) For any matrix A, rank(A0 A) ¼ rank(AA0 ) ¼ rank(A0 ) ¼ rank(A). PROOF (i) All the columns of AB are linear combinations of the columns of A (see a comment following Example 2.3). Consequently, the number of linearly independent columns of AB is less than or equal to the number of linearly independent columns of A, and rank(AB) rank(A). Similarly, all the rows of AB are linear combinations of the rows of B [see a comment follow- ing (2.38)], and therefore rank(AB) rank(B). (ii) This will be proved later. (iii) This will also be proved later. A 2.5 INVERSE A full-rank square matrix is said to be nonsingular. A nonsingular matrix A has a unique inverse, denoted by A1 , with the property that AA1 ¼ A1 A ¼ I: (2:45) 22 MATRIX ALGEBRA If A is square and less than full rank, then it does not have an inverse and is said to be singular. Note that full-rank rectangular matrices do not have inverses as in (2.45). From the definition in (2.45), it is clear that A is the inverse of A1 : (A1 )1 ¼ A: (2:46) Example 2.5. Let 4 7 A¼ : 2 6 Then :6 :7 A1 ¼ :2 :4 and 4 7 :6 :7 :6 :7 4 7 1 0 ¼ ¼ : 2 6 :2 :4 :2 :4 2 6 0 1 A We can now prove Theorem 2.4(ii). PROOF. If B is a full-rank square (nonsingular) matrix, there exists a matrix B1 such that BB1 ¼ I. Then, by Theorem 2.4(i), we have rank(A) ¼ rank(ABB1 ) rank(AB) rank(A): Thus both inequalities become equalities, and rank(A) ¼ rank(AB). Similarly, rank(A) ¼ rank(CA) for C nonsingular. A In applications, inverses are typically found by computer. Many calculators also compute inverses. Algorithms for hand calculation of inverses of small matrices can be found in texts on matrix algebra. If B is nonsingular and AB ¼ CB, then we can multiply on the right by B1 to obtain A ¼ C. (If B is singular or rectangular, we can’t cancel it from both sides of AB ¼ CB; see Example 2.4b and the paragraph preceding the example.) 
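Since, as noted above, inverses and ranks are typically found by computer, here is a minimal NumPy sketch (the book itself uses SAS for its numerical work) that reproduces the inverse in Example 2.5 and contrasts the ranks of a nonsingular and a singular matrix; the singular matrix is an illustrative choice with linearly dependent rows.

```python
import numpy as np

# The matrix of Example 2.5 and its inverse
A = np.array([[4., 7.],
              [2., 6.]])
A_inv = np.linalg.inv(A)
print(A_inv)                                   # [[ 0.6 -0.7] [-0.2  0.4]]
print(np.allclose(A @ A_inv, np.eye(2)))       # AA^{-1} = I, as in (2.45)
print(np.linalg.matrix_rank(A))                # 2: full rank, hence nonsingular

# A singular matrix: its rows are linearly dependent, so its rank is 1
# and np.linalg.inv(S) would raise a LinAlgError
S = np.array([[1., 2.],
              [2., 4.]])
print(np.linalg.matrix_rank(S))                # 1
```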
Similarly, if A is nonsingular, the system of equations Ax ¼ c has the unique solution x ¼ A1 c, (2:47) 2.5 INVERSE 23 since we can multiply on the left by A1 to obtain A1 Ax ¼ A1 c Ix ¼ A1 c: Two properties of inverses are given in the next two theorems. Theorem 2.5a. If A is nonsingular, then A0 is nonsingular and its inverse can be found as (A0 )1 ¼ (A1 )0 : (2:48) A Theorem 2.5b. If A and B are nonsingular matrices of the same size, then AB is nonsingular and (AB)1 ¼ B1 A1 : (2:49) A We now give the inverses of some special matrices. If A is symmetric and nonsin- gular and is partitioned as A11 A12 A¼ , A21 A22 and if B ¼ A22 A21 A1 1 11 A12 , then, provided A11 and B 1 exist, the inverse of A is given by A1 1 1 1 11 þ A11 A12 B A21 A11 A1 11 A12 B 1 A1 ¼ 1 1 : (2:50) B A21 A11 B1 As a special case of (2.50), consider the symmetric nonsingular matrix A11 a12 A¼ , a012 a22 in which A11 is square, a22 is a 1 1 matrix, and a12 is a vector. Then if A1 11 exists, A1 can be expressed as 1 bA1 1 0 1 11 þ A11 a12 a12 A11 A1 11 a12 , A1 ¼ 0 1 (2:51) b a12 A11 1 24 MATRIX ALGEBRA where b ¼ a22 a012 A1 11 a12. As another special case of (2.50), we have 1 A11 O A1 11 O ¼ : (2:52) O A22 O A1 22 If a square matrix of the form B þ cc0 is nonsingular, where c is a vector and B is a nonsingular matrix, then B1 cc0 B1 (B þ cc0 )1 ¼ B1 : (2:53) 1 þ c0 B1 c In more generality, if A, B, and A þ PBQ are nonsingular, then (A þ PBQ)1 ¼ A1 A1 PB(B þ BQA1 PB)1 BQA1 : (2:54) Both (2.53) and (2.54) can be easily verified (Problems 2.33 and 2.34). 2.6 POSITIVE DEFINITE MATRICES Quadratic forms were introduced in (2.33). For example, the quadratic form 3y21 þ y22 þ 2y23 þ 4y1 y2 þ 5y1 y3 6y2 y3 can be expressed as 3y21 þ y22 þ 2y23 þ 4y1 y2 þ 5y1 y3 6y2 y3 ¼ y0 Ay, where 0 1 0 1 y1 3 4 5 y ¼ @ y2 A, A ¼ @0 1 6 A: y3 0 0 2 However, the same quadratic form can also be expressed in terms of the symmetric matrix 0 5 1 3 2 1 2 (A þ A0 ) ¼ @ 2 1 3 A: 2 5 2 3 2 2.6 POSITIVE DEFINITE MATRICES 25 In general, any quadratic form y0 Ay can be expressed as A þ A0 y0 Ay ¼ y0 y, (2:55) 2 and thus the matrix of a quadratic form can always be chosen to be symmetric (and thereby unique). The sums of squares we will encounter in regression (Chapters 6 – 11) and analysis– of – variance (Chapters 12 – 15) can be expressed in the form y0 Ay, where y is an observation vector. Such quadratic forms remain positive (or at least nonne- gative) for all possible values of y. We now consider quadratic forms of this type. If the symmetric matrix A has the property y0 Ay. 0 for all possible y except y ¼ 0, then the quadratic form y0 Ay is said to be positive definite, and A is said to be a positive definite matrix. Similarly, if y0 Ay 0 for all y and there is at least one y = 0 such that y0 Ay ¼ 0, then y0 Ay and A are said to be positive semidefinite. Both types of matrices are illustrated in the following example. Example 2.6. To illustrate a positive definite matrix, consider 2 1 A¼ 1 3 and the associated quadratic form y0 Ay ¼ 2y21 2y1 y2 þ 3y22 ¼ 2( y1 12 y2 )2 þ 52 y22 , which is clearly positive as long as y1 and y2 are not both zero. To illustrate a positive semidefinite matrix, consider (2y1 y2 )2 þ (3y1 y3 )2 þ (3y2 2y3 )2 , which can be expressed as y0 Ay, with 0 1 13 2 3 A ¼ @ 2 10 6 A: 3 6 5 If 2y1 ¼ y2 , 3y1 ¼ y3 , and 3y2 ¼ 2y3 , then (2y1 y2 )2 þ (3y1 y3 )2 þ (3y2 2y3 )2 ¼ 0. Thus y0 Ay ¼ 0 for any multiple of y ¼ (1, 2, 3)0. Otherwise y0 Ay > 0 (except for y ¼ 0). 
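A convenient computational check of definiteness, not used in the text at this point but consistent with the eigenvalue results of Section 2.12.5, is to inspect the eigenvalues of the symmetric matrix. The sketch below applies this to the two matrices of Example 2.6, written with the off-diagonal signs obtained by expanding the quadratic forms stated in the example; NumPy is used for illustration.

```python
import numpy as np

# 2 x 2 matrix of Example 2.6: y'Ay = 2*y1**2 - 2*y1*y2 + 3*y2**2
A = np.array([[ 2., -1.],
              [-1.,  3.]])
print(np.linalg.eigvalsh(A))    # all eigenvalues > 0  =>  positive definite

# 3 x 3 matrix of the second quadratic form,
# (2*y1 - y2)**2 + (3*y1 - y3)**2 + (3*y2 - 2*y3)**2
B = np.array([[13., -2., -3.],
              [-2., 10., -6.],
              [-3., -6.,  5.]])
print(np.linalg.eigvalsh(B))    # eigenvalues >= 0, one (numerically) zero => semidefinite

y = np.array([1., 2., 3.])
print(y @ B @ y)                # 0.0: a nonzero y with y'By = 0
```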
In the matrices in Example 2.6, the diagonal elements are positive. For positive definite matrices, this is true in general.

Theorem 2.6a
(i) If A is positive definite, then all its diagonal elements a_{ii} are positive.
(ii) If A is positive semidefinite, then all a_{ii} ≥ 0.

PROOF
(i) Let y' = (0, ..., 0, 1, 0, ..., 0) with a 1 in the ith position and 0's elsewhere. Then y'Ay = a_{ii} > 0.
(ii) Let y' = (0, ..., 0, 1, 0, ..., 0) with a 1 in the ith position and 0's elsewhere. Then y'Ay = a_{ii} ≥ 0.

Some additional properties of positive definite and positive semidefinite matrices are given in the following theorems.

Theorem 2.6b. Let P be a nonsingular matrix.
(i) If A is positive definite, then P'AP is positive definite.
(ii) If A is positive semidefinite, then P'AP is positive semidefinite.

PROOF
(i) To show that y'P'APy > 0 for y ≠ 0, note that y'(P'AP)y = (Py)'A(Py). Since A is positive definite, (Py)'A(Py) > 0 provided that Py ≠ 0. By (2.47), Py = 0 only if y = 0, since P^{-1}Py = P^{-1}0 = 0. Thus y'P'APy > 0 if y ≠ 0.
(ii) See Problem 2.36.

Corollary 1. Let A be a p × p positive definite matrix and let B be a k × p matrix of rank k ≤ p. Then BAB' is positive definite.

Corollary 2. Let A be a p × p positive definite matrix and let B be a k × p matrix. If k > p or if rank(B) = r, where r < k and r < p, then BAB' is positive semidefinite.

Theorem 2.6c. A symmetric matrix A is positive definite if and only if there exists a nonsingular matrix P such that A = P'P.

PROOF. We prove the "if" part only. Suppose A = P'P for nonsingular P. Then

y'Ay = y'P'Py = (Py)'(Py).

This is a sum of squares [see (2.20)] and is positive unless Py = 0. By (2.47), Py = 0 only if y = 0.

Corollary 1. A positive definite matrix is nonsingular.

One method of factoring a positive definite matrix A into a product P'P as in Theorem 2.6c is provided by the Cholesky decomposition (Seber and Lee 2003, pp. 335-337), by which A can be factored uniquely into A = T'T, where T is a nonsingular upper triangular matrix.

For any square or rectangular matrix B, the matrix B'B is positive definite or positive semidefinite.

Theorem 2.6d. Let B be an n × p matrix.
(i) If rank(B) = p, then B'B is positive definite.
(ii) If rank(B) < p, then B'B is positive semidefinite.

PROOF
(i) To show that y'B'By > 0 for y ≠ 0, we note that y'B'By = (By)'(By), which is a sum of squares and is thereby positive unless By = 0. By (2.37), we can express By in the form

By = y_1b_1 + y_2b_2 + ... + y_pb_p.

This linear combination is not 0 (for any y ≠ 0) because rank(B) = p, and the columns of B are therefore linearly independent [see (2.40)].
(ii) If rank(B) < p, then we can find y ≠ 0 such that

By = y_1b_1 + y_2b_2 + ... + y_pb_p = 0,

since the columns of B are linearly dependent [see (2.40)]. Hence y'B'By = (By)'(By) = 0 for this y ≠ 0, while y'B'By ≥ 0 for all y, so B'B is positive semidefinite.

Note that if B is a square matrix, the matrix BB = B^2 is not necessarily positive semidefinite. For example, let

B = \begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix}.

Then

B^2 = \begin{pmatrix} -1 & -2 \\ 1 & 2 \end{pmatrix}, \quad B'B = \begin{pmatrix} 2 & 4 \\ 4 & 8 \end{pmatrix}.

In this case, B^2 is not positive semidefinite, but B'B is positive semidefinite, since y'B'By = 2(y_1 + 2y_2)^2.

Two additional properties of positive definite matrices are given in the following theorems.

Theorem 2.6e. If A is positive definite, then A^{-1} is positive definite.

PROOF. By Theorem 2.6c, A = P'P, where P is nonsingular. By Theorems 2.5a and 2.5b,

A^{-1} = (P'P)^{-1} = P^{-1}(P')^{-1} = P^{-1}(P^{-1})',

which is positive definite by Theorem 2.6c.
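A brief numerical companion (again an illustrative Python/NumPy sketch, not from the text): NumPy's Cholesky routine returns a lower triangular factor L with A = LL', so the upper triangular T of Theorem 2.6c's factorization A = T'T is obtained as its transpose.

```python
import numpy as np

# Positive definite matrix from Example 2.6.
A = np.array([[2., -1.],
              [-1., 3.]])

# Cholesky factorization A = T'T (Theorem 2.6c).
L = np.linalg.cholesky(A)   # lower triangular, A = L L'
T = L.T                     # upper triangular factor
print(np.allclose(T.T @ T, A))          # True

# Theorem 2.6e: A^{-1} is also positive definite, so it has a Cholesky
# factor as well (np.linalg.cholesky would raise LinAlgError otherwise).
Ainv = np.linalg.inv(A)
L2 = np.linalg.cholesky(Ainv)
print(np.allclose(L2 @ L2.T, Ainv))     # True

# The square matrix B below illustrates that B^2 need not be positive
# semidefinite even though B'B always is.
B = np.array([[1., 2.],
              [-1., -2.]])
y = np.array([1., 0.])
print(y @ (B @ B) @ y)                  # -1.0  (so B^2 is not psd)
print(y @ (B.T @ B) @ y)                #  2.0
```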
Theorem 2.6f. If A is positive definite and is partitioned in the form

A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},

where A11 and A22 are square, then A11 and A22 are positive definite.

PROOF. We can write A11, for example, as

A_{11} = (I, O)\,A \begin{pmatrix} I \\ O \end{pmatrix},

where I is the same size as A11. Then by Corollary 1 to Theorem 2.6b, A11 is positive definite.

2.7 SYSTEMS OF EQUATIONS

The system of n (linear) equations in p unknowns

a_{11}x_1 + a_{12}x_2 + ... + a_{1p}x_p = c_1
a_{21}x_1 + a_{22}x_2 + ... + a_{2p}x_p = c_2
  ...
a_{n1}x_1 + a_{n2}x_2 + ... + a_{np}x_p = c_n   (2.56)

can be written in matrix form as

Ax = c,   (2.57)

where A is n × p, x is p × 1, and c is n × 1. Note that if n ≠ p, x and c are of different sizes. If n = p and A is nonsingular, then by (2.47), there exists a unique solution vector x obtained as x = A^{-1}c. If n > p, so that A has more rows than columns, then Ax = c typically has no solution. If n < p, so that A has fewer rows than columns, then Ax = c typically has an infinite number of solutions.

If the system of equations Ax = c has one or more solution vectors, it is said to be consistent. If the system has no solution, it is said to be inconsistent.

To illustrate the structure of a consistent system of equations Ax = c, suppose that A is p × p of rank r < p. Then the rows of A are linearly dependent, and there exists some b such that [see (2.38)]

b'A = b_1a_1' + b_2a_2' + ... + b_pa_p' = 0'.

Then we must also have

b'c = b_1c_1 + b_2c_2 + ... + b_pc_p = 0,

since multiplication of Ax = c by b' gives b'Ax = b'c, or 0'x = b'c. Otherwise, if b'c ≠ 0, there is no x such that Ax = c. Hence, in order for Ax = c to be consistent, the same linear relationships, if any, that exist among the rows of A must exist among the elements (rows) of c. This is formalized by comparing the rank of A with the rank of the augmented matrix (A, c). The notation (A, c) indicates that c has been appended to A as an additional column.

Theorem 2.7. The system of equations Ax = c has at least one solution vector x if and only if rank(A) = rank(A, c).

PROOF. Suppose that rank(A) = rank(A, c), so that appending c does not change the rank. Then c is a linear combination of the columns of A; that is, there exists some x such that

x_1a_1 + x_2a_2 + ... + x_pa_p = c,

which, by (2.37), can be written as

Ax = c.

Thus x is a solution.

Conversely, suppose that there exists a solution vector x such that Ax = c. In general, rank(A) ≤ rank(A, c) (Harville 1997, p. 41). But since there exists an x such that Ax = c, we have

rank(A, c) = rank(A, Ax) = rank[A(I, x)] ≤ rank(A)   [by Theorem 2.4(i)].

Hence rank(A) ≤ rank(A, c) ≤ rank(A), and we have rank(A) = rank(A, c).

A consistent system of equations can be solved by the usual methods given in elementary algebra courses for eliminating variables, such as adding a multiple of one equation to another or solving for a variable and substituting into another equation. In the process, one or more variables may end up as arbitrary constants, thus generating an infinite number of solutions. A method of solution involving generalized inverses is given in Section 2.8.2. Some illustrations of systems of equations and their solutions are given in the following examples.
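The rank condition of Theorem 2.7 is straightforward to apply numerically. The following sketch (an illustrative Python/NumPy aside, not part of the text; the helper name is_consistent is a hypothetical choice) checks the consistent system of Example 2.7a below and the inconsistent variant of Example 2.7b.

```python
import numpy as np

def is_consistent(A, c):
    """Theorem 2.7: Ax = c has a solution iff rank(A) = rank((A, c))."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float).reshape(-1, 1)
    augmented = np.hstack([A, c])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)

# Coefficient matrix of Example 2.7a.
A = np.array([[1., 2.],
              [1., -1.],
              [1., 1.]])

# Right-hand side (4, 1, 3)': consistent, both ranks equal 2.
print(is_consistent(A, [4., 1., 3.]))    # True

# Changing the third right-hand side to 2 (Example 2.7b) breaks the linear
# relationship among the rows, and the system becomes inconsistent.
print(is_consistent(A, [4., 1., 2.]))    # False

# For the consistent system, least squares recovers the exact solution x = (2, 1)'.
x, *_ = np.linalg.lstsq(A, np.array([4., 1., 3.]), rcond=None)
print(x)                                 # [2. 1.]
```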
Example 2.7a. Consider the system of equations

x_1 + 2x_2 = 4
x_1 - x_2 = 1
x_1 + x_2 = 3

or

\begin{pmatrix} 1 & 2 \\ 1 & -1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} 4 \\ 1 \\ 3 \end{pmatrix}.

The augmented matrix is

(A, c) = \begin{pmatrix} 1 & 2 & 4 \\ 1 & -1 & 1 \\ 1 & 1 & 3 \end{pmatrix},

which has rank 2 because the third column is equal to twice the first column plus the second:

2\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \\ 3 \end{pmatrix}.

Since rank(A) = rank(A, c) = 2, there is at least one solution. If we add twice the first equation to the second, the result is a multiple of the third equation. Thus the third equation is redundant, and the first two can readily be solved to obtain the unique solution x = (2, 1)'.

[Figure 2.1 Three lines representing the three equations in Example 2.7a.]

The three lines representing the three equations are plotted in Figure 2.1. Notice that the three lines intersect at the point (2, 1), which is the unique solution of the three equations.

Example 2.7b. If we change the 3 to 2 in the third equation in Example 2.7a, the augmented matrix becomes

(A, c) = \begin{pmatrix} 1 & 2 & 4 \\ 1 & -1 & 1 \\ 1 & 1 & 2 \end{pmatrix},