UNIT 16 INTRODUCTION TO MULTIVARIATE ANALYSIS

Structure
16.0 Objectives
16.1 Introduction
16.2 Dealing with One Data Set
16.3 Dealing with Two Data Sets: One Dependent and One Independent
16.4 Predicting a Nominal Variable: Discriminant Analysis
16.5 Fitting a Model: Confirmatory Factor Analysis
16.6 Dealing with Two Data Sets: Two Dependent Variables Sets
16.7 Let Us Sum Up
16.8 Key Words
16.9 Some Useful Books/References
16.10 Answers/Hints to Check Your Progress Exercises

16.0 OBJECTIVES

After going through this unit, you will be able to:
explain the concept of multivariate analysis;
apply the specific techniques used under multivariate analysis;
decide on the technique to be used in a research problem; and
describe the statistical issues involved in multivariate data analysis.

16.1 INTRODUCTION

Multivariate analysis involves a set of techniques to analyse data sets on more than one variable. Many of these techniques are modern and often involve quite sophisticated use of computing tools. Such analyses refer to all statistical methods that simultaneously analyse multiple measurements on each individual or object under investigation. Hence, any analysis simultaneously involving two or more variables can loosely be considered multivariate analysis. This unit will provide a list of such analyses in order to help decide when to use a given statistical technique for a given type of data or statistical question. It also gives a brief description of each technique. It is organized according to the number of data sets to analyze: one or two (or more). With two data sets we consider two cases: in the first case, one set of data plays the role of predictors or independent variables and the second set of data corresponds to measurements or dependent variables; in the second case, the different sets of data correspond to different sets of dependent variables. Let us begin with analysis of situations involving a single data set.
16.2 DEALING WITH ONE DATA SET

In the case of one data set, the data tables to be analyzed are made of several measurements collected on a set of units (e.g., subjects). This implies that the investigator is not really interested in the difference between a dependent variable and the predictors or independent variables, but rather in creating groups or clusters of related variables, or of observations related to a single variable.

Interval or Ratio Level of Measurement: Principal Component Analysis

When faced with a large number of variables, principal component analysis (PCA) is a helpful measure to reduce the number of variables. PCA decomposes the data with correlated variables into a series of ordered orthogonal linear combinations of the dependent variables (i.e., factors), with the constraint that the first factor extracts the largest possible variance.

16.3 DEALING WITH TWO DATA SETS: ONE DEPENDENT AND ONE INDEPENDENT

Multivariate analysis of variance (MANOVA) is used in this situation when a nominal independent variable (the groups) is used to explain a set of dependent variables measured at the interval or ratio level. MANOVA combines the dependent variables to find the linear combination that generates the largest F if used in an ANOVA. The sampling distribution of this F is adjusted to take into account its construction.

16.4 PREDICTING A NOMINAL VARIABLE: DISCRIMINANT ANALYSIS

Discriminant analysis (DA) helps to determine which variables discriminate between two or more naturally occurring groups. Mathematically equivalent to MANOVA, it is extensively used when a set of explanatory variables is used to predict the group to which a given unit belongs (the group being a nominal dependent variable). It combines the explanatory variables in order to create the largest F when the groups are used as a fixed factor in an ANOVA. The model is constructed with a set of observations for which the classes are known. This set of observations is sometimes referred to as the training set.
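As a rough, hypothetical illustration of this procedure, a discriminant model can be fitted to a training set with known group labels and then used to classify units. Everything in the sketch below is invented for illustration: the three groups, their predictor scores, and the sample sizes are simulated, and scikit-learn's linear discriminant analysis stands in for the general technique.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical training set: 60 units measured on 3 predictor variables,
# drawn from three groups with different means; group labels are known.
means = [(2.0, 2.0, 2.0), (0.0, 0.0, 0.0), (-2.0, -2.0, -2.0)]
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(20, 3)) for m in means])
y = np.repeat([0, 1, 2], 20)  # the nominal dependent variable (group)

da = LinearDiscriminantAnalysis()
da.fit(X, y)  # constructs the discriminant functions from the training set

# Each discriminant function is linear in the predictors; predict() assigns
# a unit to the group whose function gives it the highest score.
print(da.coef_.shape)   # (3, 3): one coefficient vector per group
print(da.score(X, y))   # proportion correctly classified in the training set
```

With groups as well separated as these, training accuracy is close to 1; on real data one would of course evaluate the classifier on units outside the training set.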
Based on the training set, the technique constructs a set of linear functions of the predictors, known as discriminant functions, such that

D = b1x1 + b2x2 + ... + bnxn + C

where the b's are discriminant coefficients, the x's are the input variables or predictors, and C is a constant.

For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (a) to go to college, (b) to attend a trade or professional school, or (c) to seek no further training or education. For that purpose the researcher could collect data on numerous variables prior to students' graduation. After graduation, most students will naturally fall into one of the three categories. Discriminant analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice.

16.5 FITTING A MODEL: CONFIRMATORY FACTOR ANALYSIS

Confirmatory factor analysis (CFA) seeks to determine whether the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory. Indicator variables are selected on the basis of prior theory, and factor analysis is used to see if they load as predicted on the expected number of factors. The researcher first generates one (or a few) model(s) of an underlying explanatory structure (i.e., a construct), which is often expressed as a graph. The researcher's a priori assumption is that each factor (the number and labels of which may be specified a priori) is associated with a specified subset of indicator variables. A minimum requirement of confirmatory factor analysis is that one hypothesize beforehand the number of factors in the model, but usually the researcher will also posit expectations about which variables will load on which factors (Kim and Mueller, 1978b: 55). The researcher seeks to determine, for instance, whether measures created to represent a latent variable really belong together.
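The logic of checking a hypothesized structure against data can be sketched numerically. The toy example below is only an illustration under invented numbers: the loadings are fixed a priori rather than estimated, and the "observed" correlations are made up. It shows how a model-implied correlation matrix is compared with an observed one.

```python
import numpy as np

# Hypothesized one-factor structure: three indicators load on one latent
# variable. The loadings here are assumed a priori for illustration only
# (a real CFA would estimate them from the data).
loadings = np.array([0.8, 0.7, 0.6])

# Model-implied correlation matrix: products of loadings off the diagonal.
implied = np.outer(loadings, loadings)
np.fill_diagonal(implied, 1.0)

# Invented "observed" correlations among the three indicators.
observed = np.array([[1.00, 0.58, 0.46],
                     [0.58, 1.00, 0.44],
                     [0.46, 0.44, 1.00]])

# Crude fit measure: root-mean-square residual between the two matrices;
# a value near zero means the hypothesized structure fits well.
rmsr = np.sqrt(np.mean((observed - implied) ** 2))
print(round(rmsr, 3))  # 0.016 here, so the one-factor model fits closely
```

Dedicated CFA/SEM software additionally estimates the loadings and provides formal fit statistics; this sketch only conveys the comparison at the heart of the method.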
The correlations between the dependent variables are fitted to this structure. Models are evaluated by comparing how well they fit the data. Variations over CFA are called structural equation modelling (SEM), LISREL, or EQS.

16.6 DEALING WITH TWO DATA SETS: TWO DEPENDENT VARIABLES SETS

Canonical Correlation Analysis

Canonical correlation analysis (CC) allows the investigation of the relationship between two sets of variables. For example, a sociologist may want to investigate the relationship between two predictors of social mobility based on interviews, with actual subsequent social mobility as measured by four different indicators. A medical researcher may want to study the relationship of various risk factors to the development of a group of symptoms. In all of these cases, the researcher is interested in the relationship between two sets of variables, and canonical correlation would be the appropriate method of analysis.

Canonical correlation combines the dependent variables to find pairs of new variables, called canonical variables (CV), one for each data table, having the highest correlation. However, the CVs, even when highly correlated, do not necessarily explain a large portion of the variance of the original tables. This makes the interpretation of the CVs sometimes difficult, but CC is nonetheless an important theoretical tool because most multivariate techniques can be interpreted as a special case of CC.

Multiple Factor Analysis

Multiple factor analysis (MFA) combines several data tables into one single analysis. The first step is to perform a PCA of each table. Then each data table is normalized by dividing all the entries of the table by the first eigenvalue of its PCA. This transformation - akin to the univariate z-score of the normal distribution - equalizes
the weight of each table in the final solution and therefore makes possible the simultaneous analysis of several heterogeneous data tables.

Multiple Correspondence Analysis

Correspondence analysis is an exploratory technique used to analyze simple two-way and multi-way tables containing measures of correspondence between the rows and columns of any given data. The results provide information almost similar to that produced by factor analysis techniques, and they allow us to explore the structure of categorical variables included in the table. Multiple correspondence analysis (MCA) is an extension of simple correspondence analysis to more than two variables. MCA can be used to analyze several contingency tables by generalizing CA.

PARAFAC and TUCKER3

Both these techniques are used for three-way data analysis. The PARAFAC model is the simplest three-way model. These techniques handle three-way data matrices by generalizing the PCA decomposition into scores and loadings in order to generate three matrices of loadings (one for each dimension of the data). They differ in the constraints they impose on the decomposition (TUCKER3 generates orthogonal loadings, PARAFAC does not).

Indscal

Indscal is used when each of several subjects generates a data matrix with the same units and the same variables for all the subjects. Indscal generates a common Euclidean solution (with dimensions) and expresses the differences between subjects as differences in the importance given to the common dimensions.

Statis

Statis is used when at least one dimension of the three-way table is common to all tables (e.g., same units measured on several occasions with different variables). The first step of the method performs a PCA of each table and generates a similarity table (i.e., cross-product) between the units for each table. The similarity tables are then combined by computing a cross-product matrix and performing its PCA (without centering).
The loadings on the first component of this analysis are then used as weights to compute the compromise data table, which is the weighted average of all the tables. The original tables (and their units) are projected into the compromise space in order to explore their communalities and differences.

Procrustean Analysis

Procrustean analysis (PA) is used to compare distance tables obtained on the same objects. The first step is to represent the tables by MDS maps. Then Procrustean analysis finds a set of transformations that will make the position of the objects in both maps as close as possible (in the least squares sense).

Check Your Progress 1

1) Explain the purpose of carrying out a discriminant analysis.
.............................................................................................
.............................................................................................

2) Explain the following concepts:
a) Canonical Correlation Analysis
b) Multiple Factor Analysis
c) MANOVA

16.7 LET US SUM UP

In this Unit we explained some of the techniques that can be used in analysis of multivariate data. There could be two situations where multivariate analysis is undertaken, depending upon whether we have one data set or more than one data set. There are several techniques available to researchers in each category. We have discussed the underlying ideas in each of these techniques in brief. This will serve as a prelude to the following two Units in the Block.

16.8 KEY WORDS

Ridge Regression : Ridge regression accommodates the multicollinearity problem by adding a small constant (the ridge) to the diagonal of the correlation matrix. This makes the computation of the regression estimates possible.

Confirmatory factor analysis : It seeks to determine whether the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory.
Indicator variables are selected on the basis of prior theory, and factor analysis is used to see if they load as predicted on the expected number of factors.

Multiple factor analysis : It combines several data tables into one single analysis. The first step is to perform a PCA of each table. Then each data table is normalized by dividing all the entries of the table by the first eigenvalue of its PCA.

16.9 SOME USEFUL BOOKS/REFERENCES

Borg, I. and Groenen, P., 1997, Modern Multidimensional Scaling, Springer-Verlag, New York.

Johnson, R.A. and Wichern, D.W., 2002, Applied Multivariate Statistical Analysis, Prentice-Hall, Upper Saddle River, NJ.

Kim, Jae-On and Charles W. Mueller, 1978, Introduction to Factor Analysis: What It Is and How To Do It, Quantitative Applications in the Social Sciences Series, No. 13, Sage Publications, Thousand Oaks, CA.

Naes, T. and Risvik, E. (Eds.), 1996, Multivariate Analysis of Data in Sensory Science, Elsevier, New York.

Weller, S.C. and Romney, A.K., 1990, Metric Scaling: Correspondence Analysis, Sage Publications, Thousand Oaks, CA.

16.10 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES

Check Your Progress 1

1) See Section 16.2 and Section 16.4 and answer.

2) See Section 16.3 and Section 16.6 and answer.