PCA Analysis PDF
Document Details
Institution: ISCTE-IUL
Author: José G. Dias
Summary
This document is a lecture on principal component analysis (PCA). It covers the theory, applications, intuition, and limitations of PCA. The document also details how PCA can be used for data compression and dimensionality reduction.
Full Transcript
Reconhecimento de Padrões
José G. Dias, ISCTE-IUL ([email protected])

Lecture 2. Principal component analysis
2.2. Theory and practice of PCA

Outline:
- Motivation
- Theory of PCA
- Steps in PCA
- PCA & Extensions

Principal component analysis: Motivation

"Big data" is often focused on how to handle a big n (number of rows). Often, however, the issues lie with a big p (number of variables). Real data may have thousands or millions of dimensions (e.g., web documents, where the dimensionality is the vocabulary of words; the Facebook graph, where the dimensionality is the number of users; image processing; microarray experiments).

A huge number of dimensions causes problems (the curse of dimensionality):
- Data become very sparse, and some algorithms become meaningless (e.g., density-based clustering)
- Variables are often correlated, which makes some of them redundant at some point (e.g., multicollinearity)
- The complexity of several algorithms depends on the dimensionality, which makes them infeasible (too slow)

We want to scale down in order to analyze a smaller, uncorrelated subset of the data, i.e., to transform the variables into a smaller data set with derived (synthetic, composite) variables. We can then use this smaller data set in other algorithms, such as linear regression or clustering.

PCA discovers and summarizes the pattern of intercorrelations between variables. It is useful for dealing with large data sets (many variables) and summarizing their information. The idea behind these methods is to avoid double counting the same information, by distinguishing the individual information content of each variable.

Principal component analysis: Intuition

[Figure: scatter plot of observations on axes X1 and X2]
[Figure: the same scatter plot with PC1, the first principal component, drawn along the direction of greatest spread]
[Figure: the same scatter plot with PC1 and PC2, the second principal component, orthogonal to PC1]

Principal component analysis: Basics

PCA summarizes data by finding the major correlations in linear combinations of the observations; little information is (usually) lost in the process. Its major application: correlated variables are transformed into uncorrelated variables. PCA is probably the most widely used and well-known of the "standard" multivariate methods. It was invented by Pearson (1901) and Hotelling (1933), and is also known as the Karhunen-Loève transformation.

Principal component analysis:
- creates new variables that are linear functions of the original variables
- reduces the number of original variables while retaining as much as possible of the variation present in the dataset
- removes redundant (highly correlated) variables from the dataset, i.e., data compression and noise removal

Dimensionality reduction implies information loss!
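To make this information-loss trade-off concrete, here is a minimal sketch (ours, not from the lecture) of dimensionality reduction with PCA in Python. It assumes NumPy and scikit-learn are available; the toy data and all variable names are illustrative.

# Minimal sketch: reduce 5 correlated variables to 2 principal
# components and report how much variance (information) is retained.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy data: 200 observations of 5 variables driven by 2 latent factors
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

X_std = StandardScaler().fit_transform(X)   # PCA is not scale invariant
pca = PCA(n_components=2).fit(X_std)
Z = pca.transform(X_std)                    # component scores (200 x 2)

# Share of the total variance retained by the 2 components;
# the remainder is the information lost by the reduction.
print(pca.explained_variance_ratio_.sum())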
Principal component analysis: Applications

Structure detection/reduction:
- Discover and summarize the pattern of intercorrelations between variables
- Filter the noise in the data and reveal the hidden structure
- A technique for dealing with multicollinearity
- Facilitates the interpretation of a large number of variables

Index / scale development:
- Indices are derived from the measurement of other, directly observable variables
- Defining indicators of 'constructs' and evaluating the quality of a measure
- Assessing the dimensionality of a set of variables
- Component scores can then be used as new variables

There are many other uses of PCA.

Theory of PCA

Principal components are a new coordinate system. Given data on p variables, the hope is that the data points lie mainly in a linear subspace of dimension lower than p. In practice, the data will usually not lie exactly in some lower-dimensional subspace, but we may be able to approximate them with a reduced subspace of dimension $q \ll p$ which retains most of the information/variability in the data. There will be q new variables defining the subspace.

The new variables, which form a new coordinate system, are called principal components (PCs) and will be denoted by $Z_1, \ldots, Z_p$. They are orthogonal linear transformations of the original variables, so there are at most p of them. Specifically, let the first PC, $Z_1$, be a linear combination of $X_1, \ldots, X_p$ [centered], defined by the vector of coefficients or loadings $a_1 = (a_{11}, \ldots, a_{1p})^T$, i.e.

  $Z_1 = a_{11} X_1 + \cdots + a_{1p} X_p = a_1^T X$

The variance of this component is

  $\operatorname{Var}(Z_1) = \operatorname{Var}(a_1^T X) = a_1^T S a_1$

where $S$ is the sample covariance matrix. We choose the first PC to have maximum variance (so that it captures as much of the variability in $X_1, \ldots, X_p$ as possible); subsequent PCs will take up successively smaller parts of the total variability.

Thus, our goal is to find the linear combination $a_1^T X$ that maximizes the variance of $Z_1$:

  maximize $a_1^T S a_1$

Since $a_1$ is arbitrary and $a_1^T S a_1$ can be increased simply by scaling $a_1$, we normalize it:

  maximize $a_1^T S a_1$ subject to $a_1^T a_1 = 1$

Using a Lagrange multiplier $\lambda$, we maximize $L = a_1^T S a_1 - \lambda (a_1^T a_1 - 1)$. Setting $\partial L / \partial a_1 = 0$ gives

  $2 S a_1 - 2 \lambda a_1 = 0$
  $(S - \lambda I) a_1 = 0$
  $S a_1 = \lambda a_1$

Thus $a_1$ is an eigenvector of $S$ with eigenvalue $\lambda$, and $\operatorname{Var}(Z_1) = a_1^T S a_1 = \lambda$. The variance is therefore maximized when $\lambda$ is the largest eigenvalue of $S$, and the first PC is the corresponding eigenvector. All the PCs are generated this way: each $a_k$ is an eigenvector of $S$, and their corresponding eigenvalues satisfy $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$. Thus $\operatorname{Var}(Z_1) \ge \operatorname{Var}(Z_2) \ge \cdots \ge \operatorname{Var}(Z_p)$.

[Figure: scatter plot on axes X1 and X2 with PC1 and PC2; the first PC has maximum variance, and subsequent PCs take up successively smaller parts of the total variability]
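The derivation above can be checked numerically. Below is a minimal sketch (ours, not from the slides) that extracts the PCs by eigendecomposition of the sample covariance matrix with NumPy; the toy data and names are illustrative.

# Minimal sketch: PCs as eigenvectors of the sample covariance matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))           # toy data: n = 100, p = 4
Xc = X - X.mean(axis=0)                 # center the variables

S = np.cov(Xc, rowvar=False)            # sample covariance matrix S (4 x 4)
eigvals, eigvecs = np.linalg.eigh(S)    # eigh suits symmetric PSD matrices

# Sort eigenpairs so that lambda_1 >= lambda_2 >= ... >= lambda_p
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs                        # component scores Z = X A

# Var(Z_k) equals lambda_k, and the total variance is preserved:
print(np.allclose(Z.var(axis=0, ddof=1), eigvals))   # True
print(np.allclose(eigvals.sum(), np.trace(S)))       # True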
Principal component analysis: Theory

Characteristics of PCs:
- They decompose the total variance in the data:
  $\sum_{i=1}^{p} \operatorname{Var}(X_i) = \operatorname{tr}(S) = \sum_{k=1}^{p} \lambda_k = \sum_{k=1}^{p} \operatorname{Var}(Z_k)$
- They are orthogonal linear transformations of the original variables, so there are at most p of them
- The goal is to reduce the dimension, i.e., to need only q PCs ($q \ll p$), $Z_1, \ldots, Z_q$, to approximate the space spanned by the values of $X_1, \ldots, X_p$

From the p original variables $X_1, X_2, \ldots, X_p$, PCA produces p new variables $Z_1, Z_2, \ldots, Z_p$. For observation $j$, the scores are

  $Z_{1j} = a_{11} X_{1j} + a_{12} X_{2j} + \cdots + a_{1p} X_{pj}$
  $Z_{2j} = a_{21} X_{1j} + a_{22} X_{2j} + \cdots + a_{2p} X_{pj}$
  $\cdots$
  $Z_{pj} = a_{p1} X_{1j} + a_{p2} X_{2j} + \cdots + a_{pp} X_{pj}$

i.e., the principal components are linear combinations of the observed variables, each independent (orthogonal) of the other components. In matrix notation,

  $Z = X A$

(component scores = data $\times$ component loadings).

Covariance matrix

The covariance matrix is a real, symmetric, positive semi-definite matrix. Therefore:
- All eigenvalues are real
- Eigenvectors corresponding to different eigenvalues are orthogonal
- All eigenvalues are greater than or equal to zero

Take care with missing data (and with pairwise covariances/correlations).

The input to the analysis can be either the covariance matrix ($S$) or the correlation matrix ($R$). Using the covariance matrix, variables with larger variances will dominate the PCA results, since the results depend on the units used to measure the original variables and on the ranges of values they assume. We should use the correlation matrix unless the variables have already been standardized. With the correlation matrix:
- Variables are standardized (mean 0, SD 1)
- Original variables can be in different units
- All variables have the same impact on the analysis

Component loadings

Component loadings ($a_{ki}$) are the correlations between the variables (rows) and the components (columns). The squared component loading is the percentage of variance in that variable explained by the component. As a rough guideline, a loading should be higher than 0.3 in absolute value; a loading of 0.3 accounts for only 0.09 of the variance (i.e., $0.3^2$).

Communalities

From the constraint that the eigenvectors have unit norm ($a_k^T a_k = 1$):

  $\sum_{i=1}^{p} a_{ki}^2 = 1$

where the $a_{ki}$ are the coordinates of the eigenvectors, or loadings. The communality for variable $i$ is the sum of the squared loadings for that variable on the q retained components:

  $h_i^2 = \sum_{k=1}^{q} a_{ki}^2$

The communality represents the amount of variance of the original variable $X_i$ that is summarized by the retained components, while $1 - h_i^2$ is the amount of variance of $X_i$ discarded by selecting only the first q components. It answers the question: does the solution represent that variable well? For each variable, it is the sum of the squared loadings across the retained components (the "$R^2$" of each variable). A short computational sketch appears after the list of steps below.

Steps in PCA

PCA usually proceeds in these steps:
1. Check PCA adequacy
2. PC extraction & number of PCs
3. PC rotation & interpretation
4. Make final decisions on the number of underlying PCs
5. Create scores
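Here is the promised sketch (ours, not from the slides) of component loadings and communalities from a correlation-matrix PCA. Rescaling each eigenvector by $\sqrt{\lambda_k}$ is one common convention for obtaining loadings as variable-component correlations; the data are illustrative.

# Minimal sketch: loadings and communalities from a correlation-matrix PCA.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
X[:, 1] += X[:, 0]                        # induce some correlation

R = np.corrcoef(X, rowvar=False)          # correlation matrix as PCA input
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings: correlations between variables (rows) and components
# (columns), obtained by rescaling each eigenvector by sqrt(lambda_k)
loadings = eigvecs * np.sqrt(eigvals)

q = 2                                     # number of retained components
h2 = (loadings[:, :q] ** 2).sum(axis=1)   # communality h_i^2 per variable
print(h2)                                 # 1 - h2 = variance discarded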
Step 1. PCA adequacy – data set

- Sample size (recommendations vary in the literature, from a minimum of 30 to a minimum of 300); if the sample size is small, then the loadings should be high for the component to be included
- Number of variables
- Ratio of sample size to number of variables (different ratios are given in the literature, from 5:1 to 30:1)

Step 1. PCA adequacy – Correlation matrix

PCA is not scale invariant! (Use standardized variables.) Compute the correlation matrix for all variables (the correlation / covariance matrix is the PCA input). When the links between the original variables are weak, it is not possible to perform a significant data reduction without losing a considerable amount of information; i.e., if the correlations between variables are small, it is unlikely that they share common PCs. Correlation coefficients greater than 0.3 in absolute value are indicative of acceptable correlations.

Step 1. PCA adequacy – Bartlett's test of sphericity

Bartlett's test of sphericity can be used to test the null hypothesis that the variables are uncorrelated in the population; in other words, that the population correlation matrix is an identity matrix (all diagonal terms are 1 and all off-diagonal terms are 0). Assume 0.05 as the significance level when testing $H_0\!: R = I$. If the associated p-value is small (below the 0.05 significance level), the null hypothesis is rejected, which supports proceeding with PCA.
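A minimal sketch (ours, not from the slides) of Bartlett's test of sphericity, based on its standard chi-square approximation. SciPy's chi2 distribution is a real API; the data and the function name bartlett_sphericity are illustrative.

# Minimal sketch: Bartlett's test of sphericity (H0: R = I).
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    # Standard chi-square approximation:
    #   statistic = -(n - 1 - (2p + 5) / 6) * ln(det(R)),
    # with p(p - 1)/2 degrees of freedom.
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, chi2.sf(statistic, df)

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
X[:, 1] += 0.8 * X[:, 0]                  # correlated variables: H0 false
stat, pval = bartlett_sphericity(X)
print(stat, pval)                         # small p-value supports PCA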