Principal Component Analysis (PCA) Overview

Questions and Answers

Which minimum sample sizes for conducting PCA have been recommended in the literature?

300 (correct)
30 (correct)
150
10

Which of the following statements about PCA adequacy is true?

PCA can be conducted on any set of variables, regardless of correlations.
Using unstandardized variables is recommended for better PCA results.
Strong correlations among variables improve the feasibility of data reduction. (correct)
A 1:1 ratio between sample size and number of variables is sufficient.

What is indicated by correlation coefficients greater than 0.3 in the context of PCA?

Poor data reliability
Significant lack of correlation
High multicollinearity
Acceptable correlations (correct)
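The 0.3 rule of thumb can be checked directly on a correlation matrix. The sketch below assumes NumPy and uses simulated data. The helper name `share_of_correlations_above` is hypothetical, and reporting the share of variable pairs above the threshold is only one convenient way to summarize the matrix.

```python
import numpy as np

def share_of_correlations_above(X, threshold=0.3):
    """Fraction of distinct variable pairs whose absolute correlation exceeds threshold.

    X is an (n_samples, n_variables) array; the 0.3 default mirrors the rule of thumb.
    """
    R = np.corrcoef(X, rowvar=False)          # p x p correlation matrix
    iu = np.triu_indices_from(R, k=1)         # each pair once, diagonal excluded
    return float(np.mean(np.abs(R[iu]) > threshold))

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
# Simulated data: four variables driven by one shared signal plus noise
X_related = latent + 0.5 * rng.normal(size=(200, 4))
# Simulated data: four unrelated variables
X_unrelated = rng.normal(size=(200, 4))

print(share_of_correlations_above(X_related))    # high share: PCA looks worthwhile
print(share_of_correlations_above(X_unrelated))  # low share: little to reduce
```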

What does Bartlett's test of sphericity assess in the context of PCA?

Whether the variables are uncorrelated in the population, i.e. whether the population correlation matrix is an identity matrix
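Bartlett's statistic can be computed from the determinant of the sample correlation matrix using the commonly cited chi-square approximation. This is a minimal NumPy sketch on simulated data, not a reference implementation. The function name `bartlett_sphericity` is hypothetical. A p-value can then be read from the chi-square distribution, for example with `scipy.stats.chi2.sf(chi_square, dof)`.

```python
import numpy as np

def bartlett_sphericity(X):
    """Bartlett's test statistic for H0: the population correlation matrix is an identity matrix.

    Returns (chi_square, degrees_of_freedom) using the approximation
    chi2 = -[(n - 1) - (2p + 5) / 6] * ln|R|, with p(p - 1) / 2 degrees of freedom.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    sign, logdet = np.linalg.slogdet(R)       # log-determinant, numerically stable
    if sign <= 0:
        raise ValueError("correlation matrix is singular or not positive definite")
    chi_square = -((n - 1) - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return chi_square, dof

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X_related = latent + 0.5 * rng.normal(size=(200, 4))   # simulated correlated variables
X_unrelated = rng.normal(size=(200, 4))                # simulated unrelated variables

print(bartlett_sphericity(X_related))    # large statistic relative to dof
print(bartlett_sphericity(X_unrelated))  # typically a much smaller statistic
```

A large statistic relative to the degrees of freedom (a small p-value) is evidence against sphericity. That is the situation in which PCA is worth attempting.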

Which condition is necessary for PCA to be effective regarding variable correlations?

High correlations among the original variables

What are the new variables formed in principal component analysis called?

Principal components (PCs)

How many principal components can be produced from a given set of original variables?

At most the same number as the original variables

What is the goal of principal component analysis?

To create new components that successively capture as much of the data's variation as possible

What does the first principal component capture in principal component analysis?

The maximum variability of the data
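One way to see this is to compute PCA by eigendecomposition of the sample covariance matrix. You can then compare the variance of the PC1 scores with the variance along random unit-length directions. This is a sketch assuming NumPy and simulated data, and `pca_eig` is a hypothetical helper name.

```python
import numpy as np

def pca_eig(X):
    """PCA by eigendecomposition of the sample covariance matrix.

    Returns eigenvalues in descending order and the matching eigenvectors as columns.
    """
    S = np.cov(X, rowvar=False)               # np.cov centers the data itself
    eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(1)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.3]])
X = rng.normal(size=(500, 3)) @ mixing      # simulated correlated data

eigvals, eigvecs = pca_eig(X)               # at most 3 components for 3 variables
Xc = X - X.mean(axis=0)
pc1_scores = Xc @ eigvecs[:, 0]

# The variance of the PC1 scores equals the largest eigenvalue ...
print(np.var(pc1_scores, ddof=1), eigvals[0])

# ... and no random unit-length direction yields a larger variance.
directions = rng.normal(size=(1000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
random_vars = np.var(Xc @ directions.T, axis=0, ddof=1)
print(random_vars.max() <= eigvals[0] * (1 + 1e-12))
```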

What is used to express the principal component as a linear combination?

Coefficients or loadings

What must be done to the linear combination before maximizing variation in principal component analysis?

Normalize its coefficient vector to unit length (otherwise the variance could be made arbitrarily large)
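The unit-length constraint matters because multiplying the coefficients by a constant c multiplies the variance of the combination by c². Below is a small NumPy sketch with simulated data. `combination_variance` is a hypothetical helper.

```python
import numpy as np

def combination_variance(S, w):
    """Variance of the linear combination w'x when x has covariance matrix S."""
    return float(w @ S @ w)

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])  # simulated data
S = np.cov(X, rowvar=False)

w = np.array([1.0, 1.0])
for c in (1, 10, 100):
    # Scaling the coefficients by c scales the variance by c**2, so without
    # a constraint the "maximum variance" would be unbounded.
    print(c, combination_variance(S, c * w))

# Under the constraint w'w = 1, w'Sw can never exceed the largest eigenvalue of S.
w_unit = w / np.linalg.norm(w)
print(combination_variance(S, w_unit) <= np.linalg.eigvalsh(S).max())
```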

In PCA, what does an eigenvector represent?

The coefficients of the linear transformation of the original variables that defines a component

What happens to subsequent principal components after the first?

They capture successively smaller parts of total variability

What is the primary purpose of principal component analysis (PCA)?

To find the major correlations in data using linear combinations

Which statement accurately describes the outcome of PCA?

It transforms correlated variables into uncorrelated variables.
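The covariance matrix of the component scores is diagonal, with the eigenvalues on the diagonal. Here is a NumPy sketch on simulated correlated data that illustrates this.

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated correlated data: three variables sharing a common signal
base = rng.normal(size=(400, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(400, 1)) for _ in range(3)])

Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
scores = Xc @ eigvecs                       # one column of scores per component

C = np.cov(scores, rowvar=False)            # covariance of the component scores
off_diag = C - np.diag(np.diag(C))
print(np.corrcoef(X, rowvar=False)[0, 1])   # original variables: strongly correlated
print(np.abs(off_diag).max())               # components: ~0 up to rounding error
```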

Who were the inventors of principal component analysis?

Pearson and Hotelling

Why is PCA particularly useful for large datasets?

It summarizes information without losing much of it.

What does the first principal component (PC1) represent in PCA?

The combination of variables that accounts for the largest variance.

What is a major advantage of using principal component analysis?

It helps to clarify complex relationships among variables.

What does PCA aim to achieve in terms of data representation?

To create linear combinations that simplify the dataset.

What transformation is PCA also known as?

Karhunen-Loève transformation

What is the objective of the first principal component (PC1) in PCA?

To maximize the variance captured from the original data.

How are the subsequent principal components determined in relation to the first?

They take up successively smaller parts of the total variability.

What characteristic of principal components is emphasized in PCA?

They are orthogonal linear transformations of the original variables.

How does PCA relate to the total variance of the original dataset?

PCA decomposes the total variance into its principal components.
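The eigenvalues of the covariance matrix sum to its trace. As a result, the variances of all the components add up to the total variance of the original variables. Here is a NumPy sketch with simulated data.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 4)) @ rng.normal(size=(4, 4))   # simulated correlated data
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)

total_variance = np.trace(S)            # sum of the original variables' variances
scores = Xc @ eigvecs                   # all components retained
component_variances = np.var(scores, axis=0, ddof=1)

print(total_variance, eigvals.sum(), component_variances.sum())   # all three agree
```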

What is the maximum number of principal components that can be produced from n original variables?

n

What is the purpose of reducing dimensionality in PCA?

To simplify the data while retaining its variability.
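Dimensionality reduction keeps only the first k component scores. The sketch below uses NumPy and simulated data, with k = 2 chosen purely for illustration. The variance lost when the data are reconstructed from k components equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))   # simulated data
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                       # retained components (illustrative choice)
V_k = eigvecs[:, :k]
scores = Xc @ V_k                           # reduced data: 300 x k instead of 300 x 5
X_approx = scores @ V_k.T                   # map back to the original 5 variables

# Variance lost in the reconstruction equals the sum of the discarded eigenvalues.
lost = np.sum((Xc - X_approx) ** 2) / (len(Xc) - 1)
print(lost, eigvals[k:].sum())
print("share of variance retained:", eigvals[:k].sum() / eigvals.sum())
```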

Which statement accurately describes the eigenvalues produced in PCA?

Eigenvalues indicate the amount of variance explained by their corresponding PCs.
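Dividing each eigenvalue by the sum of all eigenvalues gives the proportion of total variance that component explains. The NumPy sketch below uses simulated data. The 80% cumulative-variance cutoff is a hypothetical retention rule for illustration, not one prescribed by this lesson.

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(500, 4)) @ np.diag([4.0, 2.0, 1.0, 0.5])   # variables with unequal spread
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

explained_ratio = eigvals / eigvals.sum()   # share of total variance per component
cumulative = np.cumsum(explained_ratio)

# Hypothetical retention rule: keep the fewest components reaching 80% of the variance.
k = int(np.searchsorted(cumulative, 0.80) + 1)
print(np.round(explained_ratio, 3), np.round(cumulative, 3), k)
```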

Which of the following best describes the relationship among the principal components?

They are orthogonal to each other.

What do component loadings represent in PCA?

The correlations between variables and components
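For PCA on the correlation matrix, a loading can be computed as the eigenvector coefficient times the square root of the eigenvalue. It then equals the correlation between the standardized variable and the component scores. Here is a NumPy sketch on simulated data that checks this numerically.

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.normal(size=(250, 1))
# Simulated variables: a shared signal plus noise of increasing size
X = np.hstack([f + rng.normal(scale=s, size=(250, 1)) for s in (0.4, 0.7, 1.5)])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)        # column j scaled by sqrt(lambda_j)
scores = Z @ eigvecs

# Empirical correlation between each variable (rows) and each component (columns)
direct = np.array([[np.corrcoef(Z[:, i], scores[:, j])[0, 1] for j in range(3)]
                   for i in range(3)])
print(np.allclose(loadings, direct))
```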

What is indicated by a component loading higher than 0.3?

The component accounts for more than 9% of the variable's variance (0.3² = 0.09)

How is communality defined in PCA?

The sum of the squared loadings for a variable on retained components

What does a high communality value imply about a variable in PCA?

A large share of the variable's variance is accounted for by the retained components
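A squared loading is the share of a variable's variance carried by one component, so a loading of 0.3 corresponds to 0.3² = 0.09, or 9%. The communality h² sums these shares over the retained components. The NumPy sketch below uses simulated data and retains k = 1 component purely for illustration. It also computes 1 - h², the variance discarded from each variable.

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.normal(size=(250, 1))
# Simulated variables: a shared signal plus noise of increasing size
X = np.hstack([f + rng.normal(scale=s, size=(250, 1)) for s in (0.4, 0.7, 1.5)])

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])   # loadings on the correlation matrix

k = 1                                                   # retained components (illustrative)
communality = np.sum(loadings[:, :k] ** 2, axis=1)      # h^2 per variable
discarded = 1 - communality                             # 1 - h^2 per variable
print(np.round(communality, 3), np.round(discarded, 3))

# With every component retained, each standardized variable's variance is fully recovered.
print(np.sum(loadings ** 2, axis=1))                    # all ones
```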

Which of the following is the first step in principal component analysis?

Check PCA adequacy

What does the term 'eigenvectors' refer to in PCA?

The directions of the components, expressed as coefficients on the original variables

What does $1 - h^2$ represent, where $h^2$ is a variable's communality?

The proportion of the variable's variance that is discarded (not explained by the retained components)

In PCA, when is the step of 'PC rotation & interpretation' performed?

After extracting the principal components

What does the symmetry of the covariance matrix imply about its eigenvalues?

All eigenvalues must be real.

Under what condition should the correlation matrix be used instead of the covariance matrix in PCA?

When the variables are measured on different scales or units. Using the correlation matrix is equivalent to standardizing each variable to a mean of 0 and an SD of 1.
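The correlation matrix of the raw variables is the covariance matrix of the standardized variables, so both routes give the same PCA. Here is a NumPy sketch with simulated variables on deliberately different scales.

```python
import numpy as np

rng = np.random.default_rng(9)
mixing = np.array([[1.0, 0.3, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(300, 3)) @ mixing
X = X * np.array([1.0, 50.0, 0.01])                   # simulated variables on different scales

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # mean 0, SD 1
R = np.corrcoef(X, rowvar=False)
S_Z = np.cov(Z, rowvar=False)

print(np.allclose(R, S_Z))                              # same matrix
print(np.linalg.eigvalsh(R), np.linalg.eigvalsh(S_Z))   # hence the same PCA
```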

What is a key characteristic of the covariance matrix?

It is a real symmetric positive semidefinite matrix.

Why is caution needed with missing data when computing covariance matrices?

Missing values can prevent correct calculation of the pairwise covariances and correlations.

What happens when using the covariance matrix for PCA without standardization?

Variables with larger variance will influence the PCA outcomes more.
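Rescaling one variable, as happens when its unit of measurement changes, alters covariance-based PCA but not correlation-based PCA. Here is a NumPy sketch with simulated data. `first_pc` is a hypothetical helper.

```python
import numpy as np

def first_pc(M):
    """Unit-length eigenvector of the symmetric matrix M with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(vals)]

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 2))                 # two simulated variables with similar spread
X_rescaled = X * np.array([100.0, 1.0])      # first variable expressed in 100x smaller units

pc1_cov = first_pc(np.cov(X_rescaled, rowvar=False))
pc1_corr = first_pc(np.corrcoef(X_rescaled, rowvar=False))
pc1_corr_original = first_pc(np.corrcoef(X, rowvar=False))

print(np.abs(pc1_cov))                        # almost all weight on the rescaled variable
print(np.abs(pc1_corr), np.abs(pc1_corr_original))  # unchanged by rescaling
```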

What is true of eigenvectors associated with distinct eigenvalues of a covariance matrix?

They are orthogonal to one another.

What must be true about the eigenvalues of a covariance matrix?

They must all be greater than or equal to zero.
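These properties can be checked numerically. The NumPy sketch below uses simulated data to verify symmetry, non-negative eigenvalues and orthonormal eigenvectors. It also shows that with fewer observations than variables some eigenvalues are zero (up to rounding error). This is why covariance matrices are positive semidefinite rather than always positive definite.

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # simulated data
S = np.cov(X, rowvar=False)

print(np.allclose(S, S.T))                           # symmetric
eigvals, eigvecs = np.linalg.eigh(S)                 # eigh assumes a symmetric matrix
print(np.all(eigvals >= -1e-10))                     # non-negative (positive semidefinite)
print(np.allclose(eigvecs.T @ eigvecs, np.eye(5)))   # eigenvectors are orthonormal

# Only 3 observations of 5 variables: after centering the rank is at most 2,
# so at least 3 eigenvalues are zero and the matrix is not positive definite.
X_small = rng.normal(size=(3, 5))
vals_small = np.linalg.eigvalsh(np.cov(X_small, rowvar=False))
print(np.round(vals_small, 10))
```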

Which statement describes the principal components produced by PCA?

They are linear combinations of the observed variables that are uncorrelated with one another.

Flashcards

Principal Component Analysis (PCA)

A statistical technique used to find patterns and correlations between multiple variables within a dataset. It aims to simplify complex datasets by discovering the major trends and relationships.

Bartlett's Test of Sphericity

A test of the null hypothesis that the variables in a dataset are uncorrelated in the population, i.e. that the population correlation matrix is an identity matrix. Rejecting this hypothesis supports the use of PCA.

Why is PCA important for large datasets?

PCA is particularly useful when dealing with datasets containing a large number of variables. It reduces the complexity of the data by extracting the most important information.

Correlation Coefficient

A measure of the strength and direction of the linear relationship between two variables, ranging from -1 to 1. A larger absolute value indicates a stronger relationship.