Data Exploration and PCA Concepts

Questions and Answers

What is a primary reason for conducting Principal Component Analysis (PCA)?

  • To ensure that all features have equal importance in the analysis.
  • To transform data into a more easily interpretable format.
  • To maximize variance while reducing the dimensionality of data. (correct)
  • To minimize computational complexity during data sampling.

Which statement accurately describes the relationship between maximizing variance and minimizing reconstruction error in PCA?

  • Maximizing variance leads to an increase in reconstruction error.
  • They are equivalent objectives that are achieved simultaneously in PCA. (correct)
  • The two objectives are independent and do not influence each other.
  • Minimizing reconstruction error provides no benefit to variance maximization.

What is the role of the covariance matrix in PCA?

  • To quantify the spread and relationship of data dimensions. (correct)
  • To normalize the data prior to dimensionality reduction.
  • To ensure all projected dimensions have equal variance.
  • To calculate the mean of projected components.

When using PCA, how is the weight vector 'w' selected?

Answer: To maximize the variance in the projected data while maintaining unit length.

What is the primary limitation of K-means clustering that may affect the choice of the number of clusters?

Answer: It is sensitive to initializations and may converge to local minima.

Which technique could be considered an alternative to K-means clustering?

Answer: Hierarchical clustering.

In PCA, if the eigenvector with the largest eigenvalue is chosen, what does this vector represent?

Answer: The principal component capturing the maximum variance.

Why is it important to center all features before conducting PCA?

Answer: To create an unbiased estimate of the covariance matrix.

What is the preferred NumPy function for computing eigenvalues and eigenvectors of a symmetric matrix due to its numerical stability?

Answer: numpy.linalg.eigh()

In the context of N-dimensional data, how many principal components (PCs) are there available to capture variance?

Answer: N

Why is it important to center the data when performing PCA?

Answer: So that the covariance matrix can be computed as $X^T X / N$; this relationship (and the link to the SVD) holds only for centered data.

What will the covariance matrix become when expressed in the eigenvector basis?

Answer: It becomes diagonal

What is a common rule of thumb for selecting the number of principal components in PCA?

Answer: Look for an 'elbow' in the variance explained plot

What is a significant drawback of the K-means clustering algorithm?

Answer: Its results depend on the initial placement of centroids.

What is a key characteristic of the covariance matrix for three dimensions, specifically regarding its diagonal?

Answer: It contains the variances of each dimension

Which method is commonly used to determine the optimal number of clusters (K) in K-means clustering?

Answer: Elbow Method

What must be considered when standardizing features in PCA?

Answer: Features must be centered around zero

If the eigenvalues of a covariance matrix are $S^2 / N$, what mathematical decomposition does this represent?

Answer: Eigenvalue Decomposition

Which of the following clustering algorithms can effectively handle non-spherical cluster shapes?

Answer: DBSCAN

What is one of the primary techniques of dimensionality reduction?

Answer: Principal Component Analysis (PCA)

What effect does choosing a very small epsilon value have when using DBSCAN?

Answer: It leads to many small clusters, making the clustering unreliable.

Which statement is true regarding the characteristics of K-means clustering?

Answer: It struggles with complex shapes and requires feature scaling.

What is a primary focus of dimensionality reduction techniques?

Answer: To simplify datasets by reducing the number of features.

Which clustering algorithm creates a hierarchy tree to demonstrate relationships within a dataset?

Answer: Hierarchical Clustering

    Study Notes

    Data Exploration

    • Data exploration provides insights into the data
    • Helps save computation and memory
    • Can be used as a preprocessing step to reduce overfitting
    • Data visualization in 2-3 dimensions is possible

    Principal Component Analysis (PCA)

    • It is a linear dimensionality reduction technique
    • Projects data onto a lower-dimensional space
    • Turns X into Xw, where w is a unit vector
    • Aims to maximize variance and minimize reconstruction error
    • PCA achieves both objectives simultaneously

    PCA vs. Regression

    • PCA minimizes the orthogonal projection error (perpendicular distance from points to the subspace), while regression minimizes the residual error between actual and predicted values

    PCA Cost Function

    • Maximizes the variance of projected data
    • The variance of the projected data is $w^T C w$, where C is the covariance matrix and w is the direction of projection
    • The objective is to find the direction w that maximizes $w^T C w$ subject to the constraint ||w|| = 1

    Maximizing $w^T C w$

    • Solved using the Lagrange multiplier method
    • Introduces a Lagrange multiplier $\lambda$ to enforce the constraint
    • Solving $\partial J / \partial w = 0$ leads to $Cw = \lambda w$
    • Thus, w is an eigenvector of the covariance matrix C, and $\lambda$ is the corresponding eigenvalue
    • To maximize $w^T C w$, the eigenvector with the largest eigenvalue should be chosen (see the derivation sketch below)
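
For completeness, a compact sketch of that derivation (standard PCA algebra; $J$ denotes the Lagrangian from the notes):

```latex
% Lagrangian: maximize w^T C w subject to w^T w = 1
J(w) = w^\top C w - \lambda \left( w^\top w - 1 \right)

% Stationarity: set the gradient with respect to w to zero
\frac{\partial J}{\partial w} = 2 C w - 2 \lambda w = 0
\quad \Longrightarrow \quad C w = \lambda w

% The objective value at a solution is the eigenvalue itself,
% so the largest eigenvalue gives the largest projected variance
w^\top C w = w^\top (\lambda w) = \lambda
```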

    Implementing PCA with NumPy

    • Two main functions are used:
      • numpy.linalg.eig()
      • numpy.linalg.eigh() (preferred for symmetric matrices due to its numerical stability and efficiency)
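
A minimal sketch of PCA built on numpy.linalg.eigh() (illustrative only; the function and variable names are my own, not from the lecture):

```python
import numpy as np

def pca(X, k):
    """Project data X (shape: samples x features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                # center every feature first (required)
    C = Xc.T @ Xc / Xc.shape[0]            # NxN covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: stable and efficient for symmetric C
    order = np.argsort(eigvals)[::-1]      # eigh returns ascending order; sort descending
    W = eigvecs[:, order[:k]]              # top-k eigenvectors (principal directions)
    return Xc @ W, eigvals[order]          # projected data and sorted variances
```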

    Covariance Matrix

    • Represents covariance between dimensions as a matrix
    • Diagonal elements represent the variances of each dimension
    • Off-diagonal elements represent covariances between dimensions
    • Covariance matrix is symmetric about the diagonal
    • N-dimensional data results in an NxN covariance matrix
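
A quick numerical check of these properties (synthetic data; np.cov with bias=True matches the 1/N convention used here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                   # 500 samples of 3-dimensional data
Xc = X - X.mean(axis=0)

C = Xc.T @ Xc / Xc.shape[0]                     # 3x3 covariance matrix (NxN)
print(np.allclose(C, C.T))                      # True: symmetric about the diagonal
print(np.allclose(np.diag(C), Xc.var(axis=0)))  # True: diagonal holds the variances
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))  # matches NumPy's np.cov
```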

    Spectral Theorem

    • A symmetric n x n matrix has n orthogonal eigenvectors
    • Projections onto eigenvectors are uncorrelated
    • This makes the covariance matrix diagonal in the eigenvector basis
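
This can be verified directly in NumPy (a sketch with synthetic correlated data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))  # correlated 4-D data
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / Xc.shape[0]

eigvals, V = np.linalg.eigh(C)            # columns of V are orthonormal eigenvectors
print(np.allclose(V.T @ V, np.eye(4)))    # True: eigenvectors are orthogonal

Z = Xc @ V                                # express the data in the eigenvector basis
Cz = Z.T @ Z / Z.shape[0]                 # covariance of the projected data
print(np.allclose(Cz, np.diag(eigvals)))  # True: diagonal, so projections are uncorrelated
```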

    Total Variance of N-dimensional Data

    • There are N principal components (PCs) for an N-dimensional dataset
    • Each PC captures a portion of the total variance in the dataset
    • PC1 captures the largest variance
    • Subsequent PCs capture decreasing amounts of variance
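
Numerically, the total variance is the trace of C, which equals the sum of its eigenvalues (a sketch on synthetic data with unequal spread per dimension):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(800, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])  # unequal scales
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / Xc.shape[0]

eigvals = np.linalg.eigvalsh(C)[::-1]            # PC variances, largest (PC1) first
print(np.allclose(eigvals.sum(), np.trace(C)))   # True: total variance is preserved
print(eigvals / eigvals.sum())                   # decreasing share captured by each PC
```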

    Relationship to SVD

    • The singular value decomposition (SVD) of the centered data, $X = USV^T$, can be used to compute the covariance matrix
    • $C = X^T X / N = V (S^2/N) V^T$, which is the eigendecomposition of the covariance matrix (with eigenvalues $S^2/N$)
    • This holds true only if X is centered
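
The equivalence is easy to confirm with NumPy (illustrative check on synthetic centered data):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 4))
Xc = X - X.mean(axis=0)                    # the relationship requires centered data
N = Xc.shape[0]

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
C = Xc.T @ Xc / N

print(np.allclose(C, Vt.T @ np.diag(s**2 / N) @ Vt))       # C = V (S^2/N) V^T
print(np.allclose(np.linalg.eigvalsh(C)[::-1], s**2 / N))  # eigenvalues are S^2/N
```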

    Feature Scaling in PCA

    • Feature scaling is crucial when features are on different scales
    • Standardizing features makes C the correlation matrix
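
For example (synthetic features on wildly different scales; standardizing uses the same 1/N convention as the covariance above):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 3)) * np.array([1.0, 100.0, 0.01])  # mismatched scales

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize: zero mean, unit variance
C = Z.T @ Z / Z.shape[0]                   # covariance of the standardized data

print(np.allclose(C, np.corrcoef(X, rowvar=False)))  # True: C is the correlation matrix
```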

    Picking the Number of Components

    • Rules of thumb for selecting the number of PCs:
      • Look for an "elbow" in the scree plot
      • Capture a specified percentage (e.g., 90%) of the total variance
      • Assess the explained variance ratio for each PC
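
A small helper for the percentage-of-variance rule (a sketch; the function name and the 90% threshold are illustrative):

```python
import numpy as np

def n_components_for(eigvals, threshold=0.90):
    """Smallest number of PCs whose cumulative explained variance meets the threshold."""
    ratios = eigvals / eigvals.sum()       # explained variance ratio per PC
    cumulative = np.cumsum(ratios)         # running total, PC1 first
    return int(np.searchsorted(cumulative, threshold) + 1)

eigvals = np.array([5.0, 2.5, 1.0, 0.4, 0.1])  # eigenvalues sorted in decreasing order
print(n_components_for(eigvals))               # 3: the first three PCs capture >= 90%
```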

    Feature Selection

    • PCA is a feature transformation method, not feature selection.



    Description

    This quiz delves into key concepts of data exploration and Principal Component Analysis (PCA). You will learn how PCA serves as a dimensionality reduction technique and the differences between PCA and regression. Test your understanding of cost functions and maximizing projections in PCA.
