Machine Learning - Principal Components Analysis (PCA)

Questions and Answers

What is the primary goal of Principal Component Analysis?

  • To increase the number of variables in a dataset
  • To reduce the number of variables while preserving information (correct)
  • To retain all the original variables in analysis
  • To create a supervised learning model

Which of the following best describes PCA's role in machine learning?

  • It specializes in handling uncorrelated features.
  • It directly increases accuracy by generating more data.
  • It helps in dimensionality reduction and feature selection. (correct)
  • It is used for classification of labeled data.

What does a covariance value of 0 indicate in PCA?

  • Features have an inverse relationship.
  • Features are independent of each other. (correct)
  • There is a positive correlation between the features.
  • Features are entirely dependent on each other.

Which of the following is NOT a step involved in PCA?

  • Calculating the loss function (correct)

How does PCA help in addressing the curse of dimensionality?

  • By reducing the dimensions of the data (correct)

Which learning category does PCA belong to?

  • Unsupervised Learning (correct)

Which of the following describes the eigenvalues and eigenvectors in PCA?

  • They indicate the maximum variance in the dataset. (correct)

What is the effect of high-dimensional data in machine learning that PCA seeks to mitigate?

  • Overfitting issues (correct)

Flashcards

Principal Component Analysis (PCA)

A technique that reduces the complexity of a dataset without losing important information, typically by converting correlated variables into uncorrelated ones, making it easier to understand patterns and relationships within the data.

PCA in Machine Learning

PCA falls within the Unsupervised Machine Learning category, meaning it doesn't require labeled data. It aims to discover inherent structures and relationships within the data itself.

Handling the ‘Curse of Dimensionality’

PCA is a useful approach for handling the ‘Curse of Dimensionality’, a common problem in machine learning that occurs when the number of features in a dataset is too large.

Feature Selection with PCA

PCA helps find the most important features in a dataset, often by combining multiple features into a smaller set of ‘principal components’, which can be used to build more accurate and interpretable models.

Data Normalization in PCA

PCA normalizes the data by subtracting the mean and dividing by the standard deviation, ensuring each variable has a mean of 0 and a standard deviation of 1.

Data Visualization with PCA

PCA helps you visualize complex high-dimensional datasets by converting many variables into a smaller number of components, making it easier to understand the relationships between data points and identify clusters, as sketched below.
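
As a concrete illustration (not part of the original lesson), here is a minimal sketch using scikit-learn's PCA and matplotlib; the choice of the Iris dataset is an assumption for demonstration purposes.

```python
# Project the 4-dimensional Iris dataset onto its first two principal
# components and plot the result; dataset choice is illustrative.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)  # mean 0, std 1 per feature

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)            # shape (150, 2)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)   # colors reveal the class clusters
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```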

Covariance Matrix in PCA

The covariance matrix captures the relationships between variables by measuring how much they change together, indicating whether they are dependent or independent.

Calculating Eigenvalues and Eigenvectors

Eigenvectors represent the directions of greatest variance in the data, while eigenvalues quantify the amount of variance along each eigenvector's direction. The normalized eigenvectors are then used to create the principal components.

Study Notes

Machine Learning - Principal Components Analysis (PCA)

  • PCA is a dimensionality reduction technique in unsupervised machine learning
  • PCA aims to reduce the number of variables in a dataset while retaining as much information as possible
  • PCA is mainly used for dimensionality reduction and feature selection
  • PCA transforms correlated features into uncorrelated ones
  • PCA explains the variance-covariance structure of the data through a few linear combinations of the original variables; the scatter of the rows (observations) can also be analyzed with PCA to identify distribution-related properties
  • It is a technique to handle the curse of dimensionality in machine learning
  • Sufficient data creates a more accurate prediction model.
  • High-dimensional data causes overfitting issues; dimensionality reduction addresses this
  • Helps locate important characteristics and discover informative linear combinations of the original variables

How PCA Works

  • Original Data: The initial dataset
  • Normalize data: The original data is standardized so that each variable has mean = 0 and variance = 1
  • Calculate covariance matrix: Captures the relationships between all pairs of variables
  • Calculate eigenvalues and eigenvectors: The eigenvalues determine the importance of each principal component, while the eigenvectors give its direction
  • Calculate principal components (PCs): Project the data onto the most significant eigenvectors
  • Plot for orthogonality: The plot visualizes the orthogonality/relationship between the PCs (a code sketch of these steps follows)
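
A minimal NumPy sketch of these steps, assuming a data matrix X with samples as rows; the toy data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 samples, 5 features

# 1. Normalize: mean = 0, variance = 1 per feature
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data (features in columns)
C = np.cov(Z, rowvar=False)            # shape (5, 5)

# 3. Eigenvalues and eigenvectors (eigh suits the symmetric matrix C)
eigvals, eigvecs = np.linalg.eigh(C)

# 4. Sort eigenvectors by descending eigenvalue and keep the top p
order = np.argsort(eigvals)[::-1]
p = 2
W = eigvecs[:, order[:p]]              # transformation matrix, shape (5, 2)

# 5. Project the data onto the principal components
PCs = Z @ W                            # transformed data, shape (100, 2)

# 6. Orthogonality check: off-diagonal covariances of the PCs are ~0
print(np.round(np.cov(PCs, rowvar=False), 6))
```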

Covariance Matrix

  • Contains covariance values between all dimensions/attributes.
  • Covariance measures how two variables change together:
    • Cov(X,Y) = 0: the variables are uncorrelated (no linear relationship)
    • Cov(X,Y) > 0: they move in the same direction
    • Cov(X,Y) < 0: they move in opposite directions
    • Calculated as: cov(X,Y) = Σᵢ(Xᵢ − X̄)(Yᵢ − Ȳ)/(n − 1), as sketched in code below
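
As a quick check of this formula, here is a sketch comparing the manual calculation with NumPy's built-in np.cov; the sample values are illustrative.

```python
import numpy as np

x = np.array([2.1, 2.5, 3.6, 4.0])
y = np.array([8.0, 10.0, 12.0, 14.0])

# Sample covariance: sum of products of deviations, divided by n - 1
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

print(cov_xy)               # manual formula
print(np.cov(x, y)[0, 1])   # same value from NumPy's covariance matrix
```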

Eigenvalues & Eigenvectors

  • An eigenvector of a matrix A is a vector x whose direction is unchanged by A, i.e., Ax = λx for some scalar λ
  • Eigenvalues represent the importance of each eigenvector; in PCA, a larger eigenvalue means more variance along that direction
  • Calculation steps: set det(A − λI) = 0, find its roots (the eigenvalues λ), and solve (A − λI)x = 0 for each λ to get the eigenvectors x (see the sketch below)
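
A small NumPy illustration of these steps; the 2×2 matrix is an arbitrary symmetric example, as a covariance matrix would be.

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])              # symmetric, like a covariance matrix

# eigh solves det(A - λI) = 0 and (A - λI)x = 0 for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(A)

# Verify the defining property Ax = λx for each eigenpair
for lam, x in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ x, lam * x))  # True for both eigenpairs
```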

Principal Components - Variance

  • Eigenvalues correspond to the variance on PCs.
  • The first p eigenvectors (based on top eigenvalues) represent the directions with the largest variances in the data.
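
For example, each eigenvalue's share of the total variance tells you how much information its principal component carries; a sketch with illustrative eigenvalues follows.

```python
import numpy as np

eigvals = np.array([3.2, 1.1, 0.5, 0.2])   # illustrative eigenvalues

# Each eigenvalue's share of the total variance
explained_ratio = eigvals / eigvals.sum()
print(explained_ratio)                     # first PC explains 64% here

# Cumulative share: how many PCs are needed to reach, say, 95%?
print(np.cumsum(explained_ratio))
```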

Transformed Data

  • Eigenvalues represent the variance for the new dimensions or principal components.
  • Sort eigenvalues from highest to lowest.
  • Take the first p eigenvectors that correspond to the top p eigenvalues; these are the directions with the largest variance
  • A transformation matrix can be created from these eigenvectors to map the original data into a new coordinate system; the transformed data describes the original data in terms of its major directions of variance (a scikit-learn sketch follows)
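
The same end-to-end transformation is available in scikit-learn; a minimal sketch, with toy data standing in for a real dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # toy data: 100 samples, 5 features

X_std = StandardScaler().fit_transform(X)  # mean 0, variance 1 per feature

pca = PCA(n_components=2)                  # keep the top 2 components
X_t = pca.fit_transform(X_std)             # transformed data, shape (100, 2)

print(pca.explained_variance_)             # variance along each component
print(pca.explained_variance_ratio_)       # fraction of total variance
print(pca.components_)                     # rows are the top eigenvectors
```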

Advantages of PCA in ML

  • Reduces dimensionality
  • Eliminates correlated features (multicollinearity)
  • Speeds up training
  • Helps prevent overfitting by eliminating extraneous features

Disadvantages of PCA in ML

  • Best for quantitative data; not effective for qualitative.
  • Difficult to interpret components.

Applications of PCA

  • Computer vision
  • Bioinformatics
  • Image compression and resizing
  • High-dimensional pattern discovery
  • Reduction of dimensions
  • Multidimensional data visualization.
