Machine Learning - Principal Components Analysis (PCA)

Questions and Answers

What is the primary goal of Principal Component Analysis?

  • To increase the number of variables in a dataset
  • To reduce the number of variables while preserving information (correct)
  • To retain all the original variables in analysis
  • To create a supervised learning model

Which of the following best describes PCA's role in machine learning?

  • It specializes in handling uncorrelated features.
  • It directly increases accuracy by generating more data.
  • It helps in dimensionality reduction and feature selection. (correct)
  • It is used for classification of labeled data.

What does a covariance value of 0 indicate in PCA?

  • Features have an inverse relationship.
  • Features are independent of each other. (correct)
  • There is a positive correlation between the features.
  • Features are entirely dependent on each other.

Which of the following is NOT a step involved in PCA?

  Answer: Calculating the loss function

How does PCA help in addressing the curse of dimensionality?

  Answer: By reducing the dimensions of the data

What type of learning category does PCA belong to?

  Answer: Unsupervised Learning

Which of the following describes the eigenvalues and eigenvectors in PCA?

  Answer: They indicate the maximum variance in the dataset.

What is the effect of high-dimensional data in machine learning that PCA seeks to mitigate?

  Answer: Overfitting issues

    Study Notes

    Machine Learning - Principal Components Analysis (PCA)

    • PCA is a dimensionality reduction technique in unsupervised machine learning
    • PCA aims to reduce the number of variables in a dataset while retaining as much information as possible
    • PCA is mainly used for dimensionality reduction and feature selection
    • PCA transforms correlated features into uncorrelated principal components
    • PCA explains the variance-covariance structure of the data through a few linear combinations of the original variables; the scatter of the observations along these combinations reveals distribution-related properties
    • It is a technique to handle the curse of dimensionality in machine learning
    • Sufficient data makes for a more accurate prediction model, but high-dimensional data causes overfitting; dimensionality reduction addresses this
    • Helps identify important features and discover informative linear combinations of the original variables (a quick usage sketch follows this list)
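
    As a quick illustration of the technique in practice, here is a minimal sketch using scikit-learn's PCA on synthetic data; the data, variable names, and the choice of two components are illustrative assumptions, not part of these notes.

        import numpy as np
        from sklearn.decomposition import PCA

        # Synthetic data: 100 samples, 5 correlated features (made up for illustration)
        rng = np.random.default_rng(0)
        base = rng.normal(size=(100, 2))
        X = base @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(100, 5))

        # Reduce 5 features to 2 principal components
        pca = PCA(n_components=2)
        X_reduced = pca.fit_transform(X)

        print(X_reduced.shape)                 # (100, 2)
        print(pca.explained_variance_ratio_)   # fraction of variance kept per PC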

    How PCA Works

    • Original Data: The initial dataset
    • Normalize data: The original data is normalized to mean = 0, variance = 1
    • Calculate covariance matrix: Captures the relationships between all pairs of variables
    • Calculate eigenvalues and eigenvectors: The eigenvalues determine the importance, while the eigenvectors give the direction of each principal component
    • Calculate Principal Components (PCs): Project the data onto the most significant eigenvectors
    • Plot for orthogonality: The plot visualizes the orthogonality/relationship between PCs (a from-scratch sketch of these steps follows)
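
    A minimal from-scratch sketch of these steps in NumPy, assuming a data matrix X with samples in rows and features in columns (function and variable names are illustrative):

        import numpy as np

        def pca_from_scratch(X, p):
            # 1. Normalize: mean 0, variance 1 per feature
            Xs = (X - X.mean(axis=0)) / X.std(axis=0)

            # 2. Covariance matrix of the features
            C = np.cov(Xs, rowvar=False)

            # 3. Eigenvalues (importance) and eigenvectors (directions);
            #    eigh is appropriate because C is symmetric
            eigvals, eigvecs = np.linalg.eigh(C)

            # 4. Sort from highest to lowest eigenvalue
            order = np.argsort(eigvals)[::-1]
            eigvals, eigvecs = eigvals[order], eigvecs[:, order]

            # 5. Project onto the p most significant eigenvectors
            return Xs @ eigvecs[:, :p], eigvals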

    Covariance Matrix

    • Contains covariance values between all dimensions/attributes.
    • Covariance measures how two variables vary together
      • Cov(X,Y) = 0, independent
      • Cov(X,Y) > 0, move in same direction
      • Cov(X,Y) < 0, move in opposite direction
      • Calculated as: cov(X,Y) = Σ(Xᵢ - X̄)(Yᵢ - Ȳ) / (n - 1)  (a numeric check follows this list)
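
    A quick numeric check of this formula; the sample values are made up for illustration.

        import numpy as np

        X = np.array([2.0, 4.0, 6.0])
        Y = np.array([1.0, 3.0, 7.0])

        # Manual sample covariance: sum((Xi - Xbar)(Yi - Ybar)) / (n - 1)
        manual = np.sum((X - X.mean()) * (Y - Y.mean())) / (len(X) - 1)

        # np.cov uses the same n - 1 denominator by default
        print(manual, np.cov(X, Y)[0, 1])  # both print 6.0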

    Eigenvalues & Eigenvectors

    • Vectors x whose direction is unchanged when multiplied by a matrix A (i.e., Ax = λx) are called eigenvectors
    • Eigenvalues represent the importance of each eigenvector
    • Calculation steps: solve det(A - λI) = 0 for the roots (the eigenvalues λ), then solve (A - λI)x = 0 for each λ to get the eigenvectors x (a small worked example follows)
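
    A small worked example with an illustrative 2×2 matrix. Here det(A - λI) = (2 - λ)² - 1 = 0, so the eigenvalues are λ = 3 and λ = 1, with eigenvectors along (1, 1) and (1, -1).

        import numpy as np

        A = np.array([[2.0, 1.0],
                      [1.0, 2.0]])

        # NumPy solves the same characteristic equation numerically
        eigvals, eigvecs = np.linalg.eig(A)
        print(eigvals)   # 3.0 and 1.0 (order may vary)
        print(eigvecs)   # columns are the (normalized) eigenvectors

        # Verify A x = λ x for the first eigenpair
        print(np.allclose(A @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0]))  # True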

    Principal Components - Variance

    • Eigenvalues correspond to the variance on PCs.
    • The first p eigenvectors (based on top eigenvalues) represent the directions with the largest variances in the data.

    Transformed Data

    • Eigenvalues represent the variance for the new dimensions or principal components.
    • Sort eigenvalues from highest to lowest.
    • Take the first p eigenvectors corresponding to the top p eigenvalues; these are the directions with the largest variance
    • A transformation matrix built from these eigenvectors maps the original data into a new coordinate system, where the data is described in terms of its major variances (a sketch of this selection step follows)
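
    A sketch of this selection and transformation step, assuming eigvals and eigvecs are already sorted from highest to lowest eigenvalue (e.g. as returned by the pipeline sketch above); the 95% variance target is an illustrative choice.

        import numpy as np

        def top_p_projection(Xs, eigvals, eigvecs, target=0.95):
            # Fraction of total variance carried by each principal component
            ratio = eigvals / eigvals.sum()

            # Smallest p whose cumulative retained variance reaches the target
            p = int(np.searchsorted(np.cumsum(ratio), target) + 1)

            # Transformation matrix: the first p eigenvectors as columns
            W = eigvecs[:, :p]
            return Xs @ W, p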

    Advantages of PCA in ML

    • Reduces dimensionality
    • Eliminates correlated features (multicollinearity)
    • Speeds up training
    • Overcomes overfitting by eliminating extraneous features.

    Disadvantages of PCA in ML

    • Best suited to quantitative data; not effective for qualitative (categorical) data
    • Difficult to interpret components.

    Applications of PCA

    • Computer vision
    • Bioinformatics
    • Image compression and resizing
    • High-dimensional pattern discovery
    • Reduction of dimensions
    • Multidimensional data visualization.

    Description

    This quiz explores Principal Components Analysis (PCA), a fundamental technique in unsupervised machine learning. Learn how PCA helps reduce the dimensionality of data while preserving essential information, and discover its applications in feature selection and overcoming overfitting. Test your knowledge on PCA's functions and implications in data analysis.
