Questions and Answers
What is the primary goal of Principal Component Analysis?
- To increase the number of variables in a dataset
- To reduce the number of variables while preserving information (correct)
- To retain all the original variables in analysis
- To create a supervised learning model
Which of the following best describes PCA's role in machine learning?
- It specializes in handling uncorrelated features.
- It directly increases accuracy by generating more data.
- It helps in dimensionality reduction and feature selection. (correct)
- It is used for classification of labeled data.
What does a covariance value of 0 indicate in PCA?
- Features have an inverse relationship.
- Features are independent of each other. (correct)
- There is a positive correlation between the features.
- Features are entirely dependent on each other.
Which of the following is NOT a step involved in PCA?
How does PCA help in addressing the curse of dimensionality?
What type of learning category does PCA belong to?
Which of the following describes the eigenvalues and eigenvectors in PCA?
What is the effect of high-dimensional data in machine learning that PCA seeks to mitigate?
Flashcards
Principal Component Analysis (PCA)
A technique that reduces the complexity of a dataset without losing important information, often by converting correlated variables into independent ones, making it easier to understand patterns and relationships within the data.
PCA in Machine Learning
PCA falls within the Unsupervised Machine Learning category, meaning it doesn't require labeled data. It aims to discover inherent structures and relationships within the data itself.
Handling the ‘Curse of Dimensionality’
PCA is a useful approach to handling the ‘Curse of Dimensionality’, a common problem in machine learning that occurs when the number of features in a dataset is too large.
Feature Selection with PCA
PCA supports feature selection by transforming correlated features into a smaller set of independent components and keeping only the components that capture most of the variance.
Data Normalization in PCA
Before the covariance matrix is computed, the original data is normalized so that each feature has mean 0 and variance 1.
Data Visualization with PCA
By projecting the data onto the first two or three principal components, PCA makes it possible to visualize multidimensional data in a low-dimensional plot.
Covariance Matrix in PCA
A matrix containing the covariance values between all pairs of dimensions/attributes; it describes how the variables move together and is the basis for finding the principal components.
Calculating Eigenvalues and Eigenvectors
The eigenvalues and eigenvectors of the covariance matrix are found by solving det(A − λI) = 0 for the eigenvalues λ and (A − λI)x = 0 for the eigenvectors x; eigenvalues measure each component's importance and eigenvectors give its direction.
Study Notes
Machine Learning - Principal Components Analysis (PCA)
- PCA is a dimensionality reduction technique in unsupervised machine learning
- PCA aims to reduce the number of variables in a dataset while retaining as much information as possible
- PCA is mainly used for dimensionality reduction and feature selection
- PCA transforms correlated features into independent features
- PCA explains the variance and covariance structure of the data using a few linear combinations of the original variables; the scatter of the observations can be analyzed with PCA to identify distribution-related properties
- It is a technique to handle the curse of dimensionality in machine learning
- A model needs sufficient data relative to its number of features to make accurate predictions
- High-dimensional data causes overfitting issues; dimensionality reduction addresses this
- Helps locate important characteristics and discover linear combinations of the original variables (a short scikit-learn usage sketch follows below)
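As a quick illustration of the points above, the sketch below uses scikit-learn's PCA to go from four correlated features to two uncorrelated components; the Iris dataset and the choice of two components are assumptions made purely for the example.

```python
# Minimal sketch: PCA for dimensionality reduction with scikit-learn.
# The Iris dataset and n_components=2 are illustrative choices only.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)             # 150 samples, 4 correlated features

X_scaled = StandardScaler().fit_transform(X)  # normalize: mean = 0, variance = 1
pca = PCA(n_components=2)                     # keep the 2 most informative directions
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                        # (150, 2)
print(pca.explained_variance_ratio_)          # variance share kept by each component
```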
How PCA Works
- Original Data: The initial dataset
- Normalize data: The original data is normalized to mean = 0, variance = 1
- Calculate covariance matrix: Captures the relationship between all pairs of variables
- Calculate eigenvalues and eigenvectors: The eigenvalues determine the importance, while the eigenvectors reveal the direction of each principal component
- Calculate Principal Components (PCs): Project the data onto the most significant eigenvectors
- Plot for orthogonality: The plot visualizes the orthogonality/relationship between the PCs (a NumPy sketch of these steps appears below)
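The listed steps can be followed end to end in plain NumPy. The sketch below is a minimal illustration, assuming a small random data matrix X (samples as rows) and keeping the top two components.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # assumed original data: 100 samples, 5 features

# 1. Normalize: mean = 0, variance = 1 per feature
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix between all variables
C = np.cov(X_norm, rowvar=False)              # shape (5, 5)

# 3. Eigenvalues (importance) and eigenvectors (directions)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh because C is symmetric

# 4. Sort by decreasing eigenvalue and keep the top p directions
order = np.argsort(eigvals)[::-1]
p = 2
W = eigvecs[:, order[:p]]                     # transformation matrix, shape (5, 2)

# 5. Principal components: project the data onto the chosen eigenvectors
PCs = X_norm @ W                              # shape (100, 2)

# 6. Orthogonality: the covariance between the PCs is (numerically) zero
print(np.round(np.cov(PCs, rowvar=False), 3))
```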
Covariance Matrix
- Contains covariance values between all dimensions/attributes.
- Covariance measures how two variables vary together
- Cov(X,Y) = 0: the features are independent (uncorrelated)
- Cov(X,Y) > 0: the features move in the same direction
- Cov(X,Y) < 0: the features move in opposite directions
- Calculated as: cov(X,Y) = Σ((Xᵢ - X̄)(Yᵢ - Ȳ)) / (n - 1), where n is the number of samples (a numeric check follows below)
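A minimal numeric check of this formula against NumPy's built-in covariance; the two example vectors are made up purely for illustration.

```python
import numpy as np

X = np.array([2.0, 4.0, 6.0, 8.0])   # illustrative values
Y = np.array([1.0, 3.0, 5.0, 7.0])

# cov(X,Y) = Σ((Xᵢ - X̄)(Yᵢ - Ȳ)) / (n - 1)
n = len(X)
cov_xy = np.sum((X - X.mean()) * (Y - Y.mean())) / (n - 1)

print(cov_xy)              # 6.666... (> 0: X and Y move in the same direction)
print(np.cov(X, Y)[0, 1])  # same value from NumPy's covariance matrix
```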
Eigenvalues & Eigenvectors
- An eigenvector of a matrix A is a non-zero vector x whose direction is preserved by A, i.e. Ax = λx for some scalar λ
- Eigenvalues represent the importance of each eigenvector
- Calculation steps: solve det(A − λI) = 0 to find the roots (the eigenvalues λ), then solve (A − λI)x = 0 for each λ to get the eigenvectors x (a small numeric example follows below)
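The same steps can be verified numerically on a toy 2×2 symmetric matrix (an assumed example, standing in for a covariance matrix).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # toy symmetric matrix (assumed example)

# det(A - λI) = (2 - λ)² - 1 = 0  →  eigenvalues λ = 1 and λ = 3
vals, vecs = np.linalg.eigh(A)
print(vals)                              # [1. 3.]

# Each eigenvector column satisfies A x = λ x, i.e. (A - λI) x = 0
for lam, x in zip(vals, vecs.T):
    print(np.allclose(A @ x, lam * x))   # True, True
```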
Principal Components - Variance
- Eigenvalues correspond to the variance on PCs.
- The first p eigenvectors (based on top eigenvalues) represent the directions with the largest variances in the data.
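Because each eigenvalue equals the variance along its principal component, the share of total variance retained by the top p components is just a ratio of eigenvalue sums; a small sketch with assumed, already-sorted eigenvalues:

```python
import numpy as np

eigvals = np.array([4.2, 1.5, 0.2, 0.1])    # assumed eigenvalues, sorted high to low

explained_ratio = eigvals / eigvals.sum()   # variance share of each component
print(explained_ratio)                      # ≈ [0.70, 0.25, 0.033, 0.017]

p = 2
print(explained_ratio[:p].sum())            # ≈ 0.95 of the variance kept by 2 PCs
```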
Transformed Data
- Eigenvalues represent the variance for the new dimensions or principal components.
- Sort eigenvalues from highest to lowest.
- Take the first p eigenvectors that correspond to the top p eigenvalues; these are the directions with the largest variance
- A transformation matrix built from these eigenvectors maps the original data into a new coordinate system, so the transformed data describes the original observations in terms of the directions of greatest variance (see the projection sketch below)
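Putting these steps together: build the transformation matrix W from the top-p eigenvectors, project the normalized data, and optionally map back to see that the major variances are preserved. The data below is an assumed correlated toy set.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) @ np.array([[1.0, 0.5, 0.2],
                                         [0.0, 1.0, 0.3],
                                         [0.0, 0.0, 0.1]])   # assumed correlated data
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(X_norm, rowvar=False))

# Sort eigenvalues from highest to lowest and keep the top p eigenvectors
order = np.argsort(eigvals)[::-1]
p = 2
W = eigvecs[:, order[:p]]            # transformation matrix, shape (3, 2)

# Transform the original data into the new coordinate system
T = X_norm @ W                       # data described along the directions of major variance

# Mapping back (T @ W.T) reconstructs the data from only the top p directions
X_approx = T @ W.T
print(np.mean((X_norm - X_approx) ** 2))   # small reconstruction error
```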
Advantages of PCA in ML
- Reduces dimensionality
- Eliminates correlated features (multicollinearity)
- Speeds up training
- Overcomes overfitting by eliminating extraneous features
Disadvantages of PCA in ML
- Best for quantitative data; not effective for qualitative.
- Difficult to interpret components.
Applications of PCA
- Computer vision
- Bioinformatics
- Image compression and resizing
- High-dimensional pattern discovery
- Reduction of dimensions
- Multidimensional data visualization.