Dimensionality Reduction Techniques

Study Notes

Dimensionality reduction is the process of converting a high-dimensional dataset into a lower-dimensional one while preserving as much of the original information as possible.
Commonly used to prepare data for training machine learning algorithms.
Applied in various fields, including speech recognition, signal processing, bioinformatics, data visualization, noise reduction, and cluster analysis.

Feature Selection
- Filter Methods
- Wrapper Methods
- Intrinsic/Embedded Methods
Feature Extraction
- PCA (Principal Component Analysis)
- Factor Analysis
- Singular Value Decomposition
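
As a rough illustration of a filter method from the feature selection branch above, the sketch below keeps only the columns whose variance exceeds a cutoff. The function name variance_filter, the threshold of 0.1, and the synthetic data are assumptions made for this example, not part of the notes.

```python
import numpy as np

def variance_filter(X, threshold):
    """Return the indices of columns whose variance exceeds `threshold`."""
    variances = X.var(axis=0)
    return np.flatnonzero(variances > threshold)

# Synthetic data (assumed for illustration): two varying columns, one constant.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0.0, 1.0, 200),   # varies, should be kept
    np.full(200, 3.0),           # constant, variance 0, should be dropped
    rng.normal(0.0, 2.0, 200),   # varies, should be kept
])

kept = variance_filter(X, threshold=0.1)
X_reduced = X[:, kept]
print(kept, X_reduced.shape)
```

A filter like this scores each feature without training a model, which is what separates it from wrapper and embedded methods.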

Feature extraction is a part of dimensionality reduction in which raw data is reduced to more manageable groups.
Involves mapping the original features into a lower-dimensional space, where each new feature is expressed as a function of the original feature set.
The new lower-dimensional features should be uncorrelated.
Features can be extracted from images, text, geospatial data, date and time, web data, and sensor data.
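
A minimal sketch of feature extraction with PCA, computed through the singular value decomposition of the centered data. It shows that each extracted feature is a linear function of the original features and that the extracted features are uncorrelated. The function name pca and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first `n_components` principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # rows are principal directions
    scores = Xc @ components.T                   # new lower-dimensional features
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

# Synthetic data (assumed): 5 observed features driven by 2 hidden variables.
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 5))

scores, components, explained_variance = pca(X, n_components=2)
score_cov = np.cov(scores, rowvar=False)
print(scores.shape)
print(np.round(score_cov, 6))   # off-diagonal entries are numerically zero
```

The off-diagonal entries of score_cov come out at zero up to rounding error, which is the "uncorrelated" property mentioned above.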

Factor analysis is an interdependence technique that analyzes the correlations between variables and reduces them to a smaller number of factors that explain much of the variance in the original data.
Assumptions:
- Variables must be related, with sufficient correlations (checked with Bartlett's test).
- A minimum sample size of 50, preferably 100, and a minimum of 5 observations per item, preferably 10 observations per item.
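
The sample-size rule above can be written as a small check. This is a sketch only; the function name check_sample_size and the returned keys are assumptions, and the thresholds are the ones stated in the assumption itself.

```python
def check_sample_size(n_observations, n_items):
    """Compare a dataset against the sample-size rules of thumb for factor analysis."""
    ratio = n_observations / n_items
    return {
        "ratio": ratio,
        # minimum: at least 50 observations and 5 observations per item
        "meets_minimum": n_observations >= 50 and ratio >= 5,
        # preferred: at least 100 observations and 10 observations per item
        "meets_preferred": n_observations >= 100 and ratio >= 10,
    }

# Example: 120 respondents rating 9 items.
result = check_sample_size(120, 9)
print(result)
```

With 120 observations and 9 items the ratio is about 13.3 observations per item, so both the minimum and the preferred rule are satisfied.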

Exploratory Factor Analysis (EFA)
- Uses Principal Component Analysis (PCA/Thurstone)
Confirmatory Factor Analysis (CFA)
- Uses Structural Equation Modelling (SEM)

Cross Loading: Occurs when a variable loads substantially on more than one factor. One way to address it is to identify the variables with the lowest communality and delete them.
Communality: The proportion of variance in a single variable that is captured by the extracted factors.

Factor: A linear composite of variables.
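Factor Score: An observation's score on a given factor, computed as a weighted combination of its values on the original variables.
Eigen Value: The sum of the squared loadings of all variables on a factor, i.e. the amount of variance explained by that factor. Factors with eigen values less than 1 are usually omitted.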
Scree Plot: A graphical representation used in factor analysis that helps determine the optimal number of factors to retain.
Factor Rotation: A technique used in factor analysis to improve the interpretability of the factors.
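
To make the terms above concrete, here is a sketch that extracts unrotated principal-component loadings from a correlation matrix, confirms that each eigenvalue equals the sum of squared loadings on its factor, and applies the eigenvalue-greater-than-1 rule. The function name and the synthetic two-factor data are assumptions for illustration, and rotation is not shown.

```python
import numpy as np

def principal_component_loadings(R):
    """Return eigenvalues (descending) and unrotated loadings of a correlation matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(R)    # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # loading of variable i on factor j = eigenvector entry * sqrt(eigenvalue j)
    loadings = eigenvectors * np.sqrt(np.clip(eigenvalues, 0.0, None))
    return eigenvalues, loadings

# Synthetic data (assumed): variables 1-3 follow one factor, variables 4-6 another.
rng = np.random.default_rng(7)
f1 = rng.normal(size=(500, 1))
f2 = rng.normal(size=(500, 1))
X = np.hstack([f1 + 0.5 * rng.normal(size=(500, 3)),
               f2 + 0.5 * rng.normal(size=(500, 3))])
R = np.corrcoef(X, rowvar=False)

eigenvalues, loadings = principal_component_loadings(R)
n_retained = int((eigenvalues > 1.0).sum())          # eigenvalue > 1 rule
communalities = (loadings[:, :n_retained] ** 2).sum(axis=1)

print(np.round(eigenvalues, 2))
print("factors retained:", n_retained)
print(np.round(communalities, 2))
```

Summing squared loadings down a column gives the factor's eigenvalue, while summing them across the retained columns of a row gives that variable's communality.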

Correlation Matrix.
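Bartlett's Test of Sphericity: Tests the null hypothesis that the variables are uncorrelated in the population, i.e. that the population correlation matrix is an identity matrix. Rejecting it indicates the correlations are sufficient for factor analysis.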
KMO (Kaiser-Meyer-Olkin): Measure of sampling adequacy; values between 0.8 and 1 indicate adequate sampling.
Communalities: The proportion of variance in each original variable that is accounted for by the extracted factors.
Total Variance.
Scree Plot.
Component Matrix.
Rotated Component Matrix.
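
For the Bartlett's test item in the list above, the test statistic can be sketched with NumPy alone using the standard chi-square approximation, -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. The function name and the synthetic data are assumptions; the p-value, which would come from the chi-square distribution with dof degrees of freedom, is not computed here.

```python
import numpy as np

def bartlett_sphericity(X):
    """Return Bartlett's chi-square statistic and degrees of freedom for data X."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)                 # log of det(R), always <= 0
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof

# Synthetic data (assumed): one correlated set and one independent set of variables.
rng = np.random.default_rng(1)
shared = rng.normal(size=(200, 1))
correlated = shared + 0.7 * rng.normal(size=(200, 6))
independent = rng.normal(size=(200, 6))

stat_corr, dof = bartlett_sphericity(correlated)
stat_ind, _ = bartlett_sphericity(independent)
print(f"correlated:  chi2 = {stat_corr:.1f}, dof = {dof}")
print(f"independent: chi2 = {stat_ind:.1f}, dof = {dof}")
```

A large statistic relative to its degrees of freedom argues against the null hypothesis of uncorrelated variables, which is the result factor analysis needs.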

KMO values between 0.8 and 1 suggest adequate sampling.
Values less than 0.6 imply inadequate sampling; values in the 0.5 to 0.6 range are marginal at best.
Values close to 0 indicate that the partial correlations are large relative to the total correlations, which makes the data unsuitable for factor analysis.
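
A sketch of how the overall KMO value can be computed, to show where the comparison between partial and total correlations comes from. Partial correlations are taken from the inverse of the correlation matrix. The function name kmo and the synthetic one-factor data are assumptions for illustration.

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix R."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                           # partial correlations
    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off_diagonal] ** 2).sum()                # squared total correlations
    p2 = (partial[off_diagonal] ** 2).sum()          # squared partial correlations
    return r2 / (r2 + p2)

# Synthetic data (assumed): six variables driven by a single common factor.
rng = np.random.default_rng(2)
factor = rng.normal(size=(500, 1))
X = factor + 0.5 * rng.normal(size=(500, 6))
R = np.corrcoef(X, rowvar=False)

kmo_value = kmo(R)
print(round(kmo_value, 3))
```

When the partial correlations are small compared with the total correlations, p2 shrinks and the ratio approaches 1; when they dominate, the ratio falls toward 0.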

A scree plot is a graph of the eigenvalues, in descending order, plotted against the factor number.
Used to determine the appropriate number of factors to retain, typically by looking for the point where the curve levels off (the elbow).
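
The points of such a plot can be produced with a few lines of NumPy. The sketch below prints a text version rather than drawing a figure; the function name scree_points and the synthetic data are assumptions for illustration.

```python
import numpy as np

def scree_points(X):
    """Return (factor number, eigenvalue) pairs in descending eigenvalue order."""
    R = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]        # eigvalsh returns ascending order
    return list(enumerate(eigenvalues, start=1))

# Synthetic data (assumed): four variables share one factor, three are pure noise.
rng = np.random.default_rng(3)
factor = rng.normal(size=(250, 1))
X = np.hstack([factor + 0.6 * rng.normal(size=(250, 4)),
               rng.normal(size=(250, 3))])

points = scree_points(X)
for number, value in points:
    bar = "#" * int(round(value * 10))               # crude text bar for each eigenvalue
    print(f"{number:>2} {value:5.2f} {bar}")
```

The same pairs can be passed to any plotting library; the number of factors to retain is read off where the drop between successive eigenvalues flattens.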

Factor analysis is used in marketing research to understand consumer motives for purchasing products or brands.
It is also used to identify the attributes of products or services that matter most to customers.
Example: A two-wheeler manufacturer uses factor analysis to determine which variables potential customers consider when evaluating its product.

A school system surveyed 120 students, asking them to rate the importance of 9 teacher characteristics on a Likert scale (1-10).
Characteristics rated:
- Setting high expectations.
- Entertaining.
- Able to communicate effectively.
- Having expertise in their subject.
- Able to motivate.
- Caring.
- Charismatic.
- Having a passion for teaching.
- Friendly and easy-going.