Unsupervised Learning Concepts

Study Notes

Clustering: Finding patterns of similarity among unlabeled data samples and grouping them into clusters based on shared attributes/features.
Visualization: Algorithms that convert complex, unlabeled data into 2D or 3D representations that can be plotted, while preserving as much of the data's structure as possible.
Dimensionality Reduction: Simplifying data while losing as little information as possible, for example by merging several correlated features into one. Useful as a preprocessing step that reduces the number of features and speeds up other algorithms.
Anomaly Detection: Identifying unusual data instances (outliers), such as fraudulent transactions or manufacturing defects.
Association Rule Mining: Identifying relationships between attributes in large datasets. An example would be finding that customers who purchase barbecue sauce and potato chips also tend to buy steak.
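
The barbecue sauce example above can be made concrete with the two measures usually attached to an association rule: support (how often the items appear together) and confidence (how often the rule holds when its "if" part is present). The following is a minimal sketch, not a full mining algorithm; the baskets are made-up toy data and the function names are chosen here for illustration.

```python
# Toy shopping baskets, invented purely for illustration.
baskets = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "bread"},
    {"barbecue sauce", "potato chips"},
    {"bread", "milk"},
    {"steak", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets containing `antecedent`, the fraction that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    with_both = [b for b in with_antecedent if consequent <= b]
    return len(with_both) / len(with_antecedent)


rule_if = {"barbecue sauce", "potato chips"}
rule_then = {"steak"}

print("support:", support(rule_if | rule_then, baskets))
print("confidence:", confidence(rule_if, rule_then, baskets))
```

A rule is usually considered interesting only when both numbers are high enough; the thresholds are a choice made by the analyst.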

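K-Means: A common clustering algorithm that partitions data points into k clusters, where k is chosen in advance, by assigning each point to the nearest cluster center (centroid).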
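DBSCAN: A density-based clustering algorithm that groups points lying in dense regions and treats isolated points as noise. The number of clusters is not specified in advance; it follows from the data and the density parameters.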
Hierarchical Cluster Analysis (HCA): A clustering algorithm that organizes data points into a tree-like hierarchy of nested clusters, built by repeatedly merging smaller clusters or splitting larger ones.
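
To illustrate how K-Means groups points, here is a minimal sketch of its usual iteration (assign each point to the nearest centroid, then move each centroid to the mean of its points). It assumes 2-D points, a fixed number of iterations, and the first k points as starting centroids; these are simplifications made for this example, and library implementations choose starting centroids more carefully.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster 2-D points into k groups; returns (labels, centroids)."""
    # Assumption for this sketch: start from the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


# Toy data: two visually separate groups, invented for illustration.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Unlike this sketch, DBSCAN and HCA do not require k up front, which is the main practical difference between the three algorithms listed above.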

Batch Learning: The system cannot learn incrementally and must be trained on all available data at once. Training is done offline; once deployed, the system applies what it has learned without any further learning.
Online Learning: The system trains incrementally as data is fed to it sequentially, one instance at a time or in small batches. Each learning step is fast and cheap, so the system can keep adapting to new data as it arrives.
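
The idea of online learning can be sketched with a tiny linear model that updates its parameters after every single sample instead of waiting for the full dataset. This is an illustrative sketch only: the class name, the `learn_one` method, the learning rate, and the synthetic stream (generated from y = 2x + 1) are all assumptions made for this example.

```python
class OnlineLinearModel:
    """Model y = w * x + b, updated one sample at a time with gradient descent."""

    def __init__(self, learning_rate=0.3):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # One cheap update step using the squared-error gradient for this sample.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def sample_stream(n):
    """Hypothetical data stream: x cycles through 0, 0.25, ..., 1 and y = 2x + 1."""
    for i in range(n):
        x = (i % 5) / 4
        yield x, 2.0 * x + 1.0


model = OnlineLinearModel()
for x, y in sample_stream(2000):
    model.learn_one(x, y)  # the sample can be discarded after this call

print(model.w, model.b)
```

A batch learner would instead collect all samples first and fit once; to absorb new data it would have to be retrained from scratch on the old and new data together.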

Instance-Based Learning: The system memorizes the training examples and generalizes to new cases by comparing them to the stored examples using a similarity measure.
Model-Based Learning: The system builds a model from the data (e.g., an equation) whose parameters are fit during training, and then uses that model to make predictions.
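
The contrast between the two approaches can be shown side by side on the same toy data. In this sketch, the instance-based predictor is a 1-nearest-neighbor lookup and the model-based predictor is a straight line fit by least squares; both choices, the function names, and the data are assumptions made for illustration.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Instance-based: return the label of the stored example closest to x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def fit_line(train_x, train_y):
    """Model-based: fit y = slope * x + intercept by ordinary least squares."""
    n = len(train_x)
    mean_x = sum(train_x) / n
    mean_y = sum(train_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy training data, invented for illustration (it follows y = 2x).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]

query = 2.4
instance_prediction = nearest_neighbor_predict(train_x, train_y, query)
slope, intercept = fit_line(train_x, train_y)
model_prediction = slope * query + intercept

print(instance_prediction)
print(model_prediction)
```

The instance-based predictor must keep the whole training set around at prediction time, while the model-based one only needs the two fitted parameters.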