Clustering in Unsupervised Learning

Study Notes

In unsupervised learning the class labels are unknown, so the data is explored (for example, by plotting it) to identify natural clusters.
Cluster analysis aims to divide data into meaningful and/or useful clusters that may or may not correspond to human perception of similarity.

Clusters should comprise objects that are similar to each other and different from those in other clusters.
A (dis)similarity measure is required; a common choice is a distance induced by a norm of the difference between two objects, e.g., the L1 (Manhattan), L2 (Euclidean), or L∞ (Chebyshev) norm.
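To make the three norms concrete, the sketch below computes the distance each one induces between two vectors. It is a minimal standard-library illustration; the function name `minkowski_distances` is hypothetical.

```python
import math

def minkowski_distances(x, y):
    """Return the L1, L2 and L-infinity distances between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    l1 = sum(diffs)                            # Manhattan distance
    l2 = math.sqrt(sum(d * d for d in diffs))  # Euclidean distance
    linf = max(diffs)                          # Chebyshev (maximum) distance
    return l1, l2, linf

print(minkowski_distances((0, 0), (3, 4)))  # (7, 5.0, 4)
```

The same pair of points can be "close" under one norm and "far" under another, which is why the choice of measure affects the clusters found.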

Clustering can be partitional (flat) or hierarchical.
Partitional clustering divides data into non-overlapping subsets (clusters) where each data point is in exactly one subset.
Hierarchical clustering produces a set of nested clusters organized as a tree, often drawn as a dendrogram.

The basic k-means algorithm (a partitional method) proceeds as follows:
1. Randomly choose k objects from the training set as the initial prototypes.
2. Assign each object to its nearest prototype, using Euclidean distance (or another norm), to form k clusters.
3. Recompute the prototype of each cluster as the centroid (mean) of all objects assigned to it.
4. Repeat steps 2 and 3 until convergence, i.e., no data point changes cluster (equivalently, the centroids stop changing).
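The steps above can be sketched in plain Python using only the standard library. This is a minimal illustration, assuming points are equal-length tuples of numbers; the function name `kmeans` and its `max_iter` and `seed` parameters are hypothetical, and letting an empty cluster keep its previous prototype is just one possible convention.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=None):
    """Basic k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Randomly choose k distinct objects as the initial prototypes.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):  # max_iter is only a safeguard against very slow convergence
        # Assign every object to its nearest prototype (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Converged: no object changed cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each prototype as the centroid (mean) of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old prototype
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2, seed=0)
print(labels)
```

On these two well-separated groups the first three and last three points end up in different clusters; the cluster numbers themselves depend on the random initial prototypes.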
k-means clustering is a heuristic: it stops at a local optimum, with no guarantee of reaching the global optimum.
The result is sensitive to which objects are chosen as the initial cluster centers, especially for small data sets.

The algorithm can be viewed as a greedy algorithm for partitioning n samples into k clusters to minimize an objective function (e.g., sum of squared distances to cluster centers, SSE).
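Written out, with C_i denoting the i-th cluster and c_i its centroid (notation assumed here), the SSE objective is:

```latex
\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert_2^{2},
\qquad
c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x
```

For a fixed assignment of points to clusters, the mean is the point that minimizes the inner sum, which is why the update step uses the centroid.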
SSE is calculated by taking, for each data point, its error (the distance to the closest centroid), squaring it, and summing over all points.
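The calculation just described can be sketched as follows, assuming Euclidean distance and points given as numeric tuples; the function name `sse` is hypothetical.

```python
import math

def sse(points, centroids):
    """Sum of squared Euclidean distances from each point to its closest centroid."""
    total = 0.0
    for p in points:
        # The error of a point is its distance to the nearest centroid.
        error = min(math.dist(p, c) for c in centroids)
        total += error ** 2
    return total

print(sse([(0, 0), (2, 0), (10, 10)], [(1, 0), (10, 10)]))  # 2.0
```

Comparing SSE across runs with different initial prototypes is one way to pick the better of several k-means results.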

Pre-processing steps can improve the final result, including standardizing (or normalizing) the data and eliminating or reducing the effect of outliers.
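One common form of standardization is the z-score: subtract each feature's mean and divide by its standard deviation, so that no feature dominates Euclidean distances merely because of its units. A minimal sketch, assuming points are equal-length numeric tuples; the name `standardize` is hypothetical, and mapping constant features to 0.0 is a choice made here.

```python
import statistics

def standardize(points):
    """Rescale each feature (column) to zero mean and unit standard deviation."""
    columns = list(zip(*points))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        # A feature with zero spread carries no information; map it to 0.0.
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(p, means, stdevs))
        for p in points
    ]

print(standardize([(1, 100), (3, 300)]))  # [(-1.0, -1.0), (1.0, 1.0)]
```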
Post-processing can include splitting “loose” clusters and merging “close” clusters.

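In agglomerative hierarchical clustering, each instance starts off as its own cluster; at each step the two "nearest" clusters are merged into a new, larger cluster, until a single cluster (or a desired number of clusters) remains.
The algorithm is therefore a bottom-up technique.
The key operation is computing the proximity between two clusters, which can be defined in various ways, e.g., single linkage (closest pair of members), complete linkage (farthest pair), or average linkage (mean pairwise distance).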
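As an illustration of the bottom-up procedure, the sketch below uses single linkage as the cluster proximity; that choice, the function name `agglomerative`, and its `num_clusters` stopping parameter are assumptions. It recomputes every pairwise proximity at each step, so it is only meant for small examples.

```python
import math

def agglomerative(points, num_clusters=1):
    """Single-linkage agglomerative clustering.

    Returns the final clusters (lists of point indices) and the merge history
    as (cluster_a, cluster_b, distance) tuples.
    """
    # Each instance starts off as its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []

    def proximity(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > max(num_clusters, 1):
        # Find the two nearest clusters.
        best_d, best_x, best_y = math.inf, -1, -1
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = proximity(clusters[x], clusters[y])
                if d < best_d:
                    best_d, best_x, best_y = d, x, y
        merges.append((clusters[best_x], clusters[best_y], best_d))
        # Merge them into one larger cluster; best_y > best_x, so deleting it keeps best_x valid.
        merged = clusters[best_x] + clusters[best_y]
        del clusters[best_y]
        clusters[best_x] = merged
    return clusters, merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = agglomerative(points, num_clusters=2)
print(clusters)  # [[0, 1, 2, 3], [4]]
```

Recording the merge history is what allows the nested clusters to be drawn as a dendrogram; swapping the body of `proximity` for a maximum or an average gives complete or average linkage instead.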