Questions and Answers
What is the underlying principle of the standard algorithm for K-means clustering?
Expectation-Maximization (EM)
In Gaussian Mixture Modeling (GMM), what is updated in the Expectation step?
Cluster labels based on Gaussian distribution density
How is the log-likelihood optimized in GMM?
By fitting multiple Gaussian distributions
What is the purpose of constraining the covariance matrix in Gaussian Mixture Modeling?
To restrict cluster shapes, e.g., to diagonal (axis-aligned) or spherical clusters instead of full rotated ellipsoids
What does the Maximization Step in EM clustering involve?
Recomputing each cluster model from the weighted mean and weighted covariance
How is the log likelihood improved in the generic EM clustering algorithm?
It is improved by both the E-step and the M-step
What are some numerical issues in Gaussian Mixture Modeling?
Unstable computation of the inverse of the covariance matrix, singular covariance matrices, numerical issues in computing the covariances, and getting stuck in local optima and saddle points
How can we avoid singularities in Gaussian Mixture Modeling?
By adding a small constant to the diagonal of the covariance matrix
What is the benefit of using the covariance of the entire data set as a prior in Gaussian Mixture Modeling?
It regularizes the cluster covariances, helping avoid numerical issues such as singular covariance matrices
How can we address getting stuck in local optima and saddle points in Gaussian Mixture Modeling?
What is a key difference in the stopping criteria between Gaussian Mixture Modeling and K-means clustering?
Study Notes
Expectation-Maximization (EM) in Clustering
- EM is the underlying principle of the standard algorithm for k-means
- The algorithm consists of four steps:
- Choose initial model parameters
- Compute the expected values of the latent variables (e.g., cluster assignments) given the data and current parameters
- Update the model parameters to maximize the likelihood of observing the data
- Repeat steps 2-3 until a stopping condition holds
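The four steps above can be sketched for k-means itself; this is a minimal NumPy sketch, with illustrative function and parameter names, where the E-step "expects" the latent hard cluster assignment and the M-step recomputes the centers:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means sketch: E-step assigns each point to its nearest
    center (the latent variable); M-step recomputes each center as the
    mean of its assigned points."""
    rng = np.random.default_rng(seed)
    # Step 1: choose initial model parameters (random data points as centers)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Step 2 (E): expect latent cluster assignments from the data
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3 (M): update centers to maximize the likelihood
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: repeat until a stopping condition holds
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

The stopping condition here is that the centers no longer move, which for k-means coincides with no point changing its assignment.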
Gaussian Mixture Modeling (GMM)
- GMM is an extension of k-means that allows for different cluster shapes and densities
- The algorithm consists of four steps:
- Choose initial centers, unit covariance, and uniform weight
- Expect cluster labels based on Gaussian distribution density
- Update Gaussian models with the mean, covariance matrix, and weight
- Repeat steps 2-3 until a stopping condition holds
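The four GMM steps can be sketched as follows; this is a minimal NumPy implementation with illustrative names (and a small assumed ridge on the covariance diagonal, anticipating the numerical issues discussed below):

```python
import numpy as np

def gauss_pdf(X, mu, cov):
    """Multivariate Gaussian density N(x; mu, cov) for each row of X."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)) / norm

def gmm_em(X, k, n_iter=100, seed=0):
    """Sketch of EM for a Gaussian mixture: soft E-step labels from the
    Gaussian densities, M-step updates of mean, covariance, and weight."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Step 1: initial centers, unit covariance, and uniform weight
    mu = X[rng.choice(n, size=k, replace=False)]
    cov = np.stack([np.eye(d)] * k)
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Step 2 (E): expected (soft) cluster labels from the densities
        dens = np.stack([w[j] * gauss_pdf(X, mu[j], cov[j])
                         for j in range(k)], axis=1)
        r = dens / dens.sum(axis=1, keepdims=True)
        # Step 3 (M): weighted mean, covariance matrix, and weight
        nk = r.sum(axis=0)
        mu = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            cov[j] = (r[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
        w = nk / n
    return mu, cov, w, r
```

Unlike k-means, every point contributes to every cluster's update, weighted by its responsibility `r`.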
Objective of GMM
- Optimize the log-likelihood of observing the data
- Fit multiple Gaussian distributions to the data
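The objective can be written as log L = sum over points n of log(sum over clusters j of w_j * N(x_n; mu_j, cov_j)); a small sketch of computing it (function names are illustrative):

```python
import numpy as np

def gauss_pdf(X, mu, cov):
    """Multivariate Gaussian density N(x; mu, cov) for each row of X."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)) / norm

def log_likelihood(X, mu, cov, w):
    """log L = sum_n log sum_j w_j * N(x_n; mu_j, cov_j)."""
    dens = np.stack([w[j] * gauss_pdf(X, mu[j], cov[j])
                     for j in range(len(w))], axis=1)
    return np.log(dens.sum(axis=1)).sum()
```

Note the sum over components sits inside the logarithm, which is why the likelihood cannot be maximized in closed form and EM is used instead.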
Multivariate Gaussian Density
- The probability density function of a multivariate Gaussian is determined by the mean and covariance matrix; in a mixture, each component additionally carries a weight
- The covariance matrix controls the shape of the cluster
- A symmetric and positive semi-definite covariance matrix is required
- The covariance matrix can be:
- Rotated ellipsoid (A)
- Diagonal (“variance matrix”) ellipsoid (B)
- Scaled unit matrix, spherical (C)
- Different for each cluster or the same for all clusters
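The three covariance constraints (A)-(C) can be illustrated concretely; the specific numbers below are arbitrary examples:

```python
import numpy as np

d = 2
# (A) Full covariance: a rotated ellipsoid, built by rotating an
# axis-aligned ellipse; must be symmetric and positive semi-definite.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
full = R @ np.diag([4.0, 1.0]) @ R.T
# (B) Diagonal covariance ("variance matrix"): axis-aligned ellipsoid.
diag = np.diag([4.0, 1.0])
# (C) Scaled unit matrix: spherical cluster (as in k-means).
sphere = 2.0 * np.eye(d)

for C in (full, diag, sphere):
    assert np.allclose(C, C.T)                 # symmetric
    assert (np.linalg.eigvalsh(C) >= 0).all()  # positive semi-definite
```

Rotating the diagonal matrix changes the orientation but not the eigenvalues, so (A) describes the same ellipse shape as (B), just rotated.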
E-M Optimization
- The Expectation Step uses Bayes' rule and the law of total probability
- The Maximization Step uses weighted mean and weighted covariance to recompute cluster model
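The E-step's use of Bayes' rule and the law of total probability can be shown for a single point; the numbers below are hypothetical per-cluster likelihoods and mixture weights:

```python
import numpy as np

# Hypothetical per-cluster likelihoods p(x | j) for one point, 3 clusters,
# and the cluster priors (mixture weights) p(j).
likelihood = np.array([0.02, 0.50, 0.08])  # p(x | j)
prior = np.array([0.5, 0.3, 0.2])          # p(j)

# Law of total probability: p(x) = sum_j p(x | j) * p(j)
evidence = (likelihood * prior).sum()

# Bayes' rule: responsibility p(j | x) = p(x | j) * p(j) / p(x)
responsibility = likelihood * prior / evidence
assert np.isclose(responsibility.sum(), 1.0)
```

These responsibilities are exactly the soft labels that the Maximization Step then uses as weights in the weighted mean and weighted covariance.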
EM Clustering Algorithm
- The generic EM clustering algorithm is:
- E-step: compute the probability of each data point belonging to each cluster
- M-step: update the cluster model parameters
- The log likelihood is improved by both the E-step and the M-step
Numerical Issues in GMM
- Numerical problems can occur in GMM, such as:
- Unstable computation of the inverse of the covariance matrix
- Singular covariance matrix
- Numerical issues in computing the covariances
- Getting stuck in local optima and saddle points
- Improvements can be implemented to avoid these issues, such as:
- Adding a small constant to the diagonal of the covariance matrix to avoid singularities
- Using the covariance of the entire data set as prior
- Using numerically stable approaches for computing the covariance matrix
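The first two improvements can be sketched in one helper; `eps` and `lam` are assumed hyperparameters, not values from the source:

```python
import numpy as np

def regularize_cov(cov_j, data_cov, eps=1e-6, lam=0.1):
    """Sketch of two stabilizations for a cluster covariance estimate:
    a small constant on the diagonal avoids singular matrices, and
    blending with the covariance of the entire data set acts as a prior
    that keeps cluster shapes plausible when a cluster has few points."""
    d = cov_j.shape[0]
    ridged = cov_j + eps * np.eye(d)            # avoid singularity
    return (1 - lam) * ridged + lam * data_cov  # shrink toward data prior
```

Applied to a rank-deficient estimate, e.g. one computed from fewer points than dimensions, the result is invertible again, so the E-step's density evaluations remain well defined.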
Description
Test your knowledge of Expectation-Maximization (EM), the underlying principle of the standard algorithm for k-means. Questions cover topics like choosing initial model parameters, latent variables, updating parameters to maximize the likelihood, and more related to Gaussian Mixture Modeling (GMM).