9-Expectation-Maximization-in Clustering.pdf

Full Transcript

Expectation-Maximization in Clustering EM (Expectation-Maximization) is the underlying principle of the standard algorithm for -means: 1. 2. 3. 4. Choose initial model parameters Expect latent variables (e.g., cluster assignment) from and the data. Update to maximize the likelihood of observing the...

Expectation-Maximization in Clustering EM (Expectation-Maximization) is the underlying principle of the standard algorithm for -means: 1. 2. 3. 4. Choose initial model parameters Expect latent variables (e.g., cluster assignment) from and the data. Update to maximize the likelihood of observing the data Repeat (2.)–(3.) until a stopping condition holds Gaussian Mixture Modeling (GMM): [DeLaRu77] 1. choose centers randomly, unit covariance, and uniform weight: 2. expect cluster labels based on Gaussian distribution density 3. update Gaussian models with the mean, covariance matrix, and weight 4. repeat (2.)–(3.) until change Objective: Optimize log-likelihood: Fitting Multiple Gaussian Distributions Probability density function of a multivariate Gaussian: If we constrain we can control the cluster shape: ▶ we always want symmetric and positive semi-definite ▶ covariance matrix: rotated ellipsoid (A) ▶ diagonal (“variance matrix”): ellipsoid (B) ▶ scaled unit matrix: spherical (C) ▶ same for all clusters, or different each 44 Gaussian Mixture Modeling as E-M-Optimization The multivariate Gaussian density with center and covariance matrix is: For the Expectation Step, we use Bayes’ rule: Using and the law of total probability: For the Maximization Step: Use weighted mean and weighted covariance to recompute cluster model. 45 Algorithm: EM Clustering The generic EM clustering algorithm is: The log likelihood is improved by both the E-step, and the M-step. For Gaussian Mixture modeling, In contrast to -means, we need to use a stopping threshold, as the change will not become. 46 Numerical Issues in GMM Unfortunately, Gaussian Mixture Modeling can have numerical problems: ▶ computing the inverse is often unstable, the matrix can become singular ▶ computing the covariances can have numerical issues [ScGe18] ▶ can get stuck in local optima and saddle points [JZBW16] We hence should implement the following improvements: ▶ add a small constant to the diagonal of to avoid singularities (as done, e.g., in the scikit-learn implementation; corresponding to a uniform prior) ▶ use the covariance of the entire data set as prior (add a small multiple to each ) i.e., perform maximum-a-posteriori (MAP) estimation [FrRa07] ▶ use numerically stable approaches for computing the covariance matrix [ScGe18] 47

Use Quizgecko on...
Browser
Browser