Unit-3 Machine Learning PDF
Summary
This document provides an overview of various machine learning concepts including clustering algorithms such as K-means, mixtures of Gaussians, and the Expectation-Maximization (EM) algorithm, along with factor analysis and PCA. It also touches on image segmentation and data compression.
UNIT-III
K-means clustering, Mixtures of Gaussians, Alternative view of EM, Factor analysis, PCA, Choosing the number of latent dimensions, Clustering: measuring dissimilarity, Evaluating the output of clustering methods, Hierarchical clustering.

INTRODUCTION
Introducing additional latent variables allows relatively complex marginal distributions over the observed variables to be expressed in terms of more tractable joint distributions over the expanded space of observed and latent variables. A general technique for finding maximum-likelihood estimates in such models is the Expectation-Maximization (EM) algorithm.

K-MEANS CLUSTERING
Robbins-Monro procedure: a sequential (on-line) version of K-means can be obtained by applying the Robbins-Monro procedure to update the cluster means one data point at a time.

Image Segmentation and Compression
The goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance or corresponds to objects or parts of objects. Each pixel in an image is a point in a 3-dimensional space comprising the intensities of the red, blue, and green channels, and our segmentation algorithm simply treats each pixel in the image as a separate data point. We illustrate the result of running K-means to convergence, for any particular value of K, by re-drawing the image with each pixel vector replaced by the {R, G, B} intensity triplet of the centre µk to which that pixel has been assigned. [Figure: two examples of applying K-means clustering to image segmentation, showing the initial images together with the segmentations obtained for various values of K.] This also illustrates the use of vector quantization for data compression, in which smaller values of K give higher compression at the expense of poorer image quality.

Mixtures of Gaussians
Latent variables
The basic idea is to model dependence between two variables by adding an edge between them in the graph.
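The vector-quantization idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from these notes; the function name and the random "pixel" data are assumptions made for the example:

```python
import numpy as np

def kmeans(X, K, n_iters=20, seed=0):
    """Minimal K-means: X is (N, D); returns (centres, assignments)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iters):
        # Hard assignment: each point goes to its nearest centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        z = d2.argmin(axis=1)
        # Update each centre to the mean of its assigned points.
        for k in range(K):
            if (z == k).any():
                centres[k] = X[z == k].mean(axis=0)
    return centres, z

# "Compress" an image: each pixel (R, G, B) replaced by its centre.
pixels = np.random.default_rng(1).random((1000, 3))
centres, z = kmeans(pixels, K=4)
compressed = centres[z]   # same shape as pixels, only K distinct colours
```

Replacing each pixel by its assigned centre stores only the K colours plus one index per pixel, which is where the compression comes from.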
An alternative approach is to assume that the observed variables are correlated because they arise from a hidden common “cause”. Models with hidden variables are also known as latent variable models, or LVMs. Such models are harder to fit than models with no latent variables, but they can have significant advantages, for two main reasons.

Maximum Likelihood
EM for Gaussian Mixtures
Summary

An Alternative View of EM
In practice, the complete data set {X, Z} cannot be obtained; only the incomplete data X is available. Our knowledge of the values of the latent variables in Z is given only by the posterior distribution p(Z|X, θ). Because the complete-data log likelihood cannot be used directly, EM works with its expected value under this posterior distribution; each cycle of EM will then increase the incomplete-data log likelihood (unless it is already at a local maximum).
Solution:
▪ The distribution of the observed values is obtained by taking the joint distribution of all the variables and then marginalizing over the missing ones.
▪ EM can then be used to maximize the corresponding likelihood function.
▪ This is a valid procedure only if the data values are missing at random, i.e. the mechanism causing values to be missing does not depend on the unobserved values.
▪ E.g., a sensor that fails to return a value whenever the quantity it is measuring exceeds some threshold would violate this assumption.

Relation to K-means
▪ Comparison of the K-means algorithm with the EM algorithm for Gaussian mixtures shows that there is a close similarity.
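The E-step/M-step cycle for Gaussian mixtures named above can be sketched for one-dimensional data. This is a minimal illustration under assumed initialization choices (means seeded from random data points, variances from the overall variance), not the notes' own code:

```python
import numpy as np

def em_gmm_1d(x, K=2, n_iters=50, seed=0):
    """One-dimensional Gaussian mixture fitted by EM (sketch)."""
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)              # mixing coefficients
    mu = rng.choice(x, K, replace=False)  # initial means
    var = np.full(K, x.var())             # initial variances
    for _ in range(n_iters):
        # E step: soft assignment -- responsibility r[n, k] = p(z=k | x_n),
        # in contrast to K-means' hard assignment.
        log_p = (np.log(pi)
                 - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        r = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from responsibility-weighted data.
        Nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / Nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / len(x)
    return pi, mu, var

# Two well-separated clusters of points.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])
pi, mu, var = em_gmm_1d(x)
```

The responsibilities r play the role of K-means' cluster labels, but as probabilities rather than a unique assignment.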
▪ The K-means algorithm performs a hard assignment of data points to clusters, in which each data point is associated uniquely with one cluster, whereas the EM algorithm makes a soft assignment based on the posterior probabilities.

Factor analysis
Problem: mixture models are limited by discrete, mutually exclusive latent variables (one-hot encoding).
FA is a low-rank parameterization of an MVN: Factor Analysis (FA) specifies a joint density model over x using fewer parameters.
Inference of the latent factors
Unidentifiability

Principal components analysis (PCA)
Classical PCA: statement of the theorem
Singular value decomposition (SVD)

Choosing the number of latent dimensions
Model selection for FA/PPCA
Model selection for PCA

Clustering
A grouping of data objects such that the objects within a group are similar (or near) to one another and dissimilar (or far) from the objects in other groups. Clustering falls under the branch of unsupervised learning, which aims at gaining insights from unlabeled data points; that is, unlike supervised learning, there is no target variable. It is not necessary that the clusters formed be circular in shape: the shape of clusters can be arbitrary, and there are many algorithms that work well at detecting arbitrarily shaped clusters. Clustering aims at forming groups of homogeneous data points from a heterogeneous dataset. It evaluates the similarity based on a metric like Euclidean distance, cosine similarity, Manhattan distance, etc.,
and then groups the points with the highest similarity scores together.

Types of clustering
▪ Partitional: each object belongs to exactly one cluster.
▪ Hierarchical: a set of nested clusters organized as a tree.
A distinction among different types of clusterings is whether the set of clusters is nested or unnested. A partitional clustering is simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset. A hierarchical clustering is a set of nested clusters that are organized as a tree.

Similarity vs Dissimilarity measures
The similarity or dissimilarity between two objects is generally based on the difference in corresponding attribute values. In a clustering algorithm, the similarity or dissimilarity between objects is usually measured by a distance function.

Data Types
For nominal variables, these measures are binary, indicating whether two values are equal or not. For ordinal variables, the measure is the difference between two values, normalized by the maximum distance. For the other variables, it is just a distance function.

Measuring (dis)similarity
Some common attribute dissimilarity functions are as follows:

Evaluating the output of clustering methods
Purity
Purity is defined as purity = Σ_i (N_i / N) max_j (N_ij / N_i), where N_ij is the number of objects in cluster i that belong to class j, N_i = Σ_j N_ij, and N is the total number of objects. Purity ranges between 0 (bad) and 1 (good). However, a purity of 1 can be achieved trivially by putting each object into its own cluster, so this measure does not penalize the number of clusters. [Figure: three clusters with labeled objects inside.]

Rand index
Mutual information

Hierarchical clustering
Why hierarchical clustering?
Agglomerative clustering: single link, complete link, average link.
Divisive clustering
Another method involves building a minimum spanning tree from the dissimilarity graph and creating new clusters by breaking the link with the largest dissimilarity.
Dissimilarity analysis steps:
1. Start with all objects in a single cluster G; let H be empty.
2. Compute each object's average dissimilarity to the other members of G.
3. Move the object with the largest average dissimilarity to H, forming the seed of the new cluster.
4. Continue moving objects from G to H until a stopping criterion is met.
At each step, pick the point i* that maximizes the difference Δ(i) between the average dissimilarity of i to the remaining members of G and the average dissimilarity of i to H.
5. Stop the process when this difference becomes negative, i.e. when every remaining member of G is on average closer to G than to H.
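The dissimilarity-analysis steps above can be sketched as follows. This is a toy implementation on a precomputed dissimilarity matrix; the function name and the seeding of H are assumptions consistent with the steps described:

```python
import numpy as np

def dissimilarity_split(D):
    """Split one cluster into G and H by dissimilarity analysis (sketch).

    D is an (N, N) symmetric dissimilarity matrix.  H is seeded with the
    object farthest on average from the rest; then the object whose average
    dissimilarity to G exceeds its average dissimilarity to H by the largest
    margin is moved, while that margin stays positive.
    """
    N = len(D)
    G = set(range(N))
    seed = int(np.argmax(D.sum(axis=1)))  # most dissimilar object seeds H
    G.remove(seed)
    H = {seed}
    while len(G) > 1:
        def margin(i):
            d_G = np.mean([D[i, j] for j in G if j != i])
            d_H = np.mean([D[i, j] for j in H])
            return d_G - d_H
        i_star = max(G, key=margin)
        if margin(i_star) <= 0:       # step 5: stop when it turns negative
            break
        G.remove(i_star)
        H.add(i_star)
    return sorted(G), sorted(H)

# Two groups far apart on a line: points 0-2 near 0, points 3-5 near 10.
pts = np.array([0.0, 0.5, 1.0, 10.0, 10.5, 11.0])
D = np.abs(pts[:, None] - pts[None, :])
print(dissimilarity_split(D))   # → ([3, 4, 5], [0, 1, 2])
```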