
Document Details

PositiveQuadrilateral

Uploaded by PositiveQuadrilateral

Escola Maristes Rubí

Tags

machine learning clustering unsupervised learning

Full Transcript

Unsupervised Learning

✔ Unsupervised learning covers all kinds of machine learning where there is no known output and no teacher to instruct the learning algorithm.
✔ The learning algorithm is shown only the input data and asked to extract knowledge from it.
✔ The goal of the algorithm is to organize the data in some way or to describe its structure.
✔ This includes dimensionality reduction (transformation) and clustering.

Types of Unsupervised Learning

✔ Unsupervised transformation: algorithms that create a new representation of the data that may be easier for humans or other machine learning algorithms to understand than the original representation of the dataset.
✔ A common application of unsupervised transformation is dimensionality reduction, which takes a high-dimensional representation of the data, consisting of many features, and finds a new representation that summarizes the data with fewer features.
✔ Clustering algorithms: partition the data into distinct groups of similar items.

Unsupervised Clustering

What is Clustering?

✔ Clustering, in machine learning, is a method of grouping data points into clusters of similar items. It is also called segmentation.
✔ For example, you might apply clustering to find similar people by demographics. You might use clustering with text analysis to group sentences with similar topics or sentiment.
K-Means Clustering

Step 1: Cluster assignment. Randomly initialize the cluster centroids, then assign each data point to its nearest centroid.
Step 2: Move cluster centroids. Find the mean of the data points in each cluster, move the cluster centroid to that mean, and reassign the data points to their nearest centroid.
Repeat the steps. Stop when the cluster centroids are no longer moving.

For a visualization, see https://hckr.pl/k-means-visualization/

Objective of K-Means Optimization (Summary)

The objective of K-means optimization is to find the best possible clustering of the data points into K clusters.

Minimize the Within-Cluster Sum of Squares (WCSS): The primary goal of K-means optimization is to minimize the within-cluster sum of squares (WCSS), also known as inertia or squared error. This measures how tightly packed the data points are within each cluster.

How It Works

1. Cluster Assignment: Assign each data point to the nearest centroid (the center of a cluster). Each data point belongs to the cluster whose centroid is closest to it.
2. Centroid Update: After all points are assigned, update the centroids. The new centroid of each cluster is the mean (average) of all data points assigned to that cluster.
3. Repeat: Repeat the assignment and update steps iteratively. Each iteration reassigns points to the closest centroid and recalculates the centroids based on the new assignments.
4. Convergence: The algorithm stops when the centroids no longer change significantly or the assignments of data points to clusters stabilize. At this point, the clustering has converged to a solution where the WCSS is at a local minimum, which is not necessarily the global minimum.
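The assignment and update steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the toy `points` list, the choice to initialize centroids by sampling K data points, and the rule that an empty cluster keeps its previous centroid are all choices made for this example. The last lines run the sketch from several random seeds and keep the run with the lowest WCSS, which is one common way to reduce sensitivity to the random initialization.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_clusters(points, centroids):
    """Cluster assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Centroid update step: move each centroid to the mean of its points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption for this sketch: an empty cluster keeps its old centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def wcss(points, labels, centroids):
    """Within-cluster sum of squares (inertia) for a given clustering."""
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))


def kmeans(points, k, seed=0, max_iter=100):
    """Run K-means once; returns (centroids, labels, wcss)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: initialize from random data points
    labels = assign_clusters(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:  # assignments are stable, so centroids stop moving
            break
        labels = new_labels
    return centroids, labels, wcss(points, labels, centroids)


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Run from several seeds and keep the clustering with the lowest WCSS.
runs = [kmeans(points, k=2, seed=s) for s in range(5)]
best_centroids, best_labels, best_wcss = min(runs, key=lambda run: run[2])
print(best_labels, round(best_wcss, 3))
```

On data this clean every seed is likely to find the same two groups; the restart loop matters more when clusters overlap or K is large.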
Objective Function Formula

Mathematically, the cost function (objective function) to be minimized is:

$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

where $C_k$ is the set of data points assigned to cluster $k$ and $\mu_k$ is the centroid (mean) of that cluster.

▪ Because the result depends on the initial centroids, run K-means several times with different random initial centroids and keep the run that gives the lowest value of the cost function.

Why Minimize WCSS?

1. Compact Clusters: By minimizing WCSS, K-means keeps the data points within each cluster as close as possible to the cluster's centroid, resulting in compact clusters.
2. Better Separation: A lower WCSS means that points are closer to their respective centroids, which, for a fixed K, generally indicates better separation between clusters.
3. Optimization Process: The iterative process of K-means adjusts the centroids and assignments to reduce WCSS, gradually improving the clustering until an optimal or near-optimal clustering is reached.

Clustering - Evaluating Clustering Models

Evaluating Unsupervised Clustering Models

Although we lack the known outputs that are used to score supervised models, there are measurement scores that can be used to assess how well unsupervised models perform. Evaluating clustering models involves assessing how well the algorithm has grouped similar data points together while keeping dissimilar points apart.

➔ The most popular measure is the silhouette score, or silhouette coefficient.
➔ The silhouette score measures how similar a data point is to its own cluster (cohesion) compared to other clusters (separation).
➔ The silhouette coefficient is calculated for each sample using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b).
➔ The silhouette coefficient for a sample is (b - a) / max(a, b).
➔ To clarify, b is the mean distance between a sample and the points of the nearest cluster that the sample is not a part of.

⮚ The silhouette score ranges from -1 to 1, with higher values indicating better clustering.
⮚ A score close to 1 indicates that the data point is well matched to its own cluster and poorly matched to neighboring clusters.
⮚ A score close to 0 indicates that the data point is on the boundary between two clusters.
⮚ A score close to -1 indicates that the data point is better matched to a neighboring cluster than to its own cluster.
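To make the formula (b - a) / max(a, b) concrete, the following sketch computes the silhouette coefficient of every sample from scratch. It is an illustration under assumptions: the function names and toy data are invented for this example, Euclidean distance is assumed, at least two clusters are required, and a sample alone in its cluster is given a score of 0, which is a common convention rather than something stated in the slides.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mean_distance(point, others):
    """Mean distance from one point to a non-empty list of other points."""
    return sum(euclidean(point, o) for o in others) / len(others)


def silhouette_samples(points, labels):
    """Silhouette coefficient (b - a) / max(a, b) for each sample.

    Assumes at least two distinct cluster labels and that not all
    points coincide (otherwise max(a, b) could be zero).
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [points[j] for j in clusters[label] if j != i]
        if not own:
            # Assumed convention: a sample alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other points of the sample's own cluster.
        a = mean_distance(points[i], own)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            mean_distance(points[i], [points[j] for j in members])
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 0, 1, 1, 1, 1]   # third point placed in the wrong cluster

good_scores = silhouette_samples(points, good_labels)
bad_scores = silhouette_samples(points, bad_labels)

mean_good = sum(good_scores) / len(good_scores)
print(round(mean_good, 3), round(bad_scores[2], 3))
```

With the sensible labeling every sample scores close to 1, while the deliberately misassigned third point in `bad_labels` gets a negative score, matching the interpretation of the range given above. The overall silhouette score of a clustering is the mean of the per-sample values.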
