K-Means Clustering Algorithm

Study Notes

K-means is an unsupervised machine learning algorithm that partitions a dataset into K clusters.
Each data point is assigned to the cluster whose centroid is nearest, so points with similar features end up in the same cluster.

Initialization:
- Choose a value for K (number of clusters).
- Randomly initialize a centroid (cluster center) for each cluster, for example by picking K distinct data points.
Assignment:
- Calculate the distance between each data point and the centroid of each cluster.
- Assign each data point to the cluster with the closest centroid.
Update:
- Calculate the new centroid of each cluster as the mean of all data points assigned to that cluster.
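- Repeat the Assignment and Update steps until convergence (cluster assignments no longer change) or another stopping criterion, such as a maximum number of iterations, is reached.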
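
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name kmeans, the parameters max_iters and seed, the choice to initialize from randomly sampled data points, and the rule that an empty cluster keeps its previous centroid are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as the starting centroids
    # (one common choice; assumed here for simplicity).
    centroids = rng.sample(points, k)
    labels = None

    for _ in range(max_iters):
        # Assignment: each point goes to the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Convergence: stop once no assignment changes between iterations.
        if new_labels == labels:
            break
        labels = new_labels

        # Update: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return centroids, labels


# Two visually separate groups of 2-D points (made-up data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the first three points should share one label and the last three the other. Which group receives label 0 depends on the random initialization.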

Centroid: The mean vector of all data points assigned to a cluster.
Cluster assignment: The process of assigning each data point to the cluster with the nearest centroid.
Distance metric: Typically, Euclidean distance is used to calculate the distance between data points and centroids.
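
As a small worked example of the distance metric and cluster assignment, the sketch below computes Euclidean distance by hand and picks the nearest centroid. The helper names euclidean and nearest_centroid and the sample coordinates are hypothetical, chosen only for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point`."""
    distances = [euclidean(point, c) for c in centroids]
    return distances.index(min(distances))


centroids = [(0.0, 0.0), (3.0, 4.0)]

# A 3-4-5 right triangle, so the distance between the two centroids is 5.0.
d = euclidean((0.0, 0.0), (3.0, 4.0))

# (2.5, 3.0) is about 3.9 from the first centroid and about 1.1 from the
# second, so it is assigned to cluster index 1.
cluster = nearest_centroid((2.5, 3.0), centroids)
```

Because every feature contributes its squared difference, features measured on larger scales dominate the distance, which is why data is often standardized before clustering.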

Customer segmentation: Clustering customers based on demographics and behavior.
Image segmentation: Clustering pixels in an image to identify objects or features.
Gene expression analysis: Clustering genes based on their expression levels.
