Clustering in Unsupervised Learning
12 Questions

Questions and Answers

What is the primary goal of cluster analysis?

  • To divide data into meaningful clusters (correct)
  • To predict continuous outcomes
  • To identify classes with known labels
  • To identify relationships between variables

Which of the following is a type of proximity measure?

  • Partitional clustering
  • Hierarchical clustering
  • Dendrogram
  • L1 norm (correct)

What is the main difference between partitional and hierarchical clustering?

  • The type of proximity measure used
  • The number of clusters
  • Whether the clusters are overlapping or non-overlapping
  • Whether the clusters are nested or non-nested (correct)

What is the purpose of adopting a (dis)similarity measure in clustering?

  • To determine the similarity between objects

What is a dendrogram used to represent?

  • Hierarchical clusters

What is a characteristic of the clusters formed in cluster analysis?

  • They are always non-overlapping

What is the initial step in the k-means clustering algorithm?

  • Randomly choose k objects from the training set as the prototypes

What is the common issue with k-means clustering in small data sets?

  • The algorithm is sensitive to the initial choice of objects as cluster centers

What is the purpose of pre-processing in k-means clustering?

  • To standardize or normalize the data and eliminate or reduce the effect of outliers

What is the key operation in agglomerative hierarchical clustering?

  • Finding the two features that are 'closest' in multivariate space

What is the objective function in k-means clustering?

  • The sum of the squared distances to the cluster centers

What is the characteristic of hierarchical clustering?

  • Each instance starts off as its own cluster, and is subsequently joined to the 'nearest' instance to form a new cluster

    Study Notes

    Clustering with Unsupervised Learning

    • In unsupervised learning the class labels are unknown, so the data is explored (for example, by plotting it) to identify natural clusters.
    • Cluster analysis aims to divide data into meaningful and/or useful clusters that may or may not correspond to human perception of similarity.

    Characteristics of Clusters

    • Clusters should comprise objects that are similar to each other and different from those in other clusters.
    • A (dis)similarity measure is required, often taken as a proximity measure (e.g., L1, L2, or L∞ norm).
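
As a minimal sketch of the three norms named above, the following computes the L1, L2, and L∞ distances between two vectors. The function name `norm_distances` is a hypothetical choice for illustration and does not come from the lesson.

```python
def norm_distances(a, b):
    """Return the (L1, L2, L-infinity) distances between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Absolute coordinate-wise differences, shared by all three norms.
    diffs = [abs(x - y) for x, y in zip(a, b)]
    l1 = sum(diffs)                          # L1: sum of absolute differences
    l2 = sum(d * d for d in diffs) ** 0.5    # L2: Euclidean distance
    linf = max(diffs)                        # L-infinity: largest single difference
    return l1, l2, linf


print(norm_distances([0, 0], [3, 4]))  # (7, 5.0, 4)
```

The smaller the distance, the more similar two objects are considered to be; which norm is appropriate depends on the data.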

    Clustering Types

    • Clustering can be partitional (flat) or hierarchical.
    • Partitional clustering divides data into non-overlapping subsets (clusters) where each data point is in exactly one subset.
    • Hierarchical clustering produces nested clusters, often represented by a hierarchical tree or dendrogram.

    k-means Clustering (Partitional Clustering)

    • Randomly choose k objects from the training set as prototypes.
    • Assign all other objects to the nearest prototype to form clusters based on Euclidean distance (or other norm).
    • Update the new prototype of each cluster as the centroid of all objects assigned to that cluster.
    • Repeat until convergence (i.e., no data point changes clusters, or centroids remain the same).
    • k-means clustering is a heuristic algorithm with no guarantee of convergence to the global optimum.
    • The result is sensitive to the initial choice of objects as cluster centers, especially for small data sets.
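
The steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names, the `seed` and `max_iter` parameters, and the choice to keep the old prototype when a cluster becomes empty are all assumptions made here.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: randomly choose k objects as the initial prototypes.
    prototypes = [tuple(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest prototype.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, prototypes[j]))
            for p in points
        ]
        # Step 4: stop once no data point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: move each prototype to the centroid of its cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old prototype
                prototypes[j] = centroid(members)
    return prototypes, assignment


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
prototypes, assignment = k_means(data, k=2)
print(sorted(prototypes))  # [(0.0, 0.5), (10.0, 10.5)]
```

On less well-separated data, running the sketch with different `seed` values can give different clusterings, which illustrates the sensitivity to initialization noted above.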

    k-means Clustering Algorithm

    • The algorithm can be viewed as a greedy algorithm for partitioning n samples into k clusters so as to minimize an objective function (e.g., the sum of squared errors, SSE).
    • The error of a data point is its distance to the closest centroid; SSE is the sum of these errors squared, taken over all data points.
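
Written out, one common form of this objective is shown below. The symbols are assumptions introduced here for illustration: C_j denotes the j-th cluster, μ_j its centroid, and x a data point.

```latex
\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert_2^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x
```

Each k-means iteration (reassignment followed by centroid update) cannot increase this quantity, but the procedure may still settle in a local rather than the global minimum.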

    Pre- and Post-processing

    • Pre-processing steps can improve the final result, including standardizing (or normalizing) the data and eliminating or reducing the effect of outliers.
    • Post-processing can include splitting “loose” clusters and merging “close” clusters.
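
As one possible illustration of the standardization step, the sketch below rescales each feature to zero mean and unit standard deviation (z-scores), so that no feature dominates the distance computation because of its units. The function name `standardize` and the handling of constant features are assumptions; outlier treatment is not shown.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each column of `rows` to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]  # population standard deviation
    return [
        # Assumption: a constant column (std of 0) is mapped to 0.0.
        tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds))
        for row in rows
    ]


rows = [(1.0, 100.0), (2.0, 200.0), (3.0, 300.0)]
for z in standardize(rows):
    print(z)
```

After standardization the two features in this example take the same values even though their raw scales differ by a factor of 100.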

    Agglomerative Hierarchical Clustering

    • Each instance starts off as its own cluster; at each step the two “nearest” clusters are joined to form a new cluster.
    • The algorithm is a bottom-up technique, where larger clusters are obtained at each step.
    • The key operation is the computation of the proximity between clusters when choosing which two to merge; this proximity can be defined in various ways.
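
The bottom-up procedure can be sketched as follows. This is a simple illustrative version, not an efficient implementation. It assumes single linkage (the distance between the closest pair of members) as the proximity definition, which is only one of several possible choices; the function names and the `target_clusters` and `proximity` parameters are hypothetical.

```python
def euclidean(p, q):
    """Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def single_linkage(c1, c2):
    """Proximity of two clusters: distance between their closest members."""
    return min(euclidean(p, q) for p in c1 for q in c2)


def agglomerative(points, target_clusters=1, proximity=single_linkage):
    # Every instance starts off as its own cluster.
    clusters = [[p] for p in points]
    merges = []  # record of (cluster_a, cluster_b, distance) for each merge
    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest proximity value.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = proximity(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Replace the two clusters with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = agglomerative(points, target_clusters=2)
print(sorted(sorted(c) for c in clusters))
# [[(0, 0), (0, 1), (5, 5), (5, 6)], [(20, 20)]]
```

The recorded `merges` list holds the nested structure that a dendrogram would draw: each entry is one join, together with the distance at which it happened.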


    Quiz Team

    Description

    This quiz explores clustering with unsupervised learning, where data is divided into clusters that are meaningful and useful. It covers the concept of similarity and dissimilarity measures in clustering.
