Machine Learning: Clustering and PCA Concepts
48 Questions

Questions and Answers

What is the primary goal of supervised learning?

  • To discover similarities in data without any target attribute.
  • To find and predict patterns related to a target attribute. (correct)
  • To organize data based on a priori grouping.
  • To segment data instances into clusters.
    Clustering is considered a supervised learning task.

    False

    Name one application of clustering in marketing.

    Segment customers according to their similarities.

    In clustering, data instances that are similar are grouped into one ______.

    cluster

    Match the following applications with their clustering examples:

    T-Shirt sizing = Grouping people according to size
    Marketing = Segmenting customers based on similarities
    Document organization = Producing a topic hierarchy
    Medicine = Grouping patients with similar conditions

    What is one key characteristic of unsupervised learning?

    It explores data to find intrinsic structures.

    Clustering only applies to fields related to technology.

    False

    Why is clustering important in the context of online documents?

    To organize them according to content similarities.

    What is the primary function of the first principal component in PCA?

    It defines the line in p-dimensional space that is closest to the n observations.

    The proportions of variance explained (PVE) by each principal component sum to more than one.

    False

    What is the recommended action if the variables in PCA are in different units?

    Scale each variable to have a standard deviation equal to one.

    The first principal component explains _____% of the variance in the data according to the information provided.

    62.0

    How much variance is explained by the next principal component after the first?

    24.7%

    Match the following terms related to PCA with their correct descriptions:

    First Principal Component = Defines the closest line to the n observations
    Variance Explained = Proportion of total variance attributed to a component
    Scaling = Ensures variables are on the same scale
    Cumulative PVE = Total variance explained by multiple components

    The last two principal components explain a high percentage of the variance in the data.

    False

    What is the main purpose of principal component analysis (PCA)?

    To reduce the dimensionality of data while retaining as much variance as possible.

    What is a significant disadvantage of K-means clustering?

    It requires pre-specifying the number of clusters K.

    Agglomerative clustering starts with all data points in one cluster.

    False

    What is a dendrogram?

    A tree-like structure that represents a nested sequence of clusters.

    In hierarchical clustering, the __________ method builds the dendrogram from the bottom level.

    Agglomerative

    Match the following clustering types with their descriptions:

    Agglomerative Clustering = Merges the most similar clusters from the bottom up
    Divisive Clustering = Starts with one cluster and splits into smaller clusters
    Dendrogram = A tree structure showing cluster relationships
    Singleton Clusters = Clusters consisting of a single data point

    In a dendrogram, cutting it at a certain height results in how many clusters?

    A number of clusters that depends on the cut height.

    Divisive clustering results in all data points merged into one cluster.

    False

    How does agglomerative clustering determine which clusters to merge?

    It merges the clusters with the least distance between them.

    Which of the following best describes the K-means clustering algorithm?

    It iteratively assigns each data point to the nearest of k user-specified cluster centroids.

    In K-means clustering, the centroids are recalculated after every assignment of data points.

    True

    What is the primary goal of clustering in a data set?

    To group similar data points together based on certain characteristics.

    The K-means algorithm aims to minimize the sum of squared error (SSE) across all clusters, defined as SSE = $\sum_{j=1}^{k} \sum_{x \in C_j} dist(x, m_j)^2$ where $m_j$ is the ________ of the jth cluster.

    centroid

    Match the aspects of clustering with their correct descriptions:

    Partitional clustering = Divides data into non-overlapping subsets
    Hierarchical clustering = Creates a tree of clusters
    Distance function = Measures similarity or dissimilarity
    Clustering quality = Determines the effectiveness of the clusters formed

    Which of the following is NOT a stopping criterion for K-means clustering?

    Maximum number of clusters specified by the user

    Clustering algorithms can be used in various fields such as medicine, sociology, and biology.

    True

    What characteristic should be maximized and minimized for clustering quality?

    Maximize inter-cluster distance and minimize intra-cluster distance.

    What is the time complexity of the k-means algorithm?

    O(tkn)

    K-means is considered a non-linear algorithm.

    False

    What is one method to deal with outliers in k-means clustering?

    Remove outliers or perform random sampling

    K-means clustering is sensitive to __________.

    outliers

    Which of the following is a strength of k-means clustering?

    It is one of the most popular clustering algorithms.

    The k-means algorithm always finds the global optimum regardless of the initialization.

    False

    What technique can be used to solve the optimization problem in PCA?

    Singular-value decomposition

    Match the following terms related to k-means with their descriptions:

    Centroid = A point that represents the average position of all points in a cluster
    SSE = Sum of squared errors used to measure clustering goodness
    Outlier = A data point that is significantly different from the rest
    Random Sampling = Selecting a small subset of data points to reduce the chance of outliers

    What user input is necessary for k-means clustering to function?

    The number of clusters (k)

    The first principal component is uncorrelated with the second principal component.

    True

    What are the principal component scores for the first principal component generally referred to as?

    $z_{11}, z_{21}, \ldots, z_{n1}$

    The principal component directions are the ordered sequence of ______ of the matrix $X^TX$.

    eigenvectors

    Match the following terms with their definitions:

    Principal Component = A linear combination of variables that retains maximum variance
    Singular-value Decomposition (SVD) = A technique for solving PCA optimization problems
    Loading Vector = Defines the direction in feature space
    Eigenvalue = Represents the variance associated with a principal component

    In PCA, how many principal components can be at most generated?

    min(n - 1, p)

    The second principal component can be correlated with the first principal component.

    False

    List the three crimes featured in the USArrests data set.

    Assault, Murder, Rape

    Study Notes

    Introduction to Machine Learning AI 305

    • Unsupervised learning is a subset of machine learning.
    • Clustering is a technique in unsupervised learning.
    • Clustering finds similarity groups in data, called clusters.

    Supervised Learning vs. Unsupervised Learning

    • Supervised learning discovers patterns relating data attributes to a target (class) attribute. These patterns help predict values of the target attribute in future data.
    • Unsupervised learning lacks a target attribute. The goal is to explore the data and find inherent structures within it.

    Clustering

    • Clustering is a method for finding similarity groups in data, creating clusters of similar data points.
    • Data points similar to each other are grouped into the same cluster.
    • Data points that are dissimilar (far apart) are placed in different clusters.
    • Clustering is often considered a form of unsupervised learning.

    An Illustration

    • A dataset can have naturally occurring groups of data points—natural clusters.

    What is Clustering For?

    • Example 1: Grouping people by size to make different sizes of clothing (small, medium, large).
    • Example 2: Marketing segments customers based on similarities, targeting specific subgroups with advertising or product recommendations.
    • Example 3: Organizing documents by topic similarities (e.g., news articles) to create a topic hierarchy. This is important for organizing large volumes of text data.
    • Clustering is used in many fields like medicine, psychology, and marketing.

    Aspects of Clustering

    • Clustering Algorithm: includes partitional methods (like K-means) and hierarchical methods.
    • Distance (Similarity or Dissimilarity) function: Determines how similar or dissimilar data points are.
    • Clustering Quality: Measures how good the clusters are, typically looking for minimized intra-cluster distance and maximized inter-cluster distances.

    K-means Clustering

    • A partitional clustering algorithm.
    • Divides data points into k clusters.
    • Each cluster has a center called a centroid.
    • K is specified by the user.

    K-means Algorithm Steps

    • Step 1: Randomly choose k data points to be the initial centroids (cluster centers).
    • Step 2: Assign each data point to the nearest centroid. The dataset is divided (partitioned) into k clusters.
    • Step 3: Recalculate the centroids using the current assignments of data points.
    • Step 4: Repeat steps 2 and 3 until a convergence criterion is met (no more significant changes in assignments).
    • Different seed selections lead to differing cluster formation.

    K-means Algorithm (cont.): Steps in Detail

    • Step 1: Choose k data points (seeds) as the initial centroids.
    • Step 2 (iterate): For each data point, compute its distance to each centroid and assign it to the nearest centroid.
    • Step 3: Re-compute each centroid from the points currently assigned to it; repeat steps 2 and 3 until a stopping criterion is met.
    • Stopping criteria: no (or minimal) re-assignment of points to different clusters, minimal change in centroids, or minimal decrease in SSE (sum of squared errors).
    • SSE = $\sum_{j=1}^{k} \sum_{x \in C_j} dist(x, m_j)^2$, where $C_j$ is the jth cluster and $m_j$ is its centroid.
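    The detailed steps above can be sketched in a few lines of pure Python (a minimal illustration using 2-D points and Euclidean distance; the function names and toy data are ours, not from the lesson):

```python
import math
import random

def sse(clusters, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(
        (x - cx) ** 2 + (y - cy) ** 2
        for points, (cx, cy) in zip(clusters, centroids)
        for (x, y) in points
    )

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # Step 1: pick k seeds
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                        # Step 2: assign to nearest centroid
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        new_centroids = [                       # Step 3: recompute centroids
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:          # Stopping: no centroid change
            break
        centroids = new_centroids
    return clusters, centroids
```

    On six points forming two well-separated groups, `kmeans(points, 2)` should recover the two groups of three.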

    Stopping/Convergence Criterion

    • No further changes in data point assignments, centroid locations.
    • Minimum decrease in SSE (sum of squared errors) between iterations.

    Strengths of K-means

    • Simple to understand and implement.
    • Efficient, with time complexity O(tkn) which is linear when t and k are relatively small.

    Weaknesses of K-means

    • Requires the user to specify k, the number of clusters, in advance.
    • Sensitive to outliers, i.e., data points that are very far from all the others.
    • Sensitive to initial seed selection: different initial centroids can yield different clusterings.
    • Not suitable for discovering clusters that are not hyper-ellipsoids (or hyper-spheres).

    Selecting 'k'

    • Plots for k = 2, 3, and 4 show how the k-means clusters change with the choice of k. A simulated data set with 150 observations in two-dimensional space is used for illustration.

    Weaknesses of K-means: Outliers

    • Remove data points that are far from the centroids.
    • Monitor outliers over iterations for removal.
    • Use Random Sampling to reduce the likelihood of selecting outliers.
    • Assign remaining data points via distance or classification.

    Weaknesses of K-means: Initial Seeds

    • Random seed selection produces different clustering results.
    • Various methods can be used to help select better seeds.

    Clustering Algorithm Summary

    • K-means, despite weaknesses, is still popular.
    • Other algorithms may be better for specific tasks, needs, and data sets.
    • Comparing clustering algorithms is complex and subjective.

    Common Ways to Represent Clusters

    • Use centroids to represent clusters.
    • Summarize the spread of each cluster with its radius and standard deviation.
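    As a minimal sketch (pure Python; the function names and the distance-based spread measure are ours), the centroid, radius (largest distance from the centroid), and a standard deviation of distances from the centroid can be computed as:

```python
import math

def centroid(points):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def radius(points):
    """Radius: maximum distance from the centroid to any point in the cluster."""
    c = centroid(points)
    return max(math.dist(c, p) for p in points)

def spread(points):
    """Standard deviation of the distances from the centroid."""
    c = centroid(points)
    d = [math.dist(c, p) for p in points]
    m = sum(d) / len(d)
    return (sum((x - m) ** 2 for x in d) / len(d)) ** 0.5
```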

    Using Classification Model

    • All data points are assumed to share the same class label (cluster ID).

    Using Frequent Values to Represent Clusters

    • Use for categorical data.
    • Often used in text clustering.

    Clusters Of Arbitrary Shapes

    • Hyper-ellipsoidal or hyper-spherical clusters are easier to represent using centroids and spread measures.
    • Irregular shaped clusters are more complex to represent.

    Hierarchical Clustering

    • Avoids pre-defining 'k'
    • Creates a nested sequence of clusters represented as a tree (dendrogram).
    • Different methodologies exist (divisive and agglomerative).

    Agglomerative Clustering

    • Each data point starts as its own cluster (a leaf node).
    • The two clusters with the smallest distance between them are repeatedly merged until a single cluster remains.

    Agglomerative Clustering Algorithm Steps

    1. Start with each data point in the dataset D = {X₁, X₂, ..., Xₙ} as its own cluster.
    2. Compute the distances between all pairs of clusters.
    3. Merge the two closest clusters into a new cluster.
    4. Repeat steps 2 and 3 until only a single cluster remains.
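    The loop above can be sketched in pure Python using single-link distance (a hedged illustration; the names are ours, and stopping at a target number of clusters plays the role of cutting the dendrogram at some height):

```python
import math

def single_link(c1, c2):
    """Single-link: shortest distance between any pair of points, one from each cluster."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerate(points, target_k=1, link=single_link):
    """Repeatedly merge the two closest clusters until target_k clusters remain."""
    clusters = [[p] for p in points]           # Step 1: every point is its own cluster
    while len(clusters) > target_k:
        # Steps 2-3: find the closest pair of clusters and merge them
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```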

    Measuring Distance Between Clusters

    • There are different ways to measure distances between different clusters.
    • Different variations include complete link, single link, average link, and centroids.

    Types of Hierarchical Clustering Methods

    • Agglomerative (bottom-up): Starts with individual data points as clusters and merges them iteratively based on similarity
    • Divisive (top-down): Starts with all data points in one cluster and recursively splits them into smaller clusters based on dissimilarity.
    Single-Link Method

    • Distance between two clusters is the shortest distance between any pair of data points, one from each cluster.
    • Favors long "chains" of clusters and is very sensitive to noise and outliers.
    • May form undesirable "chain effects".

    Complete-Link Method

    • Distance between two clusters is the maximum distance between any pair of data points, one from each cluster.
    • Less sensitive to outliers than single link, but can potentially miss clusters.

    Average-Link Method

    • A compromise between single link and complete link.
    • Uses the average distance over all pairs of data points, one from each cluster.

    Centroid Method

    • Distance is computed based on the distance between the centroids.

    Distance Function

    • A notion of "similarity" (or "dissimilarity") between data points is central to clustering.
    • Distance functions exist for many data types (nominal, numeric) and differing applications.
    • Common metrics include Euclidean distance (and squared Euclidean distance), Manhattan distance, and Chebyshev distance.
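    These three metrics take only a few lines each (pure Python; the function names are ours):

```python
def euclidean(a, b):
    """Straight-line distance: square root of the sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Chessboard distance: largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))
```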

    How to Choose a Clustering Algorithm

    • No single best algorithm
    • Various algorithms exist each with strengths, weaknesses, suitable use cases.
    • Trial and error via testing is typically used.
    • Consider the type of data, desired characteristics of clusters.

    Principal Components Analysis (PCA)

    • Data visualization or preprocessing for supervised learning.
    • Produces low-dimensional representation (reduced dimensionality).
    • Finds a sequence of linear combinations of the variables that have maximal variance and are mutually uncorrelated.

    Principal Components Analysis (PCA): Details

    • First principal component : normalized linear combination of features with largest variance
    • Z₁= Φ₁₁X₁ + Φ₂₁X₂ + ...+ Φp₁Xp
    • Loadings of first principal component are Φ₁₁, Φ₂₁, ...,Φp₁.

    PCA: Example

    • Illustrates population size and ad spending for 100 different cities.
    • First and second Principal Components are depicted visually (in 2d).

    Computation of Principal Components

    • Assumes variables have mean zero
    • Calculate the linear combination with the largest variance.
    • SVD (singular value decomposition) or eigen decomposition can do the calculation.
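    A minimal NumPy sketch of this computation via SVD, assuming we center each variable first (the toy data and variable names are illustrative, not from the lesson):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 observations on 3 correlated variables (illustrative only)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.3, 0.1],
                                          [0.0, 1.0, 0.2],
                                          [0.0, 0.0, 0.5]])

Xc = X - X.mean(axis=0)            # center: PCA assumes mean-zero variables
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt.T                    # columns are the principal component directions
scores = Xc @ loadings             # principal component scores z_ij
variances = s**2 / (len(X) - 1)    # component variances: squared singular values / (n - 1)
```

    The first column of `scores` has the largest sample variance, and the score columns are mutually uncorrelated.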

    Geometry of PCA

    • Loading vector (Φ₁) represents direction in feature space with maximum variance
    • Projecting data points onto this vector generates principal component scores.

    Further Components

    • Subsequent PCs (principal components) are orthogonal (uncorrelated with previous PCs) with maximum variance.

    Further Principal Components Continued

    • The principal component directions (Φᵢ) are the ordered sequence of right singular vectors.
    • Variances of the components are the squares of the singular values.
    • There are at most min(n - 1, p) principal components.

    Illustration - Example

    • The example uses the USArrests data set, which includes three crimes (Assault, Murder, Rape) and the percentage of the urban population for each state.

    Example Details

    • First two principal components of USArrests are examined.
    • Visualization of these components highlight relationships.
    • The first component is primarily related to overall crime rates among states.
    • The second component focuses primarily on the urbanization level of states.

    Example Details - Continued

    • Component scores and loading-vector projections can be used to examine relationships and correlations among the variables.

    Another Interpretation of Principal Components

    • Data points lie in p-dimensional space, and there is a hyperplane that is closest to these observations.
    • The principal component loading vectors span this closest hyperplane, where closeness to the n observations is measured by average squared Euclidean distance.

    Scaling the Variables

    • If variables are measured in different units, each should be scaled to have standard deviation one; otherwise the variables with the largest variance dominate the principal components.
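    A short NumPy sketch of this standardization (the example variables, height in cm and income in dollars, are hypothetical):

```python
import numpy as np

X = np.array([[170.0, 70000.0],
              [160.0, 48000.0],
              [180.0, 61000.0]])   # each row is an observation

# Center each column and divide by its sample standard deviation
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```

    After scaling, every column has mean zero and standard deviation one, so no single variable dominates the PCA.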

    Proportion Variance Explained (PVE)

    • To assess the strength of each component, compute how much of the variance in the data each principal component explains.
    • This is the proportion of variance explained (PVE).
    • The PVEs are positive and sum to one (100%).

    Proportion Variance Explained, continued

    • The PVE of a component is the ratio of that component's variance to the total variance.
    • Cumulative PVEs can be plotted and examined.
    • Often the first few principal components capture most of the variation in the data.
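    The PVE computation can be sketched with NumPy, assuming the component variances come from the squared singular values of the centered data matrix (toy data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4)) * np.array([3.0, 2.0, 1.0, 0.5])  # toy data, 4 variables

Xc = X - X.mean(axis=0)                   # center each variable
s = np.linalg.svd(Xc, compute_uv=False)   # singular values, in decreasing order

pve = s**2 / np.sum(s**2)                 # variance of each PC / total variance
cumulative_pve = np.cumsum(pve)
```

    The PVEs are non-increasing and sum to one; `cumulative_pve` shows how much variance the first few components capture together.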

    How Many Components Should We Use?

    • Determining a sufficient number of principal components is not always straightforward in unsupervised learning.
    • Cross-validation is not always possible.
    • A "scree plot" can help: look for an elbow in the plot of PVE against component number.


    Description

    Test your understanding of supervised and unsupervised learning, focusing particularly on clustering and Principal Component Analysis (PCA). This quiz covers key characteristics, applications, and concepts related to these important machine learning techniques.
