16 Questions
What is the primary purpose of cluster analysis?
To identify sub-groups within the data set that exhibit similar behavior
What is a characteristic of cluster analysis?
It is an exploratory process that does not require hypotheses
Why is using correlations as a distance measure problematic?
Because correlations measure similarity in variation, not similarity in scores
What is an advantage of using cluster analysis?
It can handle non-normally distributed data
What is a limitation of cluster analysis?
It may not be strongly replicable, meaning different datasets may not produce the same clusters
What type of cluster analysis is characterized by the formation of a hierarchy of clusters?
Hierarchical cluster analysis
What is an example of a scenario where cluster analysis would be useful?
Identifying sub-groups of consumers with similar purchasing behaviors
What is a key consideration when choosing a distance measure for cluster analysis?
The type of data being analyzed
What is the difference between Euclidean distance and city-block distance?
Euclidean distance is the straight-line distance (the hypotenuse of a right triangle), while city-block distance is the sum of the two non-hypotenuse sides.
What is the purpose of selecting a distance metric in cluster analysis?
To define how the distance (dissimilarity) between data points is measured.
What is the difference between hierarchical and k-means clustering?
Hierarchical clustering treats new clusters as single points, while k-means clustering treats new clusters as a collection of points.
What is the purpose of a proximity matrix in cluster analysis?
To hold the pairwise distances between data points, which is the starting point for clustering.
What is the difference between single linkage and complete linkage?
Single linkage (nearest neighbor) measures the distance between two clusters by their closest pair of members, while complete linkage (furthest neighbor) measures it by their most distant pair of members.
What is the purpose of determining the number of clusters in cluster analysis?
To identify the underlying structure of the data.
What is the difference between two-step clustering and hierarchical clustering?
Two-step clustering combines hierarchical and k-means clustering methods, whereas hierarchical clustering uses the hierarchical method alone.
What is the purpose of a dendrogram in hierarchical clustering?
To visualize the clustering results.
Study Notes
Cluster Analysis
- Aims to identify sub-groups (clusters) within a dataset, where participants behave similarly
- Clusters should have higher within-group similarity than between-group similarity
Purpose of Cluster Analysis
- An exploratory process, unlikely to have hypotheses in advance about group behavior
- May not be strongly replicable, and different datasets may yield different clusters
- A simple method for identifying latent classes of participants
Why Use Cluster Analysis?
- Data may not be normally distributed
- Substantial individual differences may not be captured by means
- Example: locations of people in Australia, where the mean may not represent the data accurately
Distance Measures/Metrics
- Correlations measure similar variation, not similar scores
- Instead, seek similarity in actual values using distance metrics
- Types of distance metrics:
  - Euclidean distance (hypotenuse)
  - City-block distance (taxi-cab geometry, sum of non-hypotenuse sides)
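The two distance metrics, and the contrast with correlation, can be illustrated with a small sketch. The function names and the example scores are made up for illustration and are not part of the notes.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points (the hypotenuse)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """Sum of the absolute differences along each axis (taxi-cab distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    """Pearson correlation between two equally long score profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# A 3-4-5 right triangle: hypotenuse versus the sum of the other two sides.
p, q = (0, 0), (3, 4)
print(euclidean(p, q))   # 5.0
print(city_block(p, q))  # 7

# Hypothetical score profiles: identical pattern of variation, very different values.
low_scorer = [1, 2, 3]
high_scorer = [11, 12, 13]
print(pearson(low_scorer, high_scorer))     # 1.0, perfectly correlated
print(city_block(low_scorer, high_scorer))  # 30, yet far apart in actual scores
```

The last two lines show why a correlation is a poor distance measure here: the two profiles vary in the same way, so they correlate perfectly, even though their scores are nowhere near each other.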
Cluster Analysis Process
- Select a distance metric (e.g., block, squared Euclidean, Euclidean)
- Commence clustering using a proximity matrix
- Choose a clustering method:
  - Hierarchical
  - K-means
  - Two-step
- Combine clusters using methods such as:
  - Nearest neighbor (single linkage or shortest distance)
  - Furthest neighbor (complete linkage or furthest distance)
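As a rough sketch of the steps above, the following builds a proximity matrix with Euclidean distance and then measures the distance between two clusters under each linkage rule. The points, the cluster memberships, and the function names are assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def proximity_matrix(points):
    """Table of pairwise distances; entry [i][j] is the distance from point i to point j."""
    return [[euclidean(p, q) for q in points] for p in points]

def single_linkage(matrix, cluster_a, cluster_b):
    """Nearest neighbor: distance between the closest pair of members."""
    return min(matrix[i][j] for i in cluster_a for j in cluster_b)

def complete_linkage(matrix, cluster_a, cluster_b):
    """Furthest neighbor: distance between the most distant pair of members."""
    return max(matrix[i][j] for i in cluster_a for j in cluster_b)

# Four hypothetical cases; clusters are lists of row indices into `points`.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 3.0)]
matrix = proximity_matrix(points)
left, right = [0, 1], [2, 3]

print(single_linkage(matrix, left, right))    # 4.0
print(complete_linkage(matrix, left, right))  # 5.0
```

Both rules read from the same proximity matrix; they differ only in which pair of members is taken to represent the gap between two clusters.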
Hierarchical Clustering
- Dendrogram: a graphical representation of the clustering process
- Treat new clusters as single points rather than collections of points
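A dendrogram is drawn from the sequence of merges and the distance at which each merge happened. The sketch below records that sequence for a tiny one-dimensional dataset. It assumes single linkage and Euclidean distance, and the data and the `agglomerate` function are hypothetical. Once two clusters are merged, the result takes part in later steps as one unit.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, linkage=min):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram displays. `linkage=min` gives
    single linkage; passing `max` would give complete linkage.
    """
    clusters = [[i] for i in range(len(points))]  # every case starts alone
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(
                    euclidean(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Five hypothetical cases measured on one variable.
points = [(1.0,), (2.0,), (9.0,), (10.0,), (25.0,)]
history = agglomerate(points)
for members_a, members_b, distance in history:
    print(members_a, "+", members_b, "at distance", distance)
```

For this data the merge distances are 1.0, 1.0, 7.0, and 15.0. The jump from 1.0 to 7.0 is the kind of gap one looks for in a dendrogram when deciding where to cut it.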
K-means Clustering
- Output: clusters and centroids
- Unlike hierarchical clustering, treats each new cluster as a collection of points rather than as a single point
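A minimal sketch of the basic k-means loop shows where the clusters and centroids in the output come from. It assumes Euclidean distance, a fixed number of iterations, and hand-picked starting centroids. The data and the `kmeans` function are illustrative only and do not reproduce any particular software's procedure.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
    return clusters, centroids

# Six hypothetical cases forming two obvious groups; k = 2 is chosen up front.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
clusters, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
print(centroids)
```

Here the first three points end up in one cluster and the last three in the other, with each centroid sitting at the mean of its members. The number of clusters is fixed by the number of starting centroids supplied.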
Two-Step Clustering
- Combines hierarchical and k-means clustering methods
Determining the Number of Clusters
- Hierarchical clustering: use the dendrogram to determine the number of clusters
- K-means clustering: the number of clusters (k) is set before the analysis is run, so compare the output for different values of k to judge which number fits best
These notes cover cluster analysis: its purpose in identifying sub-groups within a dataset, distance measures, and the main types of cluster analysis.