6-Introduction-to-Clustering.pdf
What is Clustering?
Cluster analysis (clustering, segmentation, quantization, …) is the core data mining task of finding clusters.
But what is a cluster? [Esti02]
▶ cannot be precisely defined
▶ many different principles and models have been defined
▶ even more algorithms, with very different results
▶ when is a result “valid”?
▶ results are subjective, “in the eye of the beholder”
▶ no specific definition seems “best” in the general case [Bonn64]
Common themes found in definition attempts:
▶ more homogeneous
▶ more similar
▶ cohesive

What is Clustering? /2
Cluster analysis (clustering, segmentation, quantization, …) is the core data mining task of dividing the data into clusters such that:
▶ similar (related) objects are in the same cluster
▶ dissimilar (unrelated) objects are in different clusters
▶ clusters are not defined beforehand (otherwise: use classification)
▶ clusters have (statistical, geometric, …) properties such as:
    ▶ connectivity
    ▶ separation
    ▶ least squared deviation
    ▶ density
Clustering algorithms have different
▶ cluster models (“what is a cluster for this algorithm?”)
▶ induction principles (“how does the algorithm find clusters?”)

Applications of Clustering /2
▶ Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus, and species
▶ Information retrieval: document clustering
▶ Land use: identification of areas of similar land use in an Earth observation database
▶ Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
▶ City planning: identifying groups of houses according to their house type, value, and geographical location
▶ Earthquake studies: observed epicenters should be clustered along continental faults
▶ Climate: understanding Earth's climate, finding patterns of atmospheric and oceanic phenomena
▶ Economic science: market research

Basic Steps for Clustering
Feature selection
▶ select information (about objects) relevant to the task of interest
▶ aim at minimal information redundancy
▶ weighting of information
Clustering algorithm and parameters
▶ distance and similarity measure suitable for the problem
▶ cluster quality criterion / cost function / objective
▶ algorithms to use with this distance and quality criterion
Validation and interpretation of the results
▶ validation test
▶ integration with applications
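
The basic steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method prescribed by the slides: the toy data, the function names (`standardize`, `sq_dist`, `assign`, `sse`, `kmeans`), and the fixed initial centers are all hypothetical. It assumes equal feature weighting via standardization, squared Euclidean distance as the distance measure, the sum of squared deviations as the quality criterion, and a k-means-style iteration (Lloyd's algorithm) as the clustering algorithm. The validation at the end is a crude comparison against a single-cluster baseline.

```python
import math

def standardize(rows):
    """Feature preparation: rescale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def sq_dist(a, b):
    """Distance measure (assumed here): squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Give each point the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def sse(points, centers, labels):
    """Quality criterion: sum of squared deviations from the assigned center."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))

def kmeans(points, init_centers, max_iter=100):
    """k-means-style iteration: alternate center updates and reassignment."""
    centers = [list(c) for c in init_centers]
    labels = assign(points, centers)
    for _ in range(max_iter):
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
        new_labels = assign(points, centers)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centers, labels

# Hypothetical toy data: two visually separate groups of 2-D objects.
data = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
        [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
points = standardize(data)

# Initial centers are fixed (one point from each group) to keep the run deterministic.
centers, labels = kmeans(points, init_centers=[points[0], points[3]])

# Crude validation: compare the criterion against a single-cluster baseline.
sse_k2 = sse(points, centers, labels)
sse_k1 = sse(points, [[0.0, 0.0]], [0] * len(points))
print(labels, sse_k2 < sse_k1)
```

A lower criterion value for two clusters than for one is expected whenever the extra center is used, so on its own this comparison does not show that a result is “valid”; it only illustrates where a validation test would plug in.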