Machine Learning: Supervised vs Unsupervised Learning

IntegralRaleigh

24 Questions

What is the primary goal of clustering?

To discover a new set of categories based on distance and similarity measures

What is the main difference between supervised and unsupervised learning?

The presence or absence of a target attribute

What does multidimensional scaling aim to do?

To identify a Euclidean space of small dimension and a nonlinear mapping from the original space to it

What is association rule discovery?

A data mining technique that finds collections of attributes that frequently appear together

What is mixture decomposition?

A venerable area of statistics devoted to identifying the parametric densities of individual populations

What is principal component analysis?

A method that seeks to find uncorrelated features obtained as linear combinations of the original features

What is the main goal of unsupervised learning?

To explore the data to find some intrinsic structures in them

What is clustering?

A technique for finding similarity groups in data, called clusters

What is the goal of a good clustering method?

To produce high quality clusters with high intra-class similarity and low inter-class similarity

What is an example of using clustering in marketing?

Segmenting customers according to their similarities

What is the main difference between hierarchical and partitioning methods in clustering?

Hierarchical methods create a hierarchical decomposition of the data, while partitioning methods construct various partitions and evaluate them by some criterion

What is the difference between agglomerative and divisive methods in clustering?

Agglomerative methods start with each sample in its own cluster, while divisive methods start with all samples in one cluster

What is the main advantage of using clustering in data analysis?

It helps to discover hidden patterns in the data

What is an example of using clustering in real life?

Grouping people of similar sizes together to make different sized T-Shirts

What is the main characteristic of a good clustering method?

It produces high quality clusters with high intra-class similarity and low inter-class similarity

What is the main application of clustering in text analysis?

To organize text documents according to their content similarities

What is the main difference between monothetic and polythetic methods?

Polythetic methods use collections of features, while monothetic methods use one feature at a time

In hard clustering, a sample can belong to?

One and only one cluster

What is the purpose of distance measures in clustering?

To determine the similarity or dissimilarity between objects

In probabilistic clustering, a point belongs to a cluster with?

A certain probability

What is the goal of intra-cluster distance in clustering?

To minimize the distance within clusters

What is the main factor that determines the quality of a clustering result?

The application and algorithm used

What is the purpose of similarity measures in clustering?

To determine the similarity between objects

What is the type of clustering where samples have different degrees of membership to different clusters?

Fuzzy clustering

Study Notes

Unsupervised Learning vs. Supervised Learning

  • Unsupervised learning: no target attribute, explore data to find intrinsic structures
  • Supervised learning: discover patterns in data that relate to a target attribute, then use them to predict the target attribute's value for future data instances
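
To make the target-attribute distinction concrete, here is a minimal Python sketch, assuming a tiny made-up dataset of (height_cm, weight_kg) measurements labeled with T-shirt sizes; the data and the helper name `nearest_neighbor_predict` are hypothetical. The supervised part uses a 1-nearest-neighbour rule (one simple technique among many) to predict the target attribute of a new instance. The unsupervised view of the same data simply drops the target, leaving only the features to be explored.

```python
import math

# Hypothetical labeled data: (features, target) pairs, made up for illustration.
labeled = [
    ((160, 55), "S"),
    ((175, 72), "M"),
    ((190, 95), "L"),
]

def nearest_neighbor_predict(training, x):
    """Supervised: predict the target of x as the target of the closest
    training instance (1-nearest-neighbour, Euclidean distance)."""
    _, target = min(training, key=lambda pair: math.dist(pair[0], x))
    return target

# Unsupervised: the same instances with the target attribute removed;
# any grouping (e.g. size groups) must be discovered from the features alone.
unlabeled = [features for features, _ in labeled]

print(nearest_neighbor_predict(labeled, (172, 70)))  # → M
print(unlabeled)
```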

Unsupervised Learning Types

  • Clustering: identify "groups" in data
  • Mixture decomposition: identify parametric densities of individual populations
  • Principal component analysis: find uncorrelated "features" obtained as linear combinations of original features
  • Association rule discovery: find collections of attributes that frequently appear together
  • Multidimensional scaling: identify a Euclidean space of small dimensions, and a nonlinear mapping from original space to new space
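
Association rule discovery is the easiest of these types to illustrate in a few lines. The sketch below assumes made-up market-basket data and uses hypothetical helper names `frequent_pairs` and `confidence`: it counts how often each pair of items appears together (support, as a fraction of transactions) and how often the presence of one item goes with another (confidence). It only handles item pairs; practical algorithms such as Apriori extend the same support counting to larger itemsets.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {(a, b): support} for item pairs that appear together in at
    least `min_support` (a fraction) of the transactions."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        # Sort so each unordered pair is always counted under the same key.
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    with_a = [t for t in transactions if antecedent in t]
    if not with_a:
        return 0.0
    return sum(consequent in t for t in with_a) / len(with_a)

# Hypothetical market-basket data, made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

print(frequent_pairs(baskets, min_support=0.5))  # → {('bread', 'milk'): 0.5, ('bread', 'butter'): 0.5}
print(confidence(baskets, "butter", "bread"))    # → 1.0
```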

Clustering

  • Technique for finding similarity groups in data
  • Goal: discover a new set of categories based on distance measures and similarity measures
  • Good clustering method: produce high-quality clusters with high intra-class similarity and low inter-class similarity
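
The "high intra-class similarity, low inter-class similarity" criterion can be checked numerically. A minimal sketch, assuming Euclidean distance and clusters given as lists of 2-D points (the helper names and the points are hypothetical): a good grouping should show a small average within-cluster distance and a large average between-cluster distance.

```python
import math
from itertools import combinations

def mean_intra_cluster_distance(clusters):
    """Average distance between pairs of points in the same cluster (lower is better).
    Assumes at least one cluster has two or more points."""
    dists = [math.dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(dists) / len(dists)

def mean_inter_cluster_distance(clusters):
    """Average distance between points in different clusters (higher is better).
    Assumes at least two non-empty clusters."""
    dists = [math.dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a
             for q in b]
    return sum(dists) / len(dists)

# Two candidate groupings of the same four hypothetical points.
good = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
bad = [[(0, 0), (5, 5)], [(0, 1), (5, 6)]]

for name, clusters in (("good", good), ("bad", bad)):
    print(name,
          round(mean_intra_cluster_distance(clusters), 2),
          round(mean_inter_cluster_distance(clusters), 2))
```

The "good" grouping has the lower intra-cluster and the higher inter-cluster average, which is what the criterion asks for.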

Clustering Methods

  • Hierarchical vs. Partitional Methods
    • Hierarchical: create a hierarchical decomposition of the set of data
    • Partitional: construct various partitions and evaluate them by some criterion
  • Agglomerative vs. Divisive Methods
    • Agglomerative: start by assigning each sample to its own cluster, then repeatedly merge clusters
    • Divisive: start by assigning all samples to a single cluster, then repeatedly split clusters
  • Monothetic vs. Polythetic Methods
    • Monothetic: learn clusters using one feature at a time
    • Polythetic: use collections of features
  • Hard vs. Fuzzy Methods
    • Hard: each sample belongs to one and only one cluster
    • Fuzzy: samples have different degrees of membership to different clusters
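
The agglomerative strategy can be sketched in a few lines of Python. This is a minimal, unoptimised illustration assuming Euclidean distance and single linkage (the distance between two clusters is taken as the distance between their closest members); other linkage rules exist, and the function name and points are hypothetical. It is also a hard, polythetic method: every sample ends up in exactly one cluster, and all features enter the distance at once.

```python
import math

def agglomerative_single_linkage(points, k):
    """Merge clusters bottom-up until only k remain.
    Cluster distance = smallest pairwise point distance (single linkage)."""
    clusters = [[p] for p in points]  # start: each sample in its own cluster
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # j > i, so removing cluster j leaves index i valid.
        clusters[i].extend(clusters.pop(j))
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerative_single_linkage(points, k=3))
# → [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
```

Recording the sequence of merges instead of stopping at k would give the full hierarchy that hierarchical methods produce; a divisive method runs the process in the opposite direction, splitting one all-inclusive cluster.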

Distance Measures

  • Required for clustering to quantify how similar or dissimilar objects are
  • Two main types: distance measures (smaller values mean more alike) and similarity measures (larger values mean more alike)
  • Clustering quality depends on the algorithm, the distance function, and the application
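
To illustrate the two kinds of measure, here is a minimal sketch of two common distance measures (Euclidean and Manhattan) and one common similarity measure (cosine similarity), assuming numeric feature vectors of equal length. These particular measures are examples chosen for illustration; the notes do not prescribe any specific one.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance: 0 for identical vectors, larger means less alike."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan_distance(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y: 1 for the same direction,
    0 for orthogonal vectors; larger means more alike.
    Undefined for an all-zero vector (division by zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.hypot(*x) * math.hypot(*y))

print(euclidean_distance((0, 0), (3, 4)))  # → 5.0
print(manhattan_distance((0, 0), (3, 4)))  # → 7
print(cosine_similarity((1, 0), (0, 1)))   # → 0.0
```

Because distances and similarities run in opposite directions, a clustering algorithm has to know which kind it is given, and the same pair of objects can look close under one measure and far under another.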

Learn about the key differences between supervised and unsupervised learning in machine learning, including their definitions, uses, and applications.