Machine Learning: Unsupervised Learning


Questions and Answers

How does unsupervised learning differ from classical programming in terms of input and output?

  • Classical programming and unsupervised learning both take rules and data as input and provide answers as output.
  • Classical programming takes data as input and provides rules as output, while unsupervised learning takes rules and data as input and provides answers as output.
  • Classical programming takes rules and data as input and provides answers as output, while unsupervised learning takes only data as input and provides segmented data as output. (correct)
  • Classical programming takes answers as input and provides data as output, while unsupervised learning takes rules as input and provides data as output.

Which of the following is a key characteristic of unsupervised learning?

  • Minimizing the need for iterative processes in finding patterns.
  • The learning algorithm presents a structure for human review. (correct)
  • Providing labeled examples for the algorithm to learn from.
  • A single, definitive solution is always guaranteed.

How is the success of unsupervised learning typically evaluated?

  • By determining the R-squared value of the model.
  • By assessing if the data can be visualized informatively and if subgroups can be discovered. (correct)
  • By using predefined metrics to measure accuracy.
  • By calculating the precision and recall of the clustered data.

Which aspect of unsupervised learning necessitates 'human supervision'?

Humans choose the learning algorithm, distance metrics, and feature selection. (A)

What role can unsupervised learning play in exploratory data analysis (EDA)?

It can segment data, revealing patterns and relationships that are useful for EDA. (B)

If clustering is also known as a 'segmentation technique,' what does this imply about the process?

It involves dividing data into separate, distinct groups. (D)

What is the primary goal when identifying homogeneous subsets in clustering?

To achieve high similarity within each subset. (C)

In the context of clustering, how is 'similarity' typically determined for numeric variables?

By calculating the absolute distance between the values. (A)

Given a dataset, what does each feature (column) represent in the context of clustering?

A dimension with potential similarity between instances. (A)

How do distance-based algorithms react to outliers in a dataset?

They are highly sensitive to outliers. (A)

Why is feature scaling used to handle outliers in clustering?

To give all features an equal opportunity to contribute to the model. (C)

Which of the following are common methods of cluster analysis?

Hierarchical and K-Means. (B)

What is a key characteristic of hierarchical clustering?

It begins with one cluster and iteratively splits the clusters. (A)

Under what circumstances is hierarchical clustering most appropriate?

When it is meaningful to apply it to small datasets. (A)

What type of data is K-Means clustering generally used for?

Numerical data, such as a set of measurements. (C)

What specific type of input is required for K-Means clustering?

Numerical data, with a defined distance metric over the variable space. (A)

What are the key outputs of the K-Means clustering algorithm?

The center (centroid) of each cluster and the assignment of each data point to a cluster. (A)

Within K-means clustering, what must be true of the distance between two records?

It must be non-negative (an absolute value). (D)

How does distance relate with the data points (records) in K-means clustering?

The distance between record I and record J is symmetrical. (A)

In the context of unsupervised learning, what is meant when it is described as an 'exploratory technique'?

It is used to discover structure in the data. (B)

What is the role of unsupervised learning in relation to classification?

It is used to discover the classes. (A)

Which of the following is an appropriate use case for unsupervised learning?

Classifying (segmenting) customers into groups. (D)

Which of the following is an example of a variable used in an unsupervised learning use case?

Household income. (D)

In clustering, under what conditions would you describe a subset as being 'homogeneous'?

When the subset consists of elements that are similar. (C)

How does unsupervised learning enable visualization?

It structures the data for human review, enabling effective visualization. (D)

When using unsupervised learning to visualize data, what are two specific questions that could be asked?

Can subgroups be discovered? How can we visualize the data informatively? (B)

Besides hierarchical clustering, which of the following is also a common method of cluster analysis?

K-Means. (D)

In K-means, how does the distance between two records relate to their distances to a third record?

It cannot be greater than the sum of the distances from each record to the third record. (A)

What is a possible use case for unsupervised learning at 7-Eleven?

Determining the best location for products in the store. (D)

In the context of categorical variables, how is 'similarity' typically determined?

By identifying matching values. (B)

How does clustering determine similarity density?

By measuring the distance between instances across the feature dimensions. (C)

Which statement describes feature scaling as a method for handling outliers?

It decreases the influence of larger-scale features. (D)

In K-means, how does the distance from record I to record J compare with the distance from record J to record I?

It is the same as the distance from J to I. (D)

What principle holds true regarding negative distance values?

Distance is never negative. (C)

Why must the algorithm be selected by a human?

The algorithm performs the search, but human supervision is required to select it. (A)

In K-means, what is the distance from a record to itself?

It equals 0. (B)

What should be considered when choosing hierarchical clustering?

It is meaningful when applied to a small dataset. (D)

Flashcards

Unsupervised Learning

A type of machine learning where the algorithm learns from unlabeled data to identify patterns and structures without explicit guidance.

Unsupervised Learning Process

Providing data to an algorithm, which then presents a structure for human review, allowing iterative discovery of patterns and relationships.

Measuring Unsupervised Learning

There are no direct metrics; instead, it involves visualizing data informatively and discovering subgroups.

Human Supervision in Unsupervised Learning

It requires human involvement to select algorithms, distance metrics, features, and interpret results, even though the algorithm performs the search.

Exploratory Data Analysis (EDA)

The use of unsupervised learning to explore data and uncover hidden patterns or structures, serving as a preliminary step before classification.

Clustering

A segmentation technique dividing data into separate parts or clusters based on similarity.

Clustering Approach

The similarity between data points determines the cluster to which they belong.

Similarity Measurement

For numeric variables, similarity is based on the distance (delta) between values; for categorical variables, it is based on shared values.
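A minimal Python sketch of the two similarity rules on this card. The function names and sample values are hypothetical; scoring several categorical features by the fraction that match (the simple matching coefficient) is one common choice, assumed here, not the only one.

```python
def numeric_distance(a: float, b: float) -> float:
    """Numeric variables: similarity is judged by the delta between values."""
    return abs(a - b)


def categorical_match(a: str, b: str) -> bool:
    """Categorical variables: two values are similar only if they are the same."""
    return a == b


def simple_matching(x: list[str], y: list[str]) -> float:
    """Fraction of categorical features on which two records share a value."""
    matches = sum(1 for xi, yi in zip(x, y) if xi == yi)
    return matches / len(x)


if __name__ == "__main__":
    print(numeric_distance(52_000, 47_500))    # 4500
    print(categorical_match("urban", "rural"))  # False
    # 2 of the 3 categorical features match
    print(simple_matching(["urban", "owner", "2"], ["urban", "renter", "2"]))
```

A smaller numeric distance means more similar records, while a higher matching fraction means more similar categorical records, so the two scores point in opposite directions.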

Similarity in Datasets

Each feature represents a dimension; distance is measured to find density and identify clusters.

Outlier Sensitivity

Distance-based algorithms are sensitive to outliers in the dataset.

Feature Scaling

A method to handle outliers by transforming feature values to a common scale.

Cluster Analysis Types

Two main types are hierarchical and K-Means.

Hierarchical Clustering

An iterative process starting with one cluster, splitting until complete.

K-Means Clustering

Used for clustering numerical data with a distance metric such as Euclidean distance; it outputs the cluster centers (centroids) and the cluster assignment of each data point.

Distance Measurement Principles

Distance is non-negative, zero from a record to itself, symmetric between records, and obeys the triangle inequality.
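The four principles can be checked mechanically on a small set of records. This sketch uses hypothetical sample points and the standard library's `math.dist` (Euclidean distance, Python 3.8+). It shows that Euclidean distance satisfies all four, while squared Euclidean distance breaks the triangle inequality, so it is not a true distance metric even though K-means minimizes it.

```python
import math

Point = tuple[float, ...]


def squared_euclidean(p: Point, q: Point) -> float:
    """Sum of squared coordinate differences (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def satisfies_distance_principles(records, dist, tol=1e-9) -> bool:
    """Check non-negativity, zero self-distance, symmetry, and the triangle inequality."""
    for i in records:
        if abs(dist(i, i)) > tol:  # distance from a record to itself is zero
            return False
        for j in records:
            d_ij = dist(i, j)
            if d_ij < 0:  # distance is never negative
                return False
            if abs(d_ij - dist(j, i)) > tol:  # distance I->J equals J->I
                return False
            for k in records:  # d(I, J) <= d(I, K) + d(K, J)
                if d_ij > dist(i, k) + dist(k, j) + tol:
                    return False
    return True


if __name__ == "__main__":
    records = [(0, 0), (3, 4), (6, 0), (1, 1)]
    print(satisfies_distance_principles(records, math.dist))          # True
    print(satisfies_distance_principles(records, squared_euclidean))  # False
```

The squared version fails because d((0, 0), (3, 4)) = 25 is greater than d((0, 0), (1, 1)) + d((1, 1), (3, 4)) = 2 + 13.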

Unsupervised Learning Uses

Discovering data structure, summarizing cluster properties, and serving as a prelude to classification.

Study Notes

  • Machine Learning 1 is covered in Lecture 10, focusing on Unsupervised Learning.
  • Unsupervised Learning can be used to organize blocks or books.

Unsupervised Learning

  • Data goes into a machine learning algorithm, which produces segmented data.
  • No "labeled examples" of the correct answer are provided.
  • A learning algorithm presents a structure for human review.
  • It's an iterative process that allows for finding meaningful patterns and relationships.
  • Unsupervised learning addresses questions such as: How can we visualize the data informatively? Can subgroups be discovered?
  • The algorithm performs the search, but human supervision is still required.
  • Humans select the learning algorithm and the distance metrics.
  • It's used as part of exploratory data analysis (EDA).

Clustering

  • Clustering is a type of Unsupervised Learning.
  • It is also called a segmentation technique, because it divides data into separate parts.
  • There is no single correct approach; the choice depends on the goals and is constrained by the available data.
  • Homogeneous subsets are identified based on similarity within the subset and the number of subsets.
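The notes do not name a way to measure homogeneity. One common choice, assumed here, is the within-cluster sum of squares (WCSS): the total squared distance from each point to its subset's centroid, where lower means more homogeneous. The one-dimensional toy data are hypothetical.

```python
import math

Point = tuple[float, ...]


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of a subset."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def within_cluster_ss(subsets: list[list[Point]]) -> float:
    """Total squared distance from each point to its subset's centroid (lower = more homogeneous)."""
    total = 0.0
    for subset in subsets:
        center = centroid(subset)
        total += sum(math.dist(p, center) ** 2 for p in subset)
    return total


if __name__ == "__main__":
    a, b, c, d = (1.0,), (2.0,), (10.0,), (11.0,)
    print(within_cluster_ss([[a, b, c, d]]))        # 82.0
    print(within_cluster_ss([[a, b], [c, d]]))      # 1.0
    print(within_cluster_ss([[a], [b], [c], [d]]))  # 0.0
```

Going from one subset to two drops WCSS from 82 to 1, and putting every point in its own subset drops it to 0. This is why the number of subsets has to be weighed against homogeneity instead of simply maximized.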

Similarity

  • The clustering approach is based on similarity.
  • For numeric variables, similarity uses distance (delta between values).
  • For categorical variables, similarity is based on having the same values.
  • Each feature (column) in a dataset is a dimension with potential similarity between instances (rows).
  • Distances between instances, measured across these feature dimensions, reveal "density," i.e., points of concentration that form clusters.
  • Distance-based algorithms can be impacted by the presence of outliers.
  • Points beyond the "whiskers" of a box plot are considered outliers.
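Box-plot whiskers are conventionally drawn at 1.5 times the interquartile range (IQR) beyond the first and third quartiles (Tukey's rule). The notes do not state the factor, so `k=1.5` here is an assumption, as are the sample values. The sketch uses the standard library's `statistics.quantiles` (Python 3.8+).

```python
import statistics


def whisker_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return the values that fall beyond box-plot whiskers at k * IQR (Tukey's rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr  # whisker limits
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    spend = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100]
    print(whisker_outliers(spend))  # [100]
```

A single value like 100 would dominate every distance it takes part in, which is why distance-based algorithms are sensitive to it.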

Feature Scaling

  • Outliers are addressed via feature scaling by transforming the values of a feature to a common scale, such as between 0 and 1.
  • Feature scaling is applied so all features have an equal opportunity to contribute, preventing larger-scale features from dominating the model.
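Scaling to the range 0 to 1, as described above, is min-max scaling: each value becomes (x - min) / (max - min) within its feature. A minimal sketch with hypothetical income and household-size columns:

```python
def min_max_scale(column: list[float]) -> list[float]:
    """Rescale one feature to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


if __name__ == "__main__":
    income = [30_000, 60_000, 90_000]
    household_members = [1, 3, 5]
    print(min_max_scale(income))             # [0.0, 0.5, 1.0]
    print(min_max_scale(household_members))  # [0.0, 0.5, 1.0]
```

After scaling, a 60,000 spread in income and a 4-person spread in household size both span the same 0-to-1 range, so neither dominates a distance calculation. Because the minimum and maximum set the range, one extreme value compresses all other scaled values; standardization (subtract the mean, divide by the standard deviation) is a common alternative.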

Cluster Analysis

  • Common methods of cluster analysis include:
    • Hierarchical
    • K-Means

Hierarchical Clustering

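  • It is an iterative process that starts with one cluster and splits it until done (this is the top-down, or divisive, form; the agglomerative form works bottom-up by merging clusters).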
  • It's computationally intensive and difficult to do on large datasets.
  • Use it when it is meaningful for a small dataset, e.g., clustering stores (relatively few records) rather than customers (many records).
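A rough sketch of the top-down form described above, assuming Euclidean distance and a hypothetical splitting rule: split the widest cluster around its two farthest-apart points. Real divisive methods may use other splitting rules. The repeated all-pairs distance search also shows why the method becomes expensive on large datasets.

```python
import math
from itertools import combinations

Point = tuple[float, ...]


def farthest_pair(cluster: list[Point]) -> tuple[Point, Point]:
    """Return the two points in a cluster that are farthest apart (checks all O(n^2) pairs)."""
    return max(combinations(cluster, 2), key=lambda pair: math.dist(*pair))


def divisive_clustering(points: list[Point], n_clusters: int):
    """Top-down hierarchical clustering: start with one cluster, split until n_clusters."""
    clusters = [list(points)]
    history = [[list(c) for c in clusters]]  # one snapshot per level of the hierarchy
    while len(clusters) < n_clusters:
        # Only clusters with two or more distinct points can be split.
        candidates = [c for c in clusters if len(c) > 1 and math.dist(*farthest_pair(c)) > 0]
        if not candidates:
            break
        # Split the widest cluster, using its two farthest points as seeds.
        target = max(candidates, key=lambda c: math.dist(*farthest_pair(c)))
        seed_a, seed_b = farthest_pair(target)
        part_a = [p for p in target if math.dist(p, seed_a) <= math.dist(p, seed_b)]
        part_b = [p for p in target if math.dist(p, seed_a) > math.dist(p, seed_b)]
        clusters.remove(target)
        clusters.extend([part_a, part_b])
        history.append([list(c) for c in clusters])
    return clusters, history


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    final, levels = divisive_clustering(pts, 3)
    for level in levels:
        print(level)
```

Each entry in `levels` is one level of the hierarchy, from a single cluster down to `n_clusters` clusters.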

K-Means Clustering

  • It's used for clustering numerical data, like a set of measurements about objects of interest.
  • The input must be numerical, with a defined distance metric over the variable space, such as Euclidean distance.
  • The output is the center (centroid) of each discovered cluster and the assignment of each input datum to a cluster.
  • In K-means, two distance measures are defined: the distance between two data points and the distance between two clusters.
  • Distance may be calculated in a number of ways, but the following principles hold:
    • Distance is not negative
    • Distance from one record to itself is zero.
    • The distance from record I to record J is the same as from record J to record I.
    • The distance between two records cannot be greater than the sum of the distances between each record and a third record (the triangle inequality).
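A minimal pure-Python sketch of K-means using Lloyd's algorithm, which alternates an assignment step and a centroid-update step until the centroids stop moving. The lecture does not specify an algorithm. The random initialization from input points, the `seed` and `max_iter` parameters, and the sample data are assumptions. For real work, scikit-learn's `sklearn.cluster.KMeans` is a widely used library implementation.

```python
import math
import random

Point = tuple[float, ...]


def mean_point(members: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def kmeans(points: list[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: returns (centroids, labels), one label per input point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k randomly chosen input points
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old center if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    centers, assignment = kmeans(data, k=2)
    print(centers)     # one centroid per discovered cluster
    print(assignment)  # cluster index for each input record
```

The two return values match the outputs listed above: the cluster centers and the assignment of each datum to a cluster. Results can depend on the initial centroids, so implementations commonly run several initializations and keep the best.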

Unsupervised Learning Use Cases

  • Exploratory technique to discover data structures and summarize cluster properties.
  • Prelude to classification for "discovering the classes."
  • Example variables for a customer-segmentation use case include household income, yearly purchase amount in dollars, and the number of members in each customer household.
