Supervised Learning and Classification Quiz
64 Questions

Questions and Answers

Which of the following is a key characteristic of supervised learning classification?

  • It does not require labeled training data.
  • The output variable is always numeric.
  • The class membership of each sample is unknown.
  • There is a finite number of classes that are known. (correct)

In regression analysis, what is the goal of modeling?

  • To find a model of the relationship between inputs and targets. (correct)
  • To predict the future values based on past observations.
  • To fit a curve that represents categorical outputs.
  • To classify objects into distinct categories.

What is the main difference between prediction and curve fitting?

  • Prediction uses categorical outputs, while curve fitting uses numerical outputs.
  • Prediction aims to determine mappings, while curve fitting models underlying data curves. (correct)
  • There is no significant difference; they are essentially the same.
  • Curve fitting requires labeled data, while prediction does not.

What type of encoding is commonly used in classification when handling multiple classes?

  • One-hot encoding (correct)

Which statement about prediction is accurate?

  • It maps input variables to one or more output variables. (correct)

How is the performance of a regression model commonly assessed?

  • By calculating an error measure, such as the mean error or the mean squared error. (correct)

What distinguishes supervised learning applications from unsupervised learning?

  • Supervised learning requires labeled (categorized) training samples. (correct)

When is classification often transformed into a regression problem?

  • When outputs are ordered categorical variables. (correct)

What is the primary goal of the utilization phase in supervised learning?

  • To generalize knowledge to new, unseen examples. (correct)

What does generalization in machine learning refer to?

  • Extracting the essence of the data so that it applies to unseen cases. (correct)

Which of the following characteristics should a training set possess?

  • It should sufficiently cover all regions of the state space. (correct)

During the training phase, how does a system adjust to learn effectively?

  • It modifies its parameters based on the target output. (correct)

Why is it important to test a system using a test set after training?

  • To measure how well the system performs on previously unseen data. (correct)

What issue can arise from a training set that lacks sufficient instances of certain types?

  • The model may specialize in the well-represented types and generalize poorly to the under-represented ones. (correct)

What does the ability of a model to generalize indicate?

  • It can adapt to new data beyond the training examples. (correct)

What is a consequence of a training set being too small?

  • The model may not be able to solve the problem effectively. (correct)

What is the ultimate goal of the training phase in supervised learning?

  • To modify the parameters until the output matches the target. (correct)

Why is generalization important in machine learning?

  • To allow the model to perform well on unseen examples. (correct)

What is a crucial characteristic of an effective training set?

  • It should cover varied examples across the state space. (correct)

What happens if the training set is too small?

  • The system will struggle to generalize. (correct)

How many binary classifiers are constructed to classify k class labels with the one-vs-one strategy?

  • $k(k-1)/2$ (correct)

Why do humans generally find generalization easier than computers?

  • Humans look for patterns even where none exist. (correct)

What aspect of a dataset is critical for training a model?

  • The dataset should consist of <attributes, value> tuples. (correct)

What is an essential step after training a machine learning system?

  • Validating the model with unseen cases. (correct)

What is the purpose of one-hot encoding in supervised classification?

  • To represent multiple classes with binary vectors. (correct)

Which function is commonly used in Artificial Neural Networks (ANN) to produce multi-class outputs between 0 and 1 that sum to 1?

  • Softmax function (correct)

What value do the targets take in a binary classification problem using an ANN?

  • 0 and 1 (correct)

How is error calculated in the context of supervised classification using cross-entropy?

  • By taking the negative dot product of the target vector and the logarithm of the predicted outputs, $-y \cdot \log \hat{y}$. (correct)

What do the outputs of an ANN represent in a multi-class classification scenario?

  • The probabilities of belonging to each class. (correct)

What threshold is typically used in binary classification to determine if an output is positive or negative?

  • 0.5 (correct)

For which classification method are the targets often represented as -1 and 1?

  • Support Vector Machines (correct)

What happens to the output of an ANN classification when the input data is highly uncertain?

  • It outputs a score near the decision threshold (about 0.5 for a sigmoid output) rather than close to 0 or 1. (correct)

What is the primary output of a softmax function?

  • Probabilities that sum to 1. (correct)

What is the significance of having a single target equal to 1 in a one-hot encoded vector?

  • It indicates the only class that applies to the instance. (correct)

Which characteristic describes the sigmoid function in the context of ANN outputs?

  • It offers continuous outputs between 0 and 1. (correct)

What role does the softmax function play in the context of error calculation in multi-class classification?

  • It normalizes the outputs into a probability distribution. (correct)

In error calculation, what does the term $H(y, \hat{y})$ represent?

  • The cross-entropy between the true and predicted distributions. (correct)

What indicates a high certainty of classification for an output in a binary classification model?

  • An output tending toward 1. (correct)

What is the formula for calculating the information entropy of a dataset?

  • $I = -\sum_c \frac{n_c}{n} \log_2 \frac{n_c}{n}$ (correct)

What does a higher entropy value indicate about a dataset?

  • Higher uncertainty about classifications (correct)

How is the information gain of an attribute calculated?

  • $G(A_i) = I - I(A_i)$ (correct)

What does the entropy of attribute Ai reflect?

  • The weighted uncertainty of classifications that remains after partitioning on $A_i$: $I(A_i) = \sum_j \frac{n_{ij}}{n} I_{ij}$. (correct)

Which attribute should be chosen when creating a decision tree based on entropy calculations?

  • The attribute with the lowest weighted entropy $I(A_i)$, that is, the highest information gain. (correct)

In a subtable, if the entropy is zero, what does this suggest?

  • Total certainty about classifications (correct)

The entropy of value 1 for the attribute 'Antennas' was calculated as which value for the given dataset?

  • 1 (correct)

Which calculation yields the information entropy of value j of attribute Ai?

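  • $I_{ij} = -\sum_c \frac{n_{ijc}}{n_{ij}} \log_2 \frac{n_{ijc}}{n_{ij}}$, where $n_{ij}$ is the number of instances with value $j$ of $A_i$ and $n_{ijc}$ is the number of those that belong to class $c$. (correct)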

What happens when a dataset is split by an attribute with high entropy?

  • Resulting subsets have high uncertainty. (correct)

Given the attribute 'Body', which value has higher entropy based on the examples?

  • Striped (correct)

What does the summation in the formula for I(Ai) represent?

  • Weighted average of the entropies of each value, weighted by their number of instances. (correct)

What mathematical operation is used to measure information gain?

  • Subtraction between two entropy values (correct)

If an attribute results in a significant drop in uncertainty, what is its likely consequence in machine learning?

  • Increased information gain (correct)

What is the primary purpose of Lagrange multipliers in the context of transforming the original problem?

  • To create a dual problem from the original optimization problem. (correct)

In the dual problem of Support Vector Machines, what must be minimized with respect to $w$ and $b$?

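  • The primal Lagrangian $L_P$; setting its derivatives with respect to $w$ and $b$ to zero yields the dual Lagrangian $L_D$, which is then maximized with respect to the $\alpha_i$. (correct)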

Which of the following correctly defines the KKT conditions applied in Support Vector Machines?

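  • Only the support vectors can contribute to the decision boundary: complementary slackness, $\alpha_i [y_i (w \cdot x_i + b) - 1] = 0$, forces $\alpha_i = 0$ for every point that does not lie on the margin. (correct)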

Which equation expresses the relationship for $w$ in terms of Lagrange multipliers?

  • $w = \sum_i \alpha_i y_i x_i$ (correct)

How is the parameter $b$ calculated from the support vectors?

  • Calculated individually from any support vector $x_s$ as $b = y_s - w \cdot x_s$, with every support vector giving the same value. (correct)

What criterion is used to determine which attribute to split on when building a decision tree?

  • The attribute that maximizes the information gain. (correct)

What does a lower entropy value indicate regarding a split made in a decision tree?

  • Higher certainty and a more homogeneous class distribution. (correct)

The entropy of a dataset is maximized under what condition?

  • When instances are evenly distributed among all classes. (correct)

Which of the following is true regarding the information entropy formula?

  • It measures the average amount of information produced by a stochastic source. (correct)

How are support vectors identified in the context of SVM?

  • By their Lagrange multipliers being greater than zero ($\alpha_i > 0$). (correct)

What is the outcome of applying the ID3 algorithm in decision trees?

  • A recursive partitioning of the dataset based on calculated entropy. (correct)

Which equation corresponds to the information entropy for value $j$ of an attribute $A_i$?

  • $I_{ij} = -\sum_c \frac{n_{ijc}}{n_{ij}} \log_2 \frac{n_{ijc}}{n_{ij}}$ (correct)

Which statement best describes the role of the dual problem in SVM?

  • It transforms the problem into a simpler form that is easier to maximize. (correct)

Flashcards

Training

The process of presenting examples to a learning system, allowing it to adjust its internal parameters to better predict the desired output.

Training Set

A collection of examples used during the training phase to teach a machine learning model.

Generalization

The ability of a machine learning model to accurately predict the output for new, unseen examples that differ from those in the training set.

Test Set

The set of data that is not used for training but rather for evaluating how well the trained model performs on unseen examples.

Performance/Accuracy/Error

A measure of how well the model's predictions match the actual target values.

Dataset

A collection of data points, often represented as tuples, used to train a machine learning model.

Model

A mathematical representation that captures the relationship between input features and output predictions.

Generalization Ability

The ability of a machine learning model to perform well on new, unseen data, not just on the examples it was trained on.

Training Phase

This phase involves presenting examples to the system and allowing it to "learn" from them. The system adjusts its parameters until its output closely matches the desired target output. A performance measure helps gauge how well the system is learning.

Utilization Phase

The system is presented with new, unseen examples and tasked with generalizing its learned knowledge to provide accurate predictions for these unfamiliar inputs.

Avoid Memorization

The system should not simply memorize the training data but should learn underlying patterns and relationships so it can make accurate predictions on unseen examples.

Dataset Properties

A group of examples used to train a model. Important factors for choosing a good training set include the size of the set and its representativeness.

Model's Capability

This refers to the ability of the model to make accurate predictions based on a set of input attributes. This is achieved by the system learning from existing data and applying its learnings to unseen data.

What are the goals of supervised learning?

Supervised learning tasks aim to predict outcomes (e.g., classifying images or predicting stock prices) based on known data.

Explain prediction in supervised learning.

Prediction involves finding a function that maps input variables to output variables. It's like figuring out the equation that relates coffee consumption to energy levels.

What is curve fitting in supervised learning?

Curve fitting is about finding a line or curve that best represents a set of data points. It's like drawing a line through a scatter plot.
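
As a minimal sketch of curve fitting, the snippet below fits a straight line by ordinary least squares. Least squares is one common meaning of "best represents" and is an assumption here, since the source does not name a criterion; the function `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```

Fitting higher-degree polynomials follows the same idea with more coefficients to estimate.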

Describe classification in supervised learning.

Classification tasks separate data into distinct categories. For example, sorting emails into spam or not spam.

Explain regression in supervised learning.

Regression aims to model the relationship between input variables and a numeric output. It's like finding a formula to describe the link between temperature and ice cream sales.

How are model outputs encoded?

Encoding outputs for models can be done as real numbers or boolean values. Using real numbers implies an order between classes (e.g., low/medium/high product quality). Boolean values are more common, especially for multiple classes.

What is one-hot encoding?

One-hot encoding represents multiple classes using binary values. For example, class A is represented as [1, 0, 0], class B as [0, 1, 0], and class C as [0, 0, 1].
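
The encoding above can be sketched in a few lines of Python. The helper names `one_hot` and `decode` are hypothetical. `decode` assumes that the predicted class is the one with the largest output, as described for softmax outputs in this document.

```python
def one_hot(label, classes):
    """Encode a class label as a binary vector with a single 1."""
    return [1 if c == label else 0 for c in classes]


def decode(outputs, classes):
    """Map a vector of per-class outputs back to the most likely class."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return classes[best]


classes = ["A", "B", "C"]
print(one_hot("B", classes))             # [0, 1, 0]
print(decode([0.1, 0.7, 0.2], classes))  # B
```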

How are two classes represented?

For two classes, a single binary output is enough (e.g., A/not-A, -/+, A/B), which simplifies the representation.

Information Entropy

A measure of uncertainty in a dataset based on the distribution of samples across different classes.

Information Entropy of a Value (Iij)

A measure of uncertainty associated with a specific value of an attribute.

Information Entropy of an Attribute (I(Ai))

A measure of uncertainty associated with an attribute as a whole. It is calculated as the average of the entropies of the attribute's values, each weighted by the fraction of instances that take that value: I(Ai) = Σj (nij/n) Iij.

Information Gain (G(Ai))

The difference between the overall information entropy of a dataset and the information entropy of a specific attribute. It indicates how much the entropy of the dataset decreases by splitting based on that attribute.
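
The entropy quantities in these flashcards can be sketched in Python as follows. This assumes that each example is a dict mapping attribute names to values, with a parallel list of class labels. The function names and the toy data are hypothetical and are not the dataset used in the quiz.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """I = -sum_c (n_c/n) log2(n_c/n) over the classes present in labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def attribute_entropy(rows, labels, attr):
    """I(A_i): entropy of each value's subtable, weighted by n_ij / n."""
    n = len(labels)
    subtables = {}
    for row, label in zip(rows, labels):
        subtables.setdefault(row[attr], []).append(label)
    return sum(len(sub) / n * entropy(sub) for sub in subtables.values())


def information_gain(rows, labels, attr):
    """G(A_i) = I - I(A_i)."""
    return entropy(labels) - attribute_entropy(rows, labels, attr)


# Hypothetical toy data: "color" separates the classes perfectly, "size" not at all
rows = [
    {"color": "red", "size": "big"},
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "big"},
    {"color": "blue", "size": "small"},
]
labels = ["A", "A", "B", "B"]
print(information_gain(rows, labels, "color"))  # 1.0
print(information_gain(rows, labels, "size"))   # 0.0
```

An evenly split two-class set has the maximum entropy of 1 bit, and a pure subtable has entropy 0. As a result, the perfectly separating attribute gains the full 1 bit.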

ID3 Algorithm

A technique for building decision trees that recursively chooses the attribute with the highest information gain at each node. It works by partitioning the data based on attributes that best separate classes.

Attribute

A feature or characteristic of an example within a dataset. In the context of ID3, attributes are used to make decisions and guide tree construction.

Value

A specific value that an attribute can take. For example, the attribute 'Body' can have values like 'White' or 'Striped'.

Class

The output or prediction that a decision tree makes based on the input values of the example.

Decision Tree

A branching tree-like structure that represents a set of rules for classifying examples based on their attribute values. Each node within the tree corresponds to a decision based on an attribute, and each branch represents a possible value of that attribute.

Splitting

The process of splitting the dataset into subsets based on the values of an attribute, creating sub-tables for each value of the attribute. This is done to measure entropy for each value.

Subtable

A sub-table of a dataset, created by splitting the original table based on a particular attribute.

Weighted Average Entropy

The process of calculating the information entropy of each value of the attribute and then averaging those entropies, weighting each one by the fraction of instances that take that value, to obtain the weighted average entropy for the entire attribute. This measures the overall uncertainty that remains after splitting on the attribute.

Splitting Attribute

The attribute chosen at each node of the decision tree that gives the highest information gain. This indicates that the attribute is the most informative for minimizing uncertainty and making a good classification decision.

Recursive Attribute Selection

The process of iteratively choosing the attribute with the highest information gain at each node of the decision tree. It continues until either all examples belong to the same class or there are no more attributes to split on.
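
The recursive selection described above can be sketched as a compact ID3 in Python. This version assumes categorical attributes stored in dicts and represents the tree as nested dicts of the form `{attribute: {value: subtree}}`, with class labels at the leaves. All names and the toy weather data are hypothetical. No pruning is done, and unseen attribute values are not handled.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    n = len(labels)
    subtables = {}
    for row, label in zip(rows, labels):
        subtables.setdefault(row[attr], []).append(label)
    weighted = sum(len(s) / n * entropy(s) for s in subtables.values())
    return entropy(labels) - weighted


def id3(rows, labels, attrs):
    """Return a leaf (class label) or a nested dict {attr: {value: subtree}}."""
    if len(set(labels)) == 1:          # all examples share one class
        return labels[0]
    if not attrs:                      # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return tree


def classify(tree, row):
    """Follow the tree from the root to a leaf (KeyError on unseen values)."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


# Hypothetical toy data
rows = [
    {"weather": "sunny", "wind": "weak"},
    {"weather": "sunny", "wind": "strong"},
    {"weather": "rainy", "wind": "weak"},
    {"weather": "rainy", "wind": "strong"},
    {"weather": "rainy", "wind": "weak"},
]
labels = ["yes", "no", "no", "no", "no"]
tree = id3(rows, labels, ["weather", "wind"])
print(classify(tree, {"weather": "sunny", "wind": "weak"}))  # yes
```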

One-Hot Encoding

In supervised classification, the output is encoded as boolean values (0 or 1) to represent different classes. When more than two classes are involved, one-hot encoding is used, where each class is represented by a separate output. Each output takes the value 1 if the instance belongs to that class, or 0 otherwise.

Softmax Function

The softmax function is a popular activation function used in artificial neural networks for multi-class classification. It takes an array of real numbers as input and outputs a probability distribution over all classes. The sum of all probabilities is always equal to 1. The output with the highest probability represents the predicted class.
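
A minimal Python sketch of softmax, with the predicted class taken as the most probable output. Subtracting the largest score before exponentiating is a common numerical-stability choice and does not change the result. The input scores are hypothetical.

```python
from math import exp


def softmax(z):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(z)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))  # index of the most probable class
print(predicted_class)  # 0
```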

Cross-Entropy Loss

Cross-entropy is a commonly used loss function in classification models. It measures the difference between the predicted probability distribution (output of the model) and the true probability distribution (target values). The goal is to minimize the cross-entropy, which means the predicted distribution should be as close as possible to the true distribution.

Cross-Entropy Loss Calculation

The cross-entropy loss is calculated by averaging the log-loss across all examples. The log-loss for a single example is the negative dot product of the target vector and the logarithm of the output vector. Since only one target is 1 (the rest are 0), only the logarithm of the output for the true class contributes to the loss.
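
The calculation above can be sketched in Python as follows. The sketch assumes the natural logarithm and clips outputs to a small `eps` to avoid `log(0)`; neither detail is specified in the source. The function names and example outputs are hypothetical.

```python
from math import log


def log_loss(targets, outputs, eps=1e-12):
    """Cross-entropy of one example: -(targets . log(outputs))."""
    return -sum(t * log(max(o, eps)) for t, o in zip(targets, outputs))


def cross_entropy(all_targets, all_outputs):
    """Average log-loss over all examples."""
    losses = [log_loss(t, o) for t, o in zip(all_targets, all_outputs)]
    return sum(losses) / len(losses)


# One-hot targets and hypothetical softmax outputs for two examples
targets = [[0, 1, 0], [1, 0, 0]]
outputs = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]
print(cross_entropy(targets, outputs))  # equals (-log(0.7) - log(0.6)) / 2
```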

Two-Class Output Encoding

When dealing with two classes, the output encoding can be simplified to using 0/1 or -1/1 values to represent the two classes. The choice of encoding depends on the specific classifier. For example, ANNs typically use 0/1 while SVMs use -1/1.

Confidence Score in Binary Classification

In binary classification, the model typically returns real numbers as output, representing the probability of belonging to the positive class. This gives information on the certainty of the classification, beyond just a simple categorical prediction. The output can be interpreted as a confidence score, where higher values indicate greater certainty.

Sigmoid Function

The sigmoid function is a common activation function used in binary classification tasks. It takes a real number as input and outputs a value between 0 and 1. This value can be interpreted as a probability of belonging to the positive class. High values (closer to 1) indicate a higher probability, while low values (closer to 0) indicate a lower probability.

Sigmoid Output Interpretation

The output of a sigmoid function can be interpreted as a confidence score, where a value close to 1 indicates high confidence in the positive class, and a value close to 0 indicates high confidence in the negative class. A threshold of 0.5 is commonly used to determine the class based on the sigmoid output.
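
A minimal Python sketch of the sigmoid output followed by the 0.5 threshold. The two-branch form is one common way to avoid overflow in `exp` for large negative inputs. The function names are hypothetical.

```python
from math import exp


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict(z, threshold=0.5):
    """Return 1 (positive class) if sigmoid(z) reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0


print(sigmoid(0.0))                # 0.5, the most uncertain output
print(predict(2.0), predict(-2.0))  # 1 0
```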

SVM Output Interpretation

In Support Vector Machines (SVMs), the targets are encoded as -1/1 for binary classification. The output of the SVM can be thought of as a distance from the decision boundary. A larger positive output indicates higher certainty of the positive class, a more negative output indicates higher certainty of the negative class, and outputs near 0 are uncertain.

Output as Confidence Score

The output of a classifier can be interpreted as a confidence score, which reflects the certainty of the classification. This score can be helpful in understanding the reliability of the model's prediction. For example, a prediction with a high confidence score might be more trustworthy than a prediction with a low confidence score.

Supervised Learning

Supervised learning involves training a model on a labeled dataset, where each data point has a corresponding target value. The model learns to map inputs to outputs based on this labeled information. The goal is to generalize the learned patterns to unseen data.

Classification

Classification is a type of supervised learning where the goal is to predict the class label of an input instance. The model learns to categorize data points into predefined classes based on the features provided.

Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) are biologically inspired computational models that can learn complex patterns from data. They consist of interconnected nodes (neurons) organized in layers. Each neuron receives input from the previous layer, applies a transformation, and outputs a value to the next layer. The weights between the neurons adjust during training to achieve optimal performance.

Model Training

Training a model involves adjusting its parameters to minimize the difference between its predictions and the actual target values. This process is often iterative, where the model is repeatedly exposed to the training data, and its parameters are updated based on the observed errors.

Original Problem (SVMs)

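The original (primal) problem in Support Vector Machines (SVMs) involves minimizing the norm of the weight vector (w), usually written as ½‖w‖², subject to the constraints yi(w · xi + b) ≥ 1. These constraints ensure that every training point is correctly classified with at least the required margin. This original problem is often difficult to solve directly.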

Dual Problem (SVMs)

In SVMs, the dual problem is a reformulation of the original problem that is easier to solve. It involves maximizing a function called the dual Lagrangian (LD) with respect to Lagrange multipliers (α).

Lagrange Multipliers (αi)

Lagrange multipliers (αi) are associated with each constraint in the original problem. These multipliers play a crucial role in transforming the problem into its dual form.

Dual Lagrangian (LD)

The dual Lagrangian (LD) is a function that is maximized in the dual problem. It depends on the Lagrange multipliers (α) and the data points.

Optimization in Dual Problem (SVMs)

The dual problem is optimized with respect to Lagrange multipliers (αi), while the original problem is optimized with respect to the weight vector (w) and bias (b).

Constraints in Dual Problem (SVMs)

The dual problem involves maximizing the dual Lagrangian (LD) subject to constraints on the Lagrange multipliers (αi): every multiplier must be non-negative (αi ≥ 0), and Σi αi yi = 0.

Support Vectors (SVMs)

Support vectors are those data points that lie closest to the decision boundary and have a non-zero Lagrange multiplier (αi). They play a crucial role in determining the separating hyperplane in SVMs.

Calculating Weight Vector (w) (SVMs)

In SVMs, the weight vector (w) can be calculated using the Lagrange multipliers (αi) and the support vectors. This calculation is performed using a summation over all support vectors.

Calculating Bias (b) (SVMs)

The bias (b) in SVMs is a constant term that shifts the decision boundary. It is calculated using a support vector and the weight vector.

Classification Function (SVMs)

The classification function in SVMs takes a new data point (x) and predicts its class label from the sign of Σi αi yi (xi · x) + b, a sum over the support vectors that uses their Lagrange multipliers (αi) and the bias (b). Written in this form, it does not require explicitly calculating the weight vector (w).
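
The SVM flashcards can be tied together with a small Python sketch. It takes the dual solution α as given, since the sketch does not solve the dual itself. From α, it recovers the support vectors, w and b, and then classifies points through the dual form. The toy points are hypothetical. For this data, α = (0.25, 0.25, 0) was worked out by hand: the third point lies outside the margin, so its multiplier is 0.

```python
# Hypothetical, linearly separable toy data with -1/+1 targets
X = [(1.0, 1.0), (-1.0, -1.0), (2.0, 3.0)]
y = [1, -1, 1]
alpha = [0.25, 0.25, 0.0]  # assumed dual solution for this toy set


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Support vectors are the points with alpha_i > 0
support = [i for i, a in enumerate(alpha) if a > 1e-9]

# w = sum_i alpha_i y_i x_i (only support vectors contribute)
w = [sum(alpha[i] * y[i] * X[i][d] for i in support) for d in range(len(X[0]))]

# From y_s (w . x_s + b) = 1 with y_s in {-1, 1}: b = y_s - w . x_s
b_values = [y[s] - dot(w, X[s]) for s in support]
b = b_values[0]  # every support vector gives the same value


def decision(x):
    """Dual form: sum_i alpha_i y_i (x_i . x) + b, no explicit w needed."""
    return sum(alpha[i] * y[i] * dot(X[i], x) for i in support) + b


def classify(x):
    return 1 if decision(x) >= 0 else -1


print(w, b)  # [0.5, 0.5] 0.0
print(classify((3.0, 0.0)), classify((-2.0, 0.5)))  # 1 -1
```

In practice, α would come from a quadratic-programming solver. The recovery of w and b and the decision function stay the same.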

Karush-Kuhn-Tucker (KKT) Conditions (SVMs)

The Karush-Kuhn-Tucker (KKT) conditions are necessary conditions for optimality in constrained optimization problems. In SVMs, the complementary slackness condition, αi[yi(w · xi + b) − 1] = 0, requires that for each point either the multiplier is zero or the constraint holds with equality. This reveals the characteristics of support vectors.

KKT Conditions and Support Vectors (SVMs)

In SVMs, the KKT conditions imply that data points that are not support vectors have a zero Lagrange multiplier (αi). This means that these points do not influence the decision boundary.

ID3 Decision Tree Algorithm

The ID3 Decision Tree algorithm aims to build a decision tree by recursively partitioning the data based on attributes that minimize the entropy of the resulting sub-trees. This process aims to create a tree with lower entropy, meaning higher certainty in the classification of instances.

Entropy (Decision Trees)

Entropy is a measure of uncertainty or impurity in a dataset. In the context of decision trees, lower entropy indicates a dataset with more certainty about the class labels of its instances.

Information Entropy of Dataset

The information entropy of a dataset is the negative sum, over all classes, of each class's proportion multiplied by the base-2 logarithm of that proportion: I = −Σc (nc/n) log2(nc/n).

Study Notes

Machine Learning II - Unit 1: Supervised Learning

  • Supervised learning involves two phases: training and utilization.
  • Training: Examples are presented to the system (training set). The system learns from the examples and gradually modifies adjustable parameters until the output matches the desired output (target). A measure of performance (accuracy/error) is needed.
  • Utilization: New examples (never seen before) are presented to the system. The system generalizes based on the learned patterns in the training set. A test set is necessary.

Generalization

  • Memorization is not the goal, generalization is.
  • Generalization is easier for humans than for a computer.
  • Humans recognize patterns even where they do not exist, known as pareidolia.
  • The system needs to extract the essence and structure of the data, not just the correct answer for some cases.
  • Testing on new instances is crucial to evaluate generalization ability beyond the training instances.

Datasets

  • Datasets are made up of tuples <attributes, value>.
  • The training set represents the data used for learning.
  • The model should generalize to new, unseen data.

Training Sets

  • Meaningful datasets are important for accurate learning; too few examples will not train a model effectively.
  • Representative datasets cover all possible regions of the state space. A good training set has examples of varied instances to avoid the model specializing too much in a particular subset.

Well-Chosen Model

  • An adequately chosen model complexity and a good training set are both critical for good generalization ability.
  • The model should be robust on new instances.
  • Consider cases where there is no corresponding data in a given region of data space.
  • An example would be predicting traits of a camel based on data from only one type of camel (e.g., 2-humped).

Parameters vs. Hyperparameters

  • Parameters are internal to the model and their values are set during the training, while hyperparameters' values are set by the user.
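  • Examples: in a polynomial model, the coefficients are parameters and the degree is a hyperparameter; in an ANN, the weights are parameters, while the number of layers and neurons and the learning rate are hyperparameters; in an SVM, the Lagrange multipliers (and hence w and b) are parameters, while the kernel choice is a hyperparameter.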

Supervised Learning Problems

  • Prediction
  • Curve fitting
  • Classification
  • Regression

Supervised Classification: Output Encoding (For Binary)

  • As boolean values (most common), e.g., 0/1 or -1/1.
  • As real numbers (less common), only if an order exists between the classes; the classification problem then becomes a regression problem.

Supervised Classification: Output Encoding (For Multi-class)

  • One-hot encoding is used.
  • One output per class is used.
  • Example: if an instance belongs to class A, the output for class A is 1 and other outputs are 0.
  • ANN: The softmax function is applied to the outputs of the last layer to generate real outputs ∈ [0,1] whose sum is equal to 1.
  • Final output: choosing the most probable class. The same principles for output encoding hold for binary classification.

Supervised Classification: Error Calculation

  • Binary cross-entropy
  • Mean Error (ME), Mean Squared Error (MSE), etc.
  • Difference between output and target
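
Two of the error measures listed above can be sketched in Python, assuming 0/1 targets and real-valued outputs in [0, 1]. The natural logarithm and the clipping constant `eps` are assumptions. Mean Error is left out because the source does not define it precisely. The example values are hypothetical.

```python
from math import log


def binary_cross_entropy(targets, outputs, eps=1e-12):
    """Mean of -(t log o + (1 - t) log(1 - o)) with targets in {0, 1}."""
    total = 0.0
    for t, o in zip(targets, outputs):
        o = min(max(o, eps), 1.0 - eps)  # keep log() arguments away from 0
        total += -(t * log(o) + (1 - t) * log(1.0 - o))
    return total / len(targets)


def mean_squared_error(targets, outputs):
    """Mean of the squared differences between target and output."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


targets = [1, 0, 1]
outputs = [0.9, 0.2, 0.6]  # hypothetical sigmoid outputs
print(mean_squared_error(targets, outputs))  # about 0.07
```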

Multi-class Classification

  • One-vs-all or one-vs-rest
  • One-vs-one
  • Other models may vary.
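
The two decomposition strategies can be sketched in Python by listing the binary tasks each one creates. One-vs-one creates k(k-1)/2 pairwise tasks and combines them by majority vote. `pairwise_winner` is a hypothetical stand-in for a trained binary classifier. Voting is one common way to combine one-vs-one results, assumed here.

```python
from collections import Counter
from itertools import combinations


def one_vs_rest_tasks(classes):
    """One binary task per class: that class versus all the others."""
    return [(c, "rest") for c in classes]


def one_vs_one_tasks(classes):
    """One binary task per pair of classes: k(k-1)/2 tasks for k classes."""
    return list(combinations(classes, 2))


def one_vs_one_predict(x, classes, pairwise_winner):
    """Majority vote; pairwise_winner(x, a, b) is hypothetical and returns a or b."""
    votes = Counter(pairwise_winner(x, a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


classes = ["A", "B", "C", "D"]
print(len(one_vs_rest_tasks(classes)), len(one_vs_one_tasks(classes)))  # 4 6
```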

Support Vector Machines (SVM)

  • Linearly separable problems
  • Classifying into two classes
  • Extendable to multi-class classification and to regression problems

Linearly Separable Problems

  • Idea: Maximize the margin between the separating hyperplane and the closest data points to either class.

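Non-Linearly Separable Problems

  • The goal is still to maximize the margin, while tolerating some classification errors (soft margin).
  • Additional (slack) variables are added to the constraints so that some points may be misclassified or fall inside the margin.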

Kernel Trick

  • Data are mapped to a higher-dimensional space in which the classes are easier to separate.
  • A kernel function is used to compute inner products in that space without performing the mapping explicitly.
  • Many kernel functions are available (linear, polynomial, radial basis, etc.).
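
The kernel trick can be checked numerically in Python. For 2-D inputs, the degree-2 polynomial kernel (x · z)² gives the same value as an inner product in an explicit 3-D feature space. That feature map, phi(x) = (x1², √2·x1·x2, x2²), is a standard textbook choice assumed here. An RBF kernel is included for comparison. All names are hypothetical.

```python
from math import exp, sqrt


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return dot(x, z) ** 2


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, z)))


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, z))    # 121.0
print(dot(phi(x), phi(z)))  # the same value up to rounding, via the explicit mapping
```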

Decision Trees (ID3)

  • Representation of decision-making processes.
  • Items: leaves (describing classes) and nodes (asking questions about specific attributes). Relationships are expressed by tree branches.
  • Example: tennis game classification given environmental factors (e.g., outlook, humidity, wind).
  • A new instance is classified by following the tree from the root to a leaf.
  • Rules can be generated based on the tree.

Decision Trees (C4.5 and CART)

  • Improved algorithms that handle continuous attributes (e.g., temperature) and data errors.
  • Attribute selection uses information gain and the gain ratio, both of which measure the reduction in uncertainty.
  • Overfitting prevention is done through methods such as pre-pruning and post-pruning.
  • CART differs in using the Gini Index as a criterion for attribute selection.
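
The Gini Index mentioned above can be sketched in Python as 1 − Σc pc², together with the weighted Gini of a candidate split. In CART, the split with the lowest weighted Gini is preferred. The function names and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a set of class labels: 1 - sum_c p_c^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(groups):
    """Weighted Gini index of a split, given one list of labels per branch."""
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * gini(g) for g in groups)


print(gini(["A", "A", "B", "B"]))            # 0.5
print(split_gini([["A", "A"], ["B", "B"]]))  # 0.0, a perfectly pure split
```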

Decision Trees (MARS)

  • Multivariate adaptive regression splines (MARS) is an extension of CART to handle multivariate functions and high-dimensional data.
  • Basis functions are chosen as a product of spline functions.

Regression Trees

  • A special type of decision trees that handle continuous outputs.
  • Predictions are the average of the target values of instances that reach a node (a leaf), representing the value of that variable in a given region.
  • CART (Classification and Regression Trees) and related techniques are typically used for training regression trees.

k-Nearest Neighbors (k-NN)

  • Uses similarity based on distance. The most common value among k nearest neighbours is used to classify an instance.
  • Euclidean distance or similar metrics
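  • The choice of k strongly affects performance: a very small k makes predictions sensitive to noise, while a very large k over-generalizes by including distant points.
  • Can be used for both classification (majority vote) and regression (e.g., the average of the neighbours' values).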
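
A minimal k-NN sketch in Python with Euclidean distance, using `math.dist` (Python 3.8+). It covers classification by majority vote and regression by averaging the neighbours' targets. The function names and toy points are hypothetical. Ties in the vote are resolved arbitrarily by `Counter.most_common`.

```python
from collections import Counter
from math import dist


def nearest(train_X, train_y, x, k):
    """The k training pairs closest to x by Euclidean distance."""
    return sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]


def knn_classify(train_X, train_y, x, k=3):
    """Majority class among the k nearest neighbours."""
    votes = Counter(label for _, label in nearest(train_X, train_y, x, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, x, k=3):
    """Average target value of the k nearest neighbours."""
    neighbours = nearest(train_X, train_y, x, k)
    return sum(value for _, value in neighbours) / len(neighbours)


# Hypothetical 2-D points forming two well-separated groups
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(X, y, (0.5, 0.5)))  # A
print(knn_classify(X, y, (5.5, 5.5)))  # B
```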

Hybrid Local Models

  • Combination of two or more classification or regression methods.
  • Local Model: An approach where the model is built using just a small portion of the data for each partition.
  • Models are built recursively to minimize effects of noise or complex data and provide greater generalization.

Case-Based Reasoning (CBR)

  • Uses previously solved cases to solve new, similar problems.
  • Adapts previous solutions for current cases based on similarity.
  • Components:
    • Retrieve similar cases.
    • Reuse solved solutions.
    • Revise and adapt solutions.
    • Retain the improved solution for future use.
