Support Vector Classifier Quiz

Questions and Answers

What is a drawback of the maximal margin classifier?

  • It perfectly classifies all training observations.
  • It may have overfit the training data. (correct)
  • It identifies support vectors effectively.
  • It is insensitive to individual observations.

    The support vector classifier aims to perfectly separate the two classes.

    False

    What is the primary role of the hyperplane in the support vector classifier?

  • To minimize the width of the margin.
  • To classify observations without any misclassification.
  • To separate the training observations into two classes. (correct)
  • To increase the number of observations on the correct side of the margin.

    What are observations that lie directly on the margin or on the wrong side of the margin for their class called?

    Support vectors

    If a slack variable $\epsilon_i$ is greater than 1, it indicates that the observation is on the wrong side of the margin.

    False

    In a support vector classifier, changing the position of an observation that lies strictly on the correct side of the margin will ___ the classifier.

    not change

    What happens to the margin of a support vector classifier as the regularization parameter C increases?

    The margin widens.

    A small C value leads to a classifier with high bias and low variance.

    False

    What does the acronym SVM stand for?

    Support Vector Machine

    The maximal margin classifier is the most complex form of SVM.

    False

    What is the purpose of a hyperplane in SVM?

    To separate different classes in feature space.

    The vector β in the hyperplane equation β0 + β1 X1 + β2 X2 +...+ βp Xp = 0 is known as the ______.

    normal vector

    What method is used in SVM when there are more than 2 classes?

    One versus All (OVA)

    Support Vector Machine (SVM) is more effective than Logistic Regression (LR) when classes are not separable.

    False

    What is the loss function used in support vector classifier optimization?

    Hinge loss

    When $y_i(\beta_0 + \beta_1x_{i1} +...+ \beta_px_{ip})$ is greater than 1, the SVM loss is ______.

    zero

    Match the following concepts with their descriptions:

    SVM = Works well for nearly separable classes
    Logistic Regression = Estimates probabilities
    One versus All (OVA) = Fit K 2-class SVM classifiers
    One versus One (OVO) = Fit all pairwise classifiers

    What characterizes a support vector machine compared to a support vector classifier?

    It can combine with a non-linear kernel.

    The radial kernel has a global behavior, meaning all training observations affect the predicted class label for a test observation.

    False

    What is the role of the parameter gamma (γ) in the radial basis kernel?

    It controls the fit of the model, affecting the non-linearity.

    Support vector machines utilize kernels to compute the __________ needed for different dimensions.

    inner-products

    Match the kernel types with their characteristics:

    Linear Kernel = Linear in features
    Polynomial Kernel = Uses degree d for transformations
    Radial Kernel = High-dimensional implicit feature space
    Kernels in SVM = Computes pairs without enlarged space

    Which of the following best describes the polynomial kernel?

    It computes inner products for transformations of degree d.

    As the distance between a test observation and a training observation increases, the contribution of that training observation to the prediction increases.

    False

    What happens to the predicted class label when the training observations are far from the test observation?

    They have virtually no role in determining the predicted class label.

    Study Notes

    Introduction to Machine Learning - AI 305: Support Vector Machines (SVM)

    • Support Vector Machines (SVMs) are a classification approach developed in the 1990s and have gained popularity since then.
    • SVMs perform well in various settings and are considered strong "out-of-the-box" classifiers.
    • The core concept is the maximal margin classifier.
    • The support vector classifier extends the maximal margin classifier for broader datasets.
    • Support Vector Machines (SVM) extend the support vector classifier further to accommodate non-linear class boundaries.

    Contents

    • Maximal Margin Classifier
    • Support Vector Classifier
    • Support Vector Machine
    • SVM for Multiclass Problems
    • SVM vs. Logistic Regression

    Introduction - Continued

    • Support Vector Machines (SVMs) are an approach for classification, originally developed in the computer science community during the 1990s.
    • The popularity has grown since then.
    • These approaches perform well across a range of contexts, frequently being regarded as one of the best "off-the-shelf" or pre-built classifiers.
    • The approach handles two-class classification problems directly.
    • The first step is to try to find a plane that cleanly separates the classes in feature space.
    • If a separating plane can't be readily identified, two strategies are employed:
      - Refining the meaning of "separates".
      - Enlarging the feature space to enable separation.

    What is a Hyperplane?

    • A hyperplane in p-dimensions is an affine subspace of dimension p−1.
    • The generic equation for a hyperplane is: β0 + β1X1 + β2X2 + ... + βpXp = 0
    • In two dimensions, a hyperplane is a line.
    • In three dimensions, it's a plane.
    • If β0 = 0, the hyperplane goes through the origin. Otherwise, it does not.
    • The vector β = (β1, β2, ..., βp) is called the normal vector; it points in a direction orthogonal to the hyperplane's surface.

    Hyperplanes - Example

    • Let the hyperplane be represented as: 1 + 2X1 + 3X2 = 0.
    • The blue region represents the points where 1 +2X1 + 3X2 > 0.
    • The purple region represents the points where 1 + 2X1 + 3X2 < 0.
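
    As a quick illustration, here is a minimal sketch (assuming NumPy is available; the point coordinates are made up for illustration) of evaluating which side of this hyperplane a few points fall on:

```python
import numpy as np

# Hyperplane from the example above: 1 + 2*X1 + 3*X2 = 0
beta0 = 1.0
beta = np.array([2.0, 3.0])          # coefficients (also the normal vector direction)

points = np.array([
    [1.0, 1.0],                      # f = 1 + 2 + 3 = 6  -> blue region (f > 0)
    [-1.0, -1.0],                    # f = 1 - 2 - 3 = -4 -> purple region (f < 0)
    [-0.5, 0.0],                     # f = 1 - 1 + 0 = 0  -> lies on the hyperplane
])

f = beta0 + points @ beta            # signed value of the hyperplane equation
for p, value in zip(points, f):
    side = "blue (f > 0)" if value > 0 else "purple (f < 0)" if value < 0 else "on the hyperplane"
    print(p, "->", side)
```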

    Classification using a Separating Hyperplane

    • Given an n×p data matrix X of n training observations in p-dimensional space, where the observations fall into two classes (y1, ..., yn ∈ {−1, +1}).
    • The objective is to develop a classifier to categorize the test observation based on its feature measurements.
    • A variety of techniques are used (logistic regression, classification trees, bagging, boosting).
    • This approach introduces a novel method based on a separating hyperplane concept.

    Separating Hyperplanes

    • If f(X) = β0 + β1X1 + ... + βpXp, then f(x) > 0 for points on one side of the hyperplane and f(x) < 0 on the other side.
    • If yi = +1 for blue points and yi = −1 for purple points, then yi f(xi) > 0 for all i.
    • f(x) = 0 defines a separating hyperplane.

    Maximal Margin Classifier

    • Among all separating hyperplanes, it seeks the one maximizing the gap (margin) between the two classes.
    • The maximal margin hyperplane is the solution of the optimization problem that minimizes ‖β‖2 subject to a set of constraints.
    • The constraints enforce that each observation must fall on the correct side of the hyperplane and maintain a distance at least M from it, with M being the margin width.
    • This formulation can be resolved effectively as a convex quadratic program.
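
    One common way to write this optimization problem explicitly (a sketch in the notation of the bullets above, with M denoting the margin width) is:

```latex
\begin{aligned}
&\max_{\beta_0, \beta_1, \dots, \beta_p,\, M} \; M \\
&\text{subject to } \sum_{j=1}^{p} \beta_j^2 = 1, \quad
y_i\bigl(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}\bigr) \ge M
\quad \text{for all } i = 1, \dots, n.
\end{aligned}
```

    Rescaling so that the constraint reads $y_i(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) \ge 1$ turns this into the equivalent problem of minimizing $\|\beta\|^2$, which is the convex quadratic program referred to above.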

    Non-separable Data

    • Data that cannot be separated by a linear boundary using the specified criterion.
    • In this case there is no solution with a margin greater than zero; this is common unless the number of observations (N) is smaller than the dimensionality (p).
    • The generalization of the maximal margin classifier, accommodating non-separable cases is called a support vector classifier, employing a "soft margin".

    Noisy Data

    • Data that is separable but includes noise, potentially leading to a less desirable solution for maximal-margin classifiers.
    • For this case the support vector classifier maximizes a soft margin.

    Drawbacks of Maximal Margin Classifiers

    • Classifiers based on separating hyperplanes necessarily classify all training observations perfectly, which makes them very sensitive to individual observations.
    • The addition of a single new observation can dramatically alter the maximal margin hyperplane.
    • A hyperplane with a very narrow margin is undesirable because the small distance between the observations and the hyperplane gives little confidence that those observations were classified correctly.

    Support Vector Classifiers

    • Given the limitations of the maximal margin classifier, support vector classifiers (also called soft margin classifiers) are introduced; they tolerate misclassification of a few observations in order to perform better on the remaining data points.
    • They use less restrictive conditions on hyperplane selection, aiming to improve overall classification accuracy.

    Support Vector Classifier - Continued

    • The optimization problem is structured in such a way that only observations on or violating the margin affect the hyperplane.
    • Points that lie directly on the margin, or on the "wrong" side are considered "support vectors" and control the margin boundaries.
    • These “support vectors” significantly influence the SVM classifier.

    Support Vector Classifier- Continued

    • Example illustrating how support vector classifiers fit a small dataset; dashed lines indicate the margins around the fitted hyperplanes.
    • Illustrates how data points on or violating the margin affect the hyperplane position in the plots. Some points in the sample dataset are close to the margin (support vectors).

    Details of the Support Vector Classifier

    • SVM classifiers are based on the side of a hyperplane on which a test observation falls.
    • The hyperplane is carefully selected to correctly categorize the majority of training observations while tolerating a few possible misclassifications.
    • The solution rests on an optimization problem.
    • The problem involves a tuning parameter C, the margin width M (the inverse of the norm of the coefficient vector), and slack variables that allow some observations to be on the wrong side of the margin.

    Details of the Support Vector Classifier - Continued

    • C is a non-negative model tuning parameter.
    • M is the margin width, which the optimization seeks to maximize.
    • Slack variables allow individual observations to be on the wrong side of the margin or hyperplane.
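
    A sketch of the resulting soft-margin problem, written in the budget form these notes follow (the ε_i are the slack variables defined below and C is the budget for violations):

```latex
\begin{aligned}
&\max_{\beta_0, \dots, \beta_p,\; \epsilon_1, \dots, \epsilon_n,\; M} \; M \\
&\text{subject to } \sum_{j=1}^{p} \beta_j^2 = 1, \\
&\quad y_i\bigl(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}\bigr) \ge M(1 - \epsilon_i),
\quad \epsilon_i \ge 0, \quad \sum_{i=1}^{n} \epsilon_i \le C.
\end{aligned}
```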

    Slack Variable

    • The slack variable ε_i reflects the position of the ith observation relative to the margin and hyperplane.
    • ε_i = 0 indicates the ith observation is on the correct side of the margin.
    • ε_i > 0 indicates the ith observation is on the incorrect side of the margin (in violation); ε_i > 1 implies the ith observation is on the incorrect side of the hyperplane.

    Regularization Parameter C

    • C limits the total amount of violations made to the margin or hyperplane.
    • It acts as a constraint against a high number of misclassifications on training data.
    • C=0 indicates a strict adherence to the margin (no violations allowed).
    • A higher C leads to a wider margin and more tolerance of margin violations, which lowers the confidence attached to individual classifications. Because each observation on the wrong side of the hyperplane has ε_i > 1, no more than C observations can end up on the wrong side of the hyperplane.

    The Regularization Parameter C - Continued

    • Analyzing the effect of C on the support vector classifier's performance shows how varying C impacts the margin width and the number of support vectors.
    • When C is large, almost all of the training observations influence the hyperplane (many support vectors), giving a low-variance but potentially high-bias classifier; conversely, a small C means the hyperplane is determined by few observations, giving a low-bias but high-variance classifier.
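
    To see the effect in code, here is a minimal sketch using scikit-learn (an assumption; the lecture does not prescribe a library). Note the convention flip: the C parameter of sklearn.svm.SVC multiplies the penalty on margin violations, so a large sklearn C corresponds to a small budget C in these notes (narrow margin, few support vectors) and vice versa.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping classes so that margin violations are unavoidable.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for penalty in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=penalty).fit(X, y)
    # Small penalty -> wide soft margin   -> many support vectors (large "budget" C).
    # Large penalty -> narrow soft margin -> few support vectors  (small "budget" C).
    print(f"sklearn C={penalty:>6}: number of support vectors = {clf.n_support_.sum()}")
```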

    Robustness of Support Vector Classifiers

    • The support vector classifier's decision rule depends only on a potentially small subset of the training observations, known as support vectors.
    • This reliance on support vectors makes the decision rule robust to observations that lie far from the hyperplane.
    • Note the contrast to other classification approaches (for example, linear discriminant analysis).

    Linear Boundary Failures

    • Linear boundaries may fail to separate the observations in some cases, regardless of the value of C.
    • Such data patterns require a non-linear decision boundary, which can be obtained by applying non-linear transformations to the original features.

    Feature Expansion

    • Feature space is enlarged by introducing polynomial or other transformations.
    • The support vector machine in this enlarged dimensional space may find a separating hyperplane that produces a non-linear decision boundary in the original input space (i.e. using quadratic, cubic, higher order-polynomial expansions).
    • The optimization problem will be altered to reflect the higher dimensionality space.
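
    A minimal sketch of this idea (assuming scikit-learn; the dataset and degree are chosen only for illustration): a linear support vector classifier fit on degree-3 polynomial transformations of two features yields a non-linear boundary in the original space.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

# A two-class dataset that no straight line can separate well.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),  # enlarge the feature space
    StandardScaler(),
    LinearSVC(C=1.0, max_iter=10_000),                 # linear hyperplane in the enlarged space
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```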

    Feature Expansion - Example

    • This example demonstrates how enlarging feature space with specific transformations can produce a non-linear decision boundary.
    • Illustrating practical application.

    Cubic Polynomials

    • Illustrates a cubic-polynomial basis expansion, from 2 original variables to 9 derived variables.
    • Applying this transformation to a specific dataset (plotted sample) yields a support vector classifier solution to the non-linear separation problem.

    SVMs: More Than Two Classes

    • Classic Support Vector Machine implementations work for only two classes; this section discusses multi-class expansions.
    • The "one-versus-all" (OVA) approach fits individual classifiers (one vs all other classes) resulting K classifiers.
    • The class assignment is determined based on the maximum value amongst all these classifiers for a given observation.
    • The "one-versus-one" (OVO) approach fits all pairwise combinations yielding K(K−1)/2 classifiers; the class with the most winning pairwise competitions is chosen for the input example.

    SVM vs. Logistic Regression

    • The optimization problem in SVMs can be rephrased using a "hinge" loss function that closely resembles the "loss" function used in logistic regression (negative log-likelihood).
    • The loss functions of both approaches have notable similarities in their respective shapes.
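
    For reference, a sketch of the two loss functions being compared, with f(x) = β0 + β1x1 + ... + βpxp and labels coded as y ∈ {−1, +1}:

```latex
\text{hinge (SVM) loss: } \max\bigl(0,\; 1 - y_i f(x_i)\bigr)
\qquad\qquad
\text{logistic loss: } \log\bigl(1 + e^{-y_i f(x_i)}\bigr)
```

    Both losses are small when y_i f(x_i) is large and positive; the hinge loss is exactly zero once y_i f(x_i) ≥ 1, which matches the fill-in question above.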

    Which to Use: SVM or Logistic Regression?

    • SVMs outperform logistic regression when the classes are clearly separable and a linear boundary can readily be identified.
    • In cases where the classes are not well separated, logistic regression with a regularization penalty and the support vector classifier generally yield similar results.
    • When estimating probabilities, logistic regression is the more appropriate choice.
    • In cases where non-linear boundaries or high dimensionality are required, kernel SVMs may be prioritized due to their adaptability; however, they typically require more computations.

    End

    Description

    Test your knowledge on support vector classifiers and their components. This quiz covers topics like maximal margin classifiers, hyperplanes, slack variables, and observations in relation to the margin. Challenge yourself with these essential concepts in machine learning.
