Support Vector Machine (SVM) PDF

Summary

These notes cover the Support Vector Machine (SVM) algorithm: the introduction, the maximum margin hyperplane, linear SVM, non-linear SVM, and soft-margin SVM, together with the associated calculations. SVM is a classification method in machine learning.

Full Transcript

Support Vector Machine

Contents to be covered:

Introduction to SVM
Concept of maximum margin hyperplane
Linear SVM
1. Calculation of the MMH
2. Learning a linear SVM
3. Classifying a test sample using a linear SVM
4. Classifying multi-class data
Non-linear SVM
1. Concept of non-linear data
2. Soft-margin SVM
3. Kernel trick

Introduction to SVM

A classification technique that has received considerable attention is the support vector machine, popularly abbreviated as SVM. The technique has its roots in statistical learning theory (Vladimir Vapnik, 1992). As a classification task, it searches for an optimal hyperplane (i.e., decision boundary; see Fig. 1) separating the tuples of one class from another. SVM works well with high-dimensional data and thus avoids the dimensionality problem. Although SVM training is extremely slow, the resulting classifier is highly accurate, and testing an unknown sample is very fast. SVM is also less prone to overfitting than other methods and produces a compact model for classification.

In this lecture, we shall discuss the following:

Maximum margin hyperplane: a key concept in SVM.
Linear SVM: a classification technique for training data that are linearly separable.
Non-linear SVM: a classification technique for training data that are linearly non-separable.

Maximum Margin Hyperplane

Figure 2 shows a plot of data in 2-D. A simplifying assumption here is that the data are linearly separable, that is, we can find a hyperplane (in this case, a straight line) such that all +'s reside on one side and all -'s reside on the other side of the hyperplane. From Fig. 2, it can be seen that infinitely many separating lines can be drawn. Therefore, two questions arise:

1. Are all hyperplanes equivalent as far as the classification of the data is concerned?
2. If not, which hyperplane is the best?

We may note that, as far as classification error on the training data is concerned, all of these hyperplanes achieve zero error. However, there is no guarantee that they all perform equally well on unseen (i.e., test) data.

Thus, a good classifier must choose, from the infinitely many hyperplanes, one that performs well not only on the training data but also on test data. To illustrate how different choices of hyperplane influence the classification error, consider two arbitrary hyperplanes H1 and H2, as shown in Fig. 3.

In Fig. 3, the two hyperplanes H1 and H2 have their own boundaries, called decision boundaries (denoted b11 and b12 for H1, and b21 and b22 for H2). A decision boundary is parallel to the hyperplane and touches the closest sample of the class on one side of the hyperplane. The distance between the two decision boundaries of a hyperplane is called the margin. So, if the data are classified using hyperplane H1, the margin is larger than with hyperplane H2. The margin of the hyperplane reflects the error of the classifier: the larger the margin, the lower the classification error.

Intuitively, a classifier whose hyperplane has a small margin is more susceptible to model overfitting and tends to classify unseen data with weak confidence.
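To make the margin concrete, here is a minimal sketch of the standard hard-margin formulation; the symbols w (weight vector), b (bias), and labels y_i in {+1, -1} are the usual conventions, not taken verbatim from these slides.

```latex
% Separating hyperplane and its two decision boundaries:
%   w . x + b = 0    (the hyperplane)
%   w . x + b = +1   (boundary touching the nearest '+' sample)
%   w . x + b = -1   (boundary touching the nearest '-' sample)
\[
  \mathbf{w}\cdot\mathbf{x} + b = 0, \qquad
  \mathbf{w}\cdot\mathbf{x} + b = \pm 1
\]
% The distance between the two decision boundaries, i.e., the margin, is
\[
  \text{margin} = \frac{2}{\lVert\mathbf{w}\rVert}
\]
% so maximizing the margin amounts to minimizing ||w||^2 / 2 subject to
% every training sample (x_i, y_i) lying on the correct side:
\[
  \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
  \quad\text{subject to}\quad
  y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\ \ i = 1,\dots,N.
\]
```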
Thus, during the training (learning) phase, the approach is to search for the hyperplane with the maximum margin. Such a hyperplane is called the maximum margin hyperplane, abbreviated MMH. We may note that the shortest distance from a hyperplane to one of its decision boundaries is equal to the shortest distance from the hyperplane to the decision boundary on its other side; in other words, the hyperplane lies midway between its decision boundaries.

Linear SVM

An SVM used to classify data that are linearly separable is called a linear SVM. In other words, a linear SVM searches for a hyperplane with the maximum margin. This is why a linear SVM is often termed a maximal margin classifier (MMC).

Remaining slide topics: finding the MMH for a linear SVM; equation of a hyperplane in 2-D; finding a hyperplane; hyperplane and classification; calculating the margin of a hyperplane; learning for a linear SVM; searching for the MMH; the Lagrange multiplier method and its use to solve the linear SVM; classifying a test sample using a linear SVM; an illustration of the linear SVM; classification of multi-class data (OVO and OVA strategies); non-linear SVM; linear and non-linear separable data; SVM classification for non-separable data; linear SVM for linearly non-separable data; soft-margin SVM and the interpretation of ξ; solving the soft-margin SVM; the concept of non-linear mapping; issues in non-linear to linear transformation; and the dual formulation of the optimization problem.
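Since the Lagrange multiplier method and the dual formulation are among the topics listed above, the following is a sketch of the standard derivation, using the same notation as before; the multipliers λ_i are the usual convention, not taken from the slides.

```latex
% Lagrangian of the hard-margin problem, with multipliers lambda_i >= 0:
\[
  L(\mathbf{w}, b, \boldsymbol{\lambda}) =
  \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
  - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \right]
\]
% Setting the derivatives with respect to w and b to zero gives
%   w = sum_i lambda_i y_i x_i   and   sum_i lambda_i y_i = 0,
% which yields the dual problem in the lambda_i alone:
\[
  \max_{\boldsymbol{\lambda}}\
  \sum_{i=1}^{N} \lambda_i
  - \tfrac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N}
    \lambda_i \lambda_j y_i y_j\, (\mathbf{x}_i \cdot \mathbf{x}_j)
  \quad\text{subject to}\quad
  \lambda_i \ge 0,\ \ \sum_{i=1}^{N} \lambda_i y_i = 0.
\]
% Only the support vectors end up with lambda_i > 0; a test sample z is
% then classified by the sign of w . z + b.
```

Because the training data enter the dual only through dot products x_i · x_j, replacing the dot product with a kernel function K(x_i, x_j) gives the kernel trick used for non-linear SVM.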
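For the soft-margin SVM and the interpretation of ξ listed above, a sketch of the standard formulation follows; C is a user-chosen penalty constant, again conventional notation rather than the slides' own.

```latex
% Soft-margin SVM: slack variables xi_i >= 0 let samples violate the
% margin; the constant C trades margin width against total violation.
\[
  \min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\
  \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C \sum_{i=1}^{N} \xi_i
  \quad\text{subject to}\quad
  y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0.
\]
% Interpretation of xi_i:
%   xi_i = 0      -> the sample lies on or outside its margin boundary;
%   0 < xi_i <= 1 -> the sample falls inside the margin but is still
%                    correctly classified;
%   xi_i > 1      -> the sample is misclassified.
```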
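As an illustration of the ideas above, here is a minimal runnable sketch: a linear SVM on separable data, then an RBF kernel on data no straight line can separate. The use of scikit-learn and the toy data are assumptions for illustration, not part of the notes.

```python
# Minimal sketch (assumes scikit-learn and NumPy are installed).
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable 2-D data: '+' class (y = +1), '-' class (y = -1).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],   # '+' samples
              [0.0, 0.0], [1.0, 0.5], [0.5, 1.0]])  # '-' samples
y = np.array([+1, +1, +1, -1, -1, -1])

# A very large C approximates the hard-margin (maximal margin) classifier.
linear = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = linear.coef_[0], linear.intercept_[0]
print("margin = 2/||w|| =", 2.0 / np.linalg.norm(w))

# Classifying a test sample z amounts to taking sign(w . z + b).
z = np.array([[2.0, 1.5]])
print("sign(w.z + b):", np.sign(w @ z[0] + b))
print("prediction:   ", linear.predict(z))

# Non-linear data: points inside a circle are one class, points outside
# the other; no straight line separates them, but an RBF kernel can.
rng = np.random.default_rng(0)
P = rng.uniform(-2.0, 2.0, size=(200, 2))
labels = np.where(np.linalg.norm(P, axis=1) < 1.0, 1, -1)

rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(P, labels)
print("RBF training accuracy:", rbf.score(P, labels))
```

On the multi-class strategies listed above: scikit-learn's SVC applies the one-vs-one (OVO) strategy internally when given more than two classes, while sklearn.multiclass.OneVsRestClassifier provides the one-vs-all (OVA) alternative.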
