Pattern Recognition Lecture 5: Classification III

Study Notes

Classification III

Multi-class classification can be achieved by combining a number of binary classifiers
Two common approaches to multi-class SVMs: One vs. all and One vs. one

One vs. All

Train an SVM for each class vs. the rest
Testing: apply each SVM to the test example and assign it the class of the SVM that returns the highest decision value
Advantages: number of binary classifiers equals the number of classes
Disadvantages: training sample sizes are unbalanced, since each SVM sees one class against all the other classes combined
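
The testing rule above can be sketched in a few lines. This is only an illustration: the entries in `scorers` are hypothetical hand-written functions standing in for trained "class vs. rest" SVMs, and the class names are made up.

```python
def predict_one_vs_all(scorers, x):
    """Return the class whose binary classifier gives the highest decision value."""
    return max(scorers, key=lambda label: scorers[label](x))


# Hypothetical decision functions (assumption: stand-ins for trained SVMs,
# one per class, each scoring "this class vs. the rest").
scorers = {
    "a": lambda x: x[0] - x[1],
    "b": lambda x: x[1] - x[0],
    "c": lambda x: 0.5 - abs(x[0]) - abs(x[1]),
}

prediction = predict_one_vs_all(scorers, (2.0, 0.5))
```

With three classes there are exactly three scorers, which matches the advantage listed above: one binary classifier per class.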

One vs. One

Train an SVM for each pair of classes
Testing: each learned SVM “votes” for a class to assign to the test example
Advantages: training data required for each class is balanced
Disadvantages: the number of classifiers is n(n-1)/2 for n classes, so it grows quadratically as the number of classes increases (higher computational cost)
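
A minimal sketch of one-vs-one voting and of the n(n-1)/2 classifier count. The pairwise classifiers here are hypothetical nearest-center rules on a single number, not trained SVMs, and the tie-breaking behaviour (first class to reach the top vote count wins) is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations


def num_pairwise_classifiers(n_classes):
    """Number of binary classifiers needed for one-vs-one: n(n-1)/2."""
    return n_classes * (n_classes - 1) // 2


def predict_one_vs_one(pairwise, x):
    """Let every pairwise classifier vote and return the class with most votes."""
    votes = Counter(clf(x) for clf in pairwise.values())
    return votes.most_common(1)[0][0]


# Hypothetical setup: three classes described by a center on the real line.
classes = ["low", "mid", "high"]
centers = {"low": 0.0, "mid": 5.0, "high": 10.0}


def make_pair_classifier(a, b):
    # Stand-in for an SVM trained only on classes a and b:
    # it votes for whichever of the two centers is nearer to x.
    return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b


pairwise = {(a, b): make_pair_classifier(a, b) for a, b in combinations(classes, 2)}

prediction = predict_one_vs_one(pairwise, 6.0)
```

For 3 classes this builds 3 classifiers, but for 10 classes the same scheme would need 45, which is the computational cost noted above.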

SVMs: Pros and Cons

Pros: many SVM packages are publicly available; the kernel-based framework is very powerful and flexible; SVMs work well in practice, even with very small training sample sizes
Cons: there is no "direct" multi-class SVM, so two-class SVMs must be combined; selecting the best kernel function for a problem can be tricky; training can run into computational and memory issues

Logistic Regression vs. SVMs

If number of features is large (relative to number of training examples), use logistic regression, or SVM without a kernel (“linear kernel”)
If number of features is small, and number of training examples is intermediate, use SVM with Gaussian kernel
If number of features is small, and number of training examples is large, create/add more features, then use logistic regression or SVM without a kernel

Decision Trees

Intuition: what makes a loan risky? (credit history, income, loan terms, personal information)
Decision tree learning task: learn a decision tree from data, i.e. find the tree that best represents the data
Problem: exponentially large number of possible trees makes decision tree learning hard (NP-hard problem)

Decision Tree Learning

Training data: N observations (x_i, y_i)
Quality metric: classification error, the fraction of mistakes (number of incorrect predictions divided by number of examples). Best possible value: 0.0; worst possible value: 1.0
Find the tree with lowest classification error
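
The quality metric can be written directly from its definition. A small sketch, with made-up "safe"/"risky" labels used purely for illustration:

```python
def classification_error(y_true, y_pred):
    """Fraction of examples where the prediction differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)


# Hypothetical labels: two of the four predictions are wrong.
y_true = ["safe", "risky", "safe", "safe"]
y_pred = ["safe", "safe", "safe", "risky"]
error = classification_error(y_true, y_pred)
```

A perfect predictor gives 0.0 and a predictor that is wrong on every example gives 1.0, matching the best and worst values stated above.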

Simple Greedy Algorithm for Decision Tree Learning

Step 1: Start with an empty tree and the full training data
Step 2: Select a feature and split the data on it
Step 3: If there is nothing more to do, make predictions
Step 4: Otherwise, go to Step 2 and continue (recurse) on each split
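
The four steps can be sketched as a short recursive function. The notes do not fix the split criterion or the stopping condition, so this sketch makes two assumptions: it splits on the feature that yields the fewest mistakes (classification error), and it stops when a node is pure or no features remain. The loan data, feature names, and tree representation are all hypothetical.

```python
from collections import Counter


def majority_label(rows):
    """Most common label among (features, label) rows."""
    return Counter(label for _, label in rows).most_common(1)[0][0]


def split_error(rows, feature):
    """Mistakes made if each branch of the split predicts its majority label."""
    groups = {}
    for features, label in rows:
        groups.setdefault(features[feature], []).append(label)
    return sum(
        len(labels) - Counter(labels).most_common(1)[0][1]
        for labels in groups.values()
    )


def build_tree(rows, features):
    labels = {label for _, label in rows}
    # Step 3: nothing more to do (assumed rule: pure node or no features left).
    if len(labels) == 1 or not features:
        return {"predict": majority_label(rows)}
    # Step 2: pick the feature to split on (assumed rule: fewest mistakes).
    best = min(features, key=lambda f: split_error(rows, f))
    remaining = [f for f in features if f != best]
    groups = {}
    for row in rows:
        groups.setdefault(row[0][best], []).append(row)
    # Step 4: recurse on each branch of the split.
    children = {value: build_tree(subset, remaining) for value, subset in groups.items()}
    return {"feature": best, "default": majority_label(rows), "children": children}


def predict(tree, x):
    """Walk from the root to a leaf; fall back to the node majority for unseen values."""
    while "predict" not in tree:
        child = tree["children"].get(x[tree["feature"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree["predict"]


# Step 1: start with an empty tree and the (hypothetical) training data.
data = [
    ({"credit": "excellent", "income": "high"}, "safe"),
    ({"credit": "excellent", "income": "low"}, "safe"),
    ({"credit": "fair", "income": "high"}, "safe"),
    ({"credit": "fair", "income": "low"}, "risky"),
    ({"credit": "poor", "income": "high"}, "risky"),
    ({"credit": "poor", "income": "low"}, "risky"),
]
tree = build_tree(data, ["credit", "income"])
```

The algorithm is greedy because each split is chosen by looking only at the current node; it never revisits an earlier choice, which is what avoids searching the exponentially large space of trees.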

Problems in Decision Tree Learning

Problem 1: Feature split selection
Problem 2: Stopping condition
