Questions and Answers
What happens to all distances when k(x, z) = 1?
The model (F(X)) will return different answers to any new case X.
False
What happens to k(X(i), X) when X(i) is the closest to the query X?
It gets magnified dramatically compared to other similarities
We start behaving like the _____________________ classifier as we get more data.
What type of kernel is being referred to in the example?
α(i) ≠ 0 for all training instances.
When making predictions with a kernelized SVM, what is computed?
The predicted label will be _____________________ if we have no data.
Match the following kernel types with their characteristics:
What is the purpose of the kernel trick?
Study Notes
Linear SVM Classification
- SVMs are sensitive to feature scales, so feature scaling is necessary to obtain good decision boundaries. Without proper scaling, the algorithm may struggle to find the optimal hyperplane because dominant features overwhelm the others; scaling puts all features on a comparable scale.
- Hard margin classification imposes that all instances must be off the street and on the right side, but it only works if the data is linearly separable and is sensitive to outliers. In practice, however, real-world datasets rarely meet these conditions, making hard margin classification somewhat restrictive.
- Soft margin SVM classification finds a balance between keeping a large street and limiting margin violations, and the degree of softness is defined by the C hyperparameter. This allows for more flexibility in the model's decisions, making it a more robust approach.
- A low C value yields a simpler, more regularized model, while a high C value penalizes margin violations heavily and produces a model that is more prone to overfitting. By adjusting the C value, the model can be tailored to the specific needs of the problem at hand (see the sketch after this list).
- Margin violations are undesirable, but reducing C regularizes the model and helps prevent overfitting. By allowing more margin violations, the model becomes less prone to overfitting, which makes it more suitable for datasets with noisy labels.
- The optimal model is one that maximizes the margin, which means finding the hyperplane that stays as far away from the closest training instances as possible. This ensures that the model learns to make predictions based on the underlying structure of the data, rather than relying on chance or noise.
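The following is a minimal sketch of these points, assuming scikit-learn and its bundled iris dataset; the choice of features (petal length/width), the virginica-vs-rest target, and C=1.0 are illustrative assumptions, not prescribed by the notes above.

```python
# Minimal sketch: feature scaling + soft-margin linear SVM classification.
# Assumes scikit-learn; the dataset, features, and C value are illustrative.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X = X[:, (2, 3)]                 # petal length, petal width
y = (y == 2).astype(np.int64)    # Iris virginica vs. the rest

# StandardScaler keeps one dominant feature from dictating the margin.
# A small C tolerates more margin violations (wider street, stronger regularization);
# a large C narrows the street and is more prone to overfitting.
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, loss="hinge", random_state=42))
svm_clf.fit(X, y)

print(svm_clf.predict([[5.5, 1.7]]))  # class prediction for a new flower
```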
Support Vectors
- Support vectors are the instances located on the edge of the street; they fully determine the decision boundary and are therefore central to the model's predictions.
- Adding more training instances "off the street" will not affect the decision boundary. This is because the support vectors already capture the essential information needed to make accurate predictions, making the additional instances redundant.
- The model is fully determined by the support vectors, and making predictions involves computing the dot product of the new input vector with only the support vectors, as shown in the sketch after this list. Restricting the computation to the support vectors keeps inference efficient.
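As a hedged illustration of the last point, the sketch below fits a linear-kernel SVC on synthetic data and reproduces its decision value using only the stored support vectors; make_blobs and the query point are assumptions made for the example, not part of the notes.

```python
# Minimal sketch: the decision function of a (linear-kernel) SVM depends only on
# its support vectors. Dataset and query point are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

x_new = np.array([1.0, 2.0])

# f(x) = sum_i alpha_i * y_i * <x_sv_i, x> + b, where alpha_i * y_i is stored in dual_coef_.
manual = clf.dual_coef_ @ (clf.support_vectors_ @ x_new) + clf.intercept_

print(manual, clf.decision_function([x_new]))  # both values agree
```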
Kernel SVMs
- Kernel SVMs use a kernel function to transform the data into a higher-dimensional space where it becomes linearly separable. This transformation helps the model capture complex relationships between features, allowing it to make more accurate predictions.
- The RBF kernel is an example of a kernel function used in kernel SVMs. This choice of kernel enables the model to learn non-linear relationships between features, making it particularly effective for datasets with non-linear boundaries.
- Making predictions with a kernelized SVM involves evaluating the kernel between the new input vector and the support vectors only, not all the training instances; this amounts to a dot product in the higher-dimensional space without computing it explicitly. Prediction cost therefore scales with the number of support vectors (a sketch follows below).
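A sketch of that prediction rule, assuming scikit-learn and a synthetic two-moons dataset; gamma and C are illustrative values. It recomputes the SVM's decision value from RBF kernel evaluations against the support vectors alone.

```python
# Minimal sketch: kernelized (RBF) SVM prediction uses kernel evaluations against
# the support vectors only. Dataset, gamma, and C are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
gamma, C = 2.0, 1.0
clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)

x_new = np.array([0.5, 0.0])

# RBF kernel between x_new and each support vector: k(x, z) = exp(-gamma * ||x - z||^2)
k = np.exp(-gamma * np.sum((clf.support_vectors_ - x_new) ** 2, axis=1))

# Weighted sum over support vectors only (dual_coef_ stores alpha_i * y_i).
manual = clf.dual_coef_ @ k + clf.intercept_

print(manual, clf.decision_function([x_new]))  # both values agree
```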
Description
This quiz covers the concept of Linear SVM Classification, including support vectors, negative and positive hyperplanes, and maximum margin hyperplane. It also discusses the importance of feature scaling in SVMs.