Questions and Answers
What is the purpose of the backpropagation algorithm in neural networks?
Which concept allows computationally feasible training with large datasets?
What is the role of the backpropagation algorithm in relation to model parameters?
Which algorithm exploits the sequential structure of neural networks?
What concept is essential for training accurate models in neural networks?
In the backpropagation algorithm, what is updated during the process?
What do stochastic gradient descent and backpropagation collectively aim to achieve?
Which algorithm computes gradients for weights and biases in a neural network?
Why are stochastic gradient descent and backpropagation considered insufficient on their own for training accurate models?
What aspect of neural network training is facilitated by backpropagation?
Study Notes
Support Vector Machines (SVMs)
- SVMs find the linear classifier with the maximum margin, which is the distance between the separating hyperplane and the nearest data points.
- The maximum margin is achieved by minimizing the norm of the weight vector w subject to the constraint that the data points are classified correctly.
- The SVM problem can be formulated as: arg max_{w,b} min_{t=1,...,n} |w^T x_t + b| / ||w||, subject to s_t (w^T x_t + b) > 0 for all t in 1,...,n, where s_t is the class label of x_t.
- The hard SVM problem can be written as: arg min_{w̃,b̃} ||w̃||, subject to s_t (w̃^T x_t + b̃) ≥ 1 for all t in 1,...,n.
- Figure 2 illustrates the SVM geometry, where the distance of a point x from the hyperplane is r = |w^T x + b| / ||w|| (a small numeric sketch of this formula follows below).
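As a quick numeric illustration of the distance formula, a minimal numpy sketch (the values of w, b, and x below are made up for the example, not from the source):

```python
import numpy as np

# Distance from a point x to the hyperplane {z : w^T z + b = 0}
# is r = |w^T x + b| / ||w||.
w = np.array([2.0, 1.0])   # hypothetical weight vector
b = -1.0                   # hypothetical bias
x = np.array([1.5, 2.0])   # hypothetical data point

r = abs(w @ x + b) / np.linalg.norm(w)
print(r)  # distance of x from the separating hyperplane (~1.789 here)
```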
Activation Functions
- The ReLU activation function is defined as σ(x) = max{x, 0}
- The leaky ReLU activation function, a ReLU with a parameter α < 1, is defined as σ(x) = max{x, αx}
- The sigmoid activation function is defined as σ(x) = (1 + exp(-x))^{-1}
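A minimal numpy sketch of the three activation functions defined above (the leaky-ReLU default α = 0.01 is an assumed common choice; the notes only require α < 1):

```python
import numpy as np

def relu(x):
    # ReLU: max{x, 0}, applied elementwise
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU with alpha < 1: max{x, alpha * x}
    return np.maximum(x, alpha * x)

def sigmoid(x):
    # Sigmoid: (1 + exp(-x))^{-1}
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), leaky_relu(x), sigmoid(x))
```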
Multi-Layer Perceptron (MLP)
- An MLP is a neural network with multiple layers, where each layer is a parametric function
- The output of an MLP is obtained by composing its layers: each layer's output is the next layer's input
- The parameters of an MLP are the union of the parameters of each layer
- The architecture of an MLP refers to the specification of its layers
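A minimal sketch of an MLP as a composition of parametric layers (the layer sizes 4 → 8 → 3 and the ReLU nonlinearity are illustrative assumptions, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b):
    # One parametric layer: an affine map
    return W @ x + b

def relu(x):
    return np.maximum(x, 0.0)

# Architecture 4 -> 8 -> 3; the MLP's parameters are the union of each layer's (W, b)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

x = rng.normal(size=4)
y = dense(relu(dense(x, W1, b1)), W2, b2)  # output = composition of the layers
print(y.shape)  # (3,)
```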
Numerical Gradient Computation
- The numerical gradient is an approximation of the partial derivatives of the loss function with respect to the model parameters
- The numerical gradient can be computed with the central finite-difference approximation ∂L/∂θ_i ≈ (L(θ + εe_i) − L(θ − εe_i)) / (2ε) for a small ε (see the sketch below)
- The numerical gradient is approximate and computationally expensive, since it requires two loss evaluations per parameter
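A minimal sketch of the central finite-difference approximation (the step size ε = 1e-6 is an assumed choice):

```python
import numpy as np

def numerical_gradient(loss, theta, eps=1e-6):
    # Approximate each partial derivative with a central difference:
    # dL/dtheta_i ≈ (L(theta + eps*e_i) - L(theta - eps*e_i)) / (2*eps)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
    return grad

# Example: L(theta) = ||theta||^2 has exact gradient 2*theta
theta = np.array([1.0, -2.0, 0.5])
print(numerical_gradient(lambda t: np.sum(t ** 2), theta))  # ≈ [2, -4, 1]
```

The loop makes the cost visible: two loss evaluations per parameter, which is what makes the numerical gradient expensive for large models.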
Analytical Gradient Computation
- The analytical gradient is an exact computation of the partial derivatives of the loss function with respect to the model parameters
- The analytical gradient can be computed using the chain rule
- The analytical gradient is exact and computationally efficient
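A minimal chain-rule sketch on a made-up one-parameter model, cross-checked against the finite-difference approximation from the previous section (the model L(w) = (σ(wx) − y)² and all values are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y, w = 1.5, 1.0, 0.3

# Chain rule: dL/dw = 2*(s - y) * s*(1 - s) * x, using sigmoid' = s*(1 - s)
s = sigmoid(w * x)
analytical = 2 * (s - y) * s * (1 - s) * x

# Cross-check against the central finite-difference approximation
eps = 1e-6
loss = lambda w: (sigmoid(w * x) - y) ** 2
numerical = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(analytical, numerical)  # the two should agree to ~6 decimal places
```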
Backpropagation
- Backpropagation is an algorithm for computing the gradients of the loss function with respect to the model parameters
- Backpropagation uses the chain rule to compute the gradients recursively
- The backpropagation algorithm consists of forward and backward passes
- The forward pass computes the output of the network, and the backward pass computes the gradients of the loss function with respect to the model parameters
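A minimal backpropagation sketch for a one-hidden-layer MLP with ReLU and squared-error loss (the sizes, data, and loss are illustrative assumptions, not the source's specific setup):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)               # input
y = rng.normal(size=2)               # target
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)

# Forward pass: compute the network output, caching intermediates
z1 = W1 @ x + b1
h1 = np.maximum(z1, 0.0)             # ReLU
y_hat = W2 @ h1 + b2
loss = np.sum((y_hat - y) ** 2)

# Backward pass: apply the chain rule from the output back to the parameters
d_yhat = 2 * (y_hat - y)             # dL/dy_hat
dW2 = np.outer(d_yhat, h1)           # dL/dW2
db2 = d_yhat
d_h1 = W2.T @ d_yhat                 # propagate through the second layer
d_z1 = d_h1 * (z1 > 0)               # through the ReLU
dW1 = np.outer(d_z1, x)
db1 = d_z1
print(loss, dW1.shape, dW2.shape)    # gradients ready for a gradient-descent step
```

Note how the backward pass reuses the intermediates z1 and h1 cached during the forward pass; this reuse is what lets backpropagation exploit the sequential structure of the network.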
Description
This quiz provides a detailed mathematical proof demonstrating that v is the closest point to x on the hyperplane, in the context of Support Vector Machines (SVMs). The proof involves equations and explanations showing the relationship between v, x, and u in the SVM illustration.