Classification Problems

Solving a classification problem means identifying the category to which a given example belongs.
Binary classification – two class labels: disease/no disease, spam/not spam, buy/do not buy, ...
Multi-class classification – more than two class labels: handwritten character recognition, face recognition, ...

Linear Separability

Two classes are linearly separable if a linear decision boundary can be drawn between them. When the input has two dimensions, the boundary is a straight line; with three dimensions it is a plane; with more than three dimensions it is a hyperplane.

Perceptron

The perceptron is the simplest form of a neural network, used for the classification of patterns said to be linearly separable (i.e., patterns that lie on opposite sides of a hyperplane). When the two classes C1 and C2 are sufficiently separated from each other, we can draw a hyperplane (in the two-dimensional case, a straight line) as the decision boundary. If, however, the two classes are allowed to move too close to each other, they become nonlinearly separable, a situation that is beyond the computing capability of the perceptron.

The perceptron consists of a single neuron with adjustable synaptic weights and bias. The algorithm used to adjust the free parameters of this neural network first appeared in a learning procedure developed by Rosenblatt (1958, 1962) for his perceptron brain model. Rosenblatt proved that if the patterns (vectors) used to train the perceptron are drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes. The proof of convergence of the algorithm is known as the perceptron convergence theorem.

The goal of the perceptron is to correctly classify the set of externally applied stimuli x1, x2, ..., xm into one of two classes, C1 or C2. The decision rule is to assign the point represented by the inputs x1, x2, ..., xm to class C1 if the perceptron output y is +1, and to class C2 if it is -1.

Decision boundary

There are two decision regions separated by a hyperplane defined by:
    w1 x1 + w2 x2 + ... + wm xm + b = 0

Perceptron Convergence Theorem

Input vector (including the bias):  X(n) = [+1, x1(n), x2(n), ..., xm(n)]^T
Weight vector:                      w(n) = [b, w1(n), w2(n), ..., wm(n)]^T
Linear combiner output:             v(n) = w(n)^T X(n)

According to the perceptron convergence theorem, if the input variables of the perceptron originate from two linearly separable classes, there exists a weight vector w such that we may state:
    w^T x > 0  for every input vector x belonging to class C1
    w^T x ≤ 0  for every input vector x belonging to class C2
Given the subsets of training vectors from the two classes, the training problem for the perceptron is to find a weight vector w that satisfies these two inequalities.

Perceptron Convergence Algorithm – Example

    x1     x2     Class
    1      1.5    +1
    1.5    2.5    +1
    2      1      +1
    2.5    4      -1
    3      2.5    -1
    5      2      -1

First iteration, starting from zero weights and using the first sample (x1 = 1, x2 = 1.5, d = +1):
    X(0) = [+1, +1, +1.5]^T,   w(0) = [0, 0, 0]^T
    v = w(0)^T X(0) = [0  0  0] [+1, +1, +1.5]^T = 0
    y(0) = sign(0) = 0
    w(1) = w(0) + μ [d(0) − y(0)] X(0)
The algorithm repeats such corrections over the training set until no sample is misclassified (see the code sketch further below).

Example

A weight vector for the same data: b = 1.8, w1 = -0.3, w2 = -0.5. Substituting the training points shows that 1.8 − 0.3 x1 − 0.5 x2 is positive for every class +1 point and negative for every class -1 point, so this weight vector separates the two classes.
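To make the error-correction update concrete, here is a minimal Python sketch (not part of the original slides) of the perceptron convergence algorithm applied to the example data above. The learning rate μ = 1, the epoch limit, and the variable names are assumptions for illustration; the sign(0) = 0 convention follows the worked iteration.

    import numpy as np

    # Example data from the slide: inputs [x1, x2] and desired responses d = +1 / -1.
    X = np.array([[1.0, 1.5],
                  [1.5, 2.5],
                  [2.0, 1.0],
                  [2.5, 4.0],
                  [3.0, 2.5],
                  [5.0, 2.0]])
    d = np.array([+1, +1, +1, -1, -1, -1])

    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend the fixed bias input +1, so w = [b, w1, w2]

    mu = 1.0                 # learning rate (assumed value)
    w = np.zeros(3)          # w(0) = [0, 0, 0]^T

    for epoch in range(100):                 # repeat passes until no sample is misclassified
        errors = 0
        for x_n, d_n in zip(Xb, d):
            v = w @ x_n                      # linear combiner output v = w^T x
            y = np.sign(v)                   # perceptron output; np.sign(0) = 0, as in the worked step
            if y != d_n:
                w = w + mu * (d_n - y) * x_n     # error correction: w <- w + mu [d - y] x
                errors += 1
        if errors == 0:
            break

    print("learned weights [b, w1, w2]:", w)

Because the two classes in the table are linearly separable, the convergence theorem guarantees that this loop stops after a finite number of corrections with a separating weight vector.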
The Batch Perceptron Algorithm

The perceptron convergence algorithm above was presented without reference to a cost function, and its derivation focused on single-sample correction. It can be generalized by introducing two things: (1) a generalized form of a perceptron cost function, and (2) the use of that cost function to formulate a batch version of the perceptron convergence algorithm.

We can define a cost function for the perceptron as follows:
    J(w) = Σ_{x ∈ X} (−w^T x d)
where X is the set of samples misclassified by the current weight vector w and d is the desired response (+1 or −1). If all samples are correctly classified, the set X is empty and J(w) is zero. Differentiating J(w) with respect to w gives the gradient vector:
    ∇J(w) = Σ_{x ∈ X} (−x d)
and the weights are adjusted by stepping against this gradient:
    w(n+1) = w(n) − η(n) ∇J(w) = w(n) + η(n) Σ_{x ∈ X} x d

Batch Perceptron Algorithm cont.

The weight adjustment method discussed above is called the method of steepest descent, because the amount of adjustment depends on the steepness of the cost function (i.e., its gradient), and the adjustment is made in the direction opposite to the gradient vector. The gradient vector is mathematically denoted by
    ∇J(w) = ∂J(w)/∂w = [∂J/∂b, ∂J/∂w1, ..., ∂J/∂wm]^T
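As an illustration (not from the slides), the sketch below implements one possible batch version under the definitions above: at each step it collects the currently misclassified samples, forms the gradient Σ(−x d) over that set, and moves the weights in the opposite direction. The reuse of the earlier example data, the constant learning rate η = 0.1, and the iteration cap are assumptions.

    import numpy as np

    # Same example data, with a leading +1 for the bias so that w = [b, w1, w2].
    X = np.array([[1.0, 1.0, 1.5],
                  [1.0, 1.5, 2.5],
                  [1.0, 2.0, 1.0],
                  [1.0, 2.5, 4.0],
                  [1.0, 3.0, 2.5],
                  [1.0, 5.0, 2.0]])
    d = np.array([+1, +1, +1, -1, -1, -1])

    eta = 0.1            # learning rate eta(n), kept constant here (an assumption)
    w = np.zeros(3)      # initial weight vector

    for n in range(1000):
        scores = X @ w                           # w^T x for every sample
        mis = np.sign(scores) != d               # the set of misclassified samples
        if not mis.any():
            break                                # J(w) = 0: every sample classified correctly
        grad = -(X[mis] * d[mis, None]).sum(axis=0)   # gradient of J(w): sum of (-x d) over misclassified x
        w = w - eta * grad                       # steepest descent: step against the gradient

    print("batch-learned weights [b, w1, w2]:", w)

For this small, linearly separable data set the loop should reach zero misclassifications well within the iteration cap. Unlike the single-sample rule, each update here uses all currently misclassified samples at once, which is what makes it a batch algorithm.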