Full Transcript

IT3071 – Machine Learning and Optimization Methods
Lesson 2 - Perceptron
Prof. Chandimal Jayawardena

First Lesson - Revision

Classification Problems
Solving a classification problem means identifying to which category a given example belongs.
- Binary classification – two class labels: disease/no disease, spam/not spam, buy/do not buy, ...
- Multi-class classification – more than two class labels: handwritten character recognition, face recognition, ...

Linear Separability
When the input has two dimensions, the decision boundary is a straight line; with three dimensions it becomes a plane, and with more than three dimensions it is a hyperplane.

Perceptron
The perceptron is the simplest form of a neural network, used for the classification of patterns said to be linearly separable (i.e., patterns that lie on opposite sides of a hyperplane). When the two classes C1 and C2 are sufficiently separated from each other, we can draw a hyperplane (in the two-dimensional case, a straight line) as the decision boundary. If, however, the two classes are allowed to move too close to each other, they become nonlinearly separable, a situation that is beyond the computing capability of the perceptron.

Perceptron cont.
The perceptron consists of a single neuron with adjustable synaptic weights and bias. The algorithm used to adjust the free parameters of this neural network first appeared in a learning procedure developed by Rosenblatt (1958, 1962) for his perceptron brain model. Rosenblatt proved that if the patterns (vectors) used to train the perceptron are drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes. The proof of convergence of the algorithm is known as the perceptron convergence theorem.

The goal of the perceptron is to correctly classify the set of externally applied stimuli x1, x2, ..., xm into one of two classes, C1 or C2. The decision rule for the classification is to assign the point represented by the inputs x1, x2, ..., xm to class C1 if the perceptron output y is +1, and to class C2 if it is -1.

Decision Boundary
There are two decision regions separated by a hyperplane defined by
$\sum_{i=1}^{m} w_i x_i + b = 0$

Perceptron Convergence Theorem
Input vector (including the bias): $\mathbf{x}(n) = [+1, x_1(n), x_2(n), \ldots, x_m(n)]^T$
Weight vector: $\mathbf{w}(n) = [b, w_1(n), w_2(n), \ldots, w_m(n)]^T$
Linear combiner output: $v(n) = \mathbf{w}^T(n)\,\mathbf{x}(n)$
According to the perceptron convergence theorem, if the input variables of the perceptron originate from two linearly separable classes, there exists a weight vector w such that we may state
$\mathbf{w}^T\mathbf{x} > 0$ for every input vector x belonging to class C1,
$\mathbf{w}^T\mathbf{x} \le 0$ for every input vector x belonging to class C2.
Given the subsets of training vectors from the two classes, the training problem for the perceptron is then to find a weight vector w that satisfies these two inequalities.

Training Data
Example    X1     X2     Class
1          1      1.5    +1
2          1.5    2.5    +1
3          2      1      +1
4          2.5    4      -1
5          3      2.5    -1
6          5      2      -1

Example
b = 1.8, w1 = -0.3, w2 = -0.5
These weights classify all six training examples correctly; for example 1, v = (-0.3)(1) + (-0.5)(1.5) + 1.8 = 0.75 > 0, so y = +1, the correct class.

Perceptron Convergence Algorithm
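The algorithm itself appeared on the slide as a figure. As a stand-in, here is a minimal Python sketch of the error-correction form of the perceptron convergence algorithm, run on the training data above; the learning rate eta, the epoch cap, and the function names are illustrative assumptions, not taken from the slides.

    import numpy as np

    # Training data from the table: columns (x1, x2), desired responses +1 / -1.
    X = np.array([[1.0, 1.5], [1.5, 2.5], [2.0, 1.0],
                  [2.5, 4.0], [3.0, 2.5], [5.0, 2.0]])
    d = np.array([+1, +1, +1, -1, -1, -1])

    def sgn(v):
        # Hard limiter: output +1 (class C1) or -1 (class C2).
        return 1 if v >= 0 else -1

    def train_perceptron(X, d, eta=0.5, max_epochs=100):
        # w holds [b, w1, w2]; the bias b is a weight on a fixed +1 input.
        w = np.zeros(X.shape[1] + 1)
        for _ in range(max_epochs):
            errors = 0
            for x, target in zip(X, d):
                xa = np.concatenate(([1.0], x))  # augmented input [+1, x1, x2]
                if sgn(w @ xa) != target:        # update only on a misclassification
                    w += eta * target * xa       # error-correction step
                    errors += 1
            if errors == 0:                      # a full pass with no errors: converged
                break
        return w

    w = train_perceptron(X, d)
    print("learned [b, w1, w2]:", w)

    # The example weights from the slide (b = 1.8, w1 = -0.3, w2 = -0.5)
    # also classify every training sample correctly:
    w_slide = np.array([1.8, -0.3, -0.5])
    assert all(sgn(w_slide @ np.concatenate(([1.0], x))) == t for x, t in zip(X, d))

Because the two classes here are linearly separable, the inner loop eventually makes a full pass without updates, which is exactly the behaviour the perceptron convergence theorem guarantees.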
The Batch Perceptron Algorithm
The perceptron convergence algorithm discussed above was presented without reference to a cost function, and its derivation focused on correcting one sample at a time. Introducing the following two ideas makes it more general:
- a generalized form of the perceptron cost function, and
- the use of that cost function to formulate a batch version of the perceptron convergence algorithm.

The Batch Perceptron Algorithm cont.
We define the perceptron cost function as
$J(\mathbf{w}) = \sum_{\mathbf{x} \in \mathcal{X}} \left(-\mathbf{w}^T \mathbf{x}\, d(\mathbf{x})\right)$
where $\mathcal{X}$ is the set of samples x misclassified by a perceptron using w as its weight vector, and $d(\mathbf{x}) = \pm 1$ is the desired response for sample x. Differentiating J(w) with respect to w yields the gradient vector
$\nabla J(\mathbf{w}) = \sum_{\mathbf{x} \in \mathcal{X}} \left(-\mathbf{x}\, d(\mathbf{x})\right)$
In the method of steepest descent, the adjustment to the weight vector w at each time step of the algorithm is applied in a direction opposite to the gradient vector. Accordingly, the algorithm takes the form
$\mathbf{w}(n+1) = \mathbf{w}(n) - \eta(n)\,\nabla J(\mathbf{w}) = \mathbf{w}(n) + \eta(n) \sum_{\mathbf{x} \in \mathcal{X}(n)} \mathbf{x}\, d(\mathbf{x})$
The adjustment applied to the weight vector at time step n + 1 is thus the sum of all the samples misclassified by the weight vector w(n), scaled by the learning-rate parameter η(n).
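To make the steepest-descent update concrete, here is a minimal sketch of the batch step under the same conventions; the constant learning rate and the iteration cap are simplifying assumptions (the slides allow a time-varying η(n)).

    import numpy as np

    X = np.array([[1.0, 1.5], [1.5, 2.5], [2.0, 1.0],
                  [2.5, 4.0], [3.0, 2.5], [5.0, 2.0]])
    d = np.array([+1, +1, +1, -1, -1, -1])
    Xa = np.hstack([np.ones((len(X), 1)), X])  # augmented inputs [+1, x1, x2]

    def batch_perceptron_step(w, Xa, d, eta=0.1):
        # One steepest-descent step on the perceptron cost J(w).
        y = np.where(Xa @ w >= 0.0, 1, -1)                # current perceptron outputs
        mis = y != d                                      # the misclassified set
        grad = -(Xa[mis] * d[mis][:, None]).sum(axis=0)   # gradient of J at w
        return w - eta * grad                             # move against the gradient

    w = np.zeros(3)
    for n in range(1000):
        w_next = batch_perceptron_step(w, Xa, d)
        if np.array_equal(w_next, w):                     # no misclassified samples left
            break
        w = w_next
    print("batch solution [b, w1, w2]:", w)

Each step sums the signed misclassified samples, so a single update can correct several errors at once, at the cost of scanning the whole training set per step; the single-sample algorithm above instead corrects one misclassification at a time.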
