Summary

This document discusses classification, a supervised machine learning approach. It identifies which category a new observation belongs to based on a training dataset of observations. The primary goal is to create a function that accurately predicts the category of new, unseen data points.

Full Transcript

· CLASSIFICATION. Refers to identifying which category or class a new observation belongs to, based on a training dataset containing observations whose category membership is known. It is a supervised learning method: the algorithm learns from labeled data to make predictions or decisions. The model is trained to map input features to a specific label by learning the relationships and boundaries that separate different classes. The primary goal is to create a function that can accurately predict the category of new or unseen data points. In other words, classification involves finding a decision boundary that can efficiently separate input data into distinct classes. The purposes are: data categorization, automation of decision making, and predictive modeling.

Interpreting the Output of a Classification Model. Models often output probabilities that indicate how confident they are about their predictions. They can express the degree of certainty about which class a given input belongs to. The output is represented as $\hat{y}$, which is often the result of a function parameterized by $\theta$. It is expressed as $\hat{y} = f(x) = h_\theta(x)$, where:
$x$: input feature vector.
$\theta$: model parameters.
$h_\theta$: function that outputs $\hat{y}$.

The prediction $\hat{y}$ provides a vector of probabilities that belongs to a probability simplex, a mathematical construct used to represent the set of all possible probability distributions over a finite number of categories (or classes). If we have $c$ classes, the $(c-1)$-dimensional probability simplex is the set of all $c$-dimensional vectors $\hat{y} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_c)$ that satisfy two conditions:
1) Non-negativity. $\hat{y}_i \ge 0 \;\; \forall i = 1, \ldots, c$.
2) Normalization. $\sum_{i=1}^{c} \hat{y}_i = 1$.
In a $c$-class case, the simplex forms a $(c-1)$-dimensional polytope.

To make a final classification decision (based on a given probability distribution), the model uses the argmax function.
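The two simplex conditions and the argmax decision described above can be sketched in a few lines of Python. This is only an illustration: the helper names `is_on_simplex` and `argmax`, the tolerance, and the three-class vector `y_hat` are assumptions made for the example, not part of the notes.

```python
import math

def is_on_simplex(probs, tol=1e-9):
    """Check non-negativity and normalization (sum equal to 1 within tol)."""
    non_negative = all(p >= 0 for p in probs)
    normalized = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return non_negative and normalized

def argmax(values):
    """Return the index of the largest value (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical model output for c = 3 classes (values chosen for illustration).
y_hat = [0.2, 0.7, 0.1]

valid = is_on_simplex(y_hat)      # both simplex conditions hold
predicted_class = argmax(y_hat)   # index of the most probable class
```

Checking the simplex conditions before taking the argmax is a cheap sanity test: a vector such as `[0.5, 0.6]` fails normalization and should not be read as a probability distribution.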
The argmax function is a mathematical operation used to identify the argument at which a given function achieves its maximum value. In our context, it selects the class that the model considers most probable for a given input:
$\operatorname{argmax}_i f(i) = \{\, i : f(i) \ge f(j) \text{ for all } j \,\}$
In our case $f(i) = \hat{y}_i$, so:
$\hat{c} = \operatorname{argmax}_i \hat{y}_i$
This finds the class with the highest probability.

EXAMPLE. We are developing a model to determine whether an email is spam or not. The model takes in features from the email (like the number of suspicious words) and outputs a probability distribution over two classes:
1) Not spam (class 0).
2) Spam (class 1).
Suppose the model outputs $\hat{y} = [0.85, 0.15]$. The model predicts with 85% confidence that the email is not spam (class 0) and with 15% confidence that the email is spam (class 1). So $\hat{y}$ is a valid probability distribution (both entries are non-negative and they sum to 1). We then use the argmax function:
$\hat{c} = \operatorname{argmax} [0.85, 0.15] = 0$
So the model classifies the email as "not spam" based on the higher probability.

Binary Classification. This is also a supervised learning approach. The objective is to assign data to one of two distinct classes. The target variable $y$ in binary classification is constrained to take one of two values, $y \in \{0, 1\}$, where $y = 0$ and $y = 1$ correspond to the two possible classes. Typically:
$y = 0$ is used for a negative outcome.
$y = 1$ is used for a positive outcome.
This is a special and simpler case of classification where there are only two possible outcomes. However, many real-world problems can't be addressed by a simple binary division; this is where multi-class classification comes in.

EXAMPLE. We try to classify emails as Not Spam (0) and Spam (1). The decision is based on the number of times certain keywords (like "discount", "urgent", etc.) appear in the email.
$x_1$ is the number of occurrences of a specific keyword in the email. The prediction function $f(x)$ outputs a probability value between 0 and 1, indicating the likelihood that an email is not spam.
If $f(x) \ge 0.5$: Not Spam (0).
If $f(x) < 0.5$: Spam (1).
Suppose we have:
Email 1: $x_1 = 2$, $f(x) = 0.7$. Not Spam (0).
Email 2: $x_1 = 5$, $f(x) = 0.3$. Spam (1).
Any value of $f(x)$ above 0.5 lies in the "Not Spam" region, while values below 0.5 lie in the "Spam" region.

The sigmoid function is a mathematical function that maps any real-valued number to a value between 0 and 1. It is used to transform the output into a probability:
$\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z$ is the real-valued input.
It has an S-shaped (sigmoidal) curve and transitions smoothly from 0 to 1 as $z$ increases.

The decision boundary is a surface (or a line in 2 dimensions) that separates the feature space into different regions. Formally, it is the set of points for which the probability of class membership is equal to a certain threshold. Suppose we have a logistic regression model:
$f_\theta(x) = \sigma(3 + x_1 + x_2)$
The decision boundary is given by:
$3 + x_1 + x_2 = 0 \;\Rightarrow\; x_2 = -x_1 - 3$
This equation represents a straight line in the $(x_1, x_2)$ plane.
Red region: where the model predicts $\hat{y} = 1$ (class 1).
Blue region: where the model predicts $\hat{y} = 0$ (class 0).

The decision boundary can also be circular, defined by:
$(x_1 - a)^2 + (x_2 - b)^2 = r^2$
If $(x_1 - a)^2 + (x_2 - b)^2 < r^2$, the point belongs to the inner class.
If $(x_1 - a)^2 + (x_2 - b)^2 > r^2$, the point belongs to the outer class.

· LIKELIHOOD FUNCTION. The likelihood ($L$) of a model is a measure of how well the model explains the observed data. It is expressed as:
$L = p(y \mid x; \theta)$
In the case of a probabilistic model for classification or regression, the likelihood is the product of the probabilities of observing each data point.

· LOG-LIKELIHOOD FUNCTION. Denoted log
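The sigmoid, the linear decision boundary, and the likelihood as a product of per-point probabilities can be tried out together in a short Python sketch. It assumes the logistic model from the notes, with the sigmoid applied to 3 + x1 + x2 and the output read as the probability of class 1. The function names, the 0.5 threshold, and the three data points are hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    """Map a real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x1, x2):
    """Probability of class 1 under the assumed model sigmoid(3 + x1 + x2)."""
    return sigmoid(3.0 + x1 + x2)

def predict_class(x1, x2, threshold=0.5):
    """Class 1 on or above the boundary 3 + x1 + x2 = 0, class 0 below it."""
    return 1 if predict_proba(x1, x2) >= threshold else 0

def likelihood(points, labels):
    """Product, over all data points, of the probability of the observed label."""
    total = 1.0
    for (x1, x2), y in zip(points, labels):
        p1 = predict_proba(x1, x2)
        total *= p1 if y == 1 else 1.0 - p1
    return total

# Hypothetical data: one point on the boundary, one on each side of it.
points = [(0.0, -3.0), (1.0, 1.0), (-4.0, -2.0)]
labels = [1, 1, 0]

boundary_probability = predict_proba(0.0, -3.0)  # 3 + 0 - 3 = 0, so sigmoid(0)
model_likelihood = likelihood(points, labels)
```

A point on the line x2 = -x1 - 3 gets probability exactly 0.5, which is why that line separates the two predicted regions. Multiplying many probabilities quickly produces very small numbers, which is one practical reason to work with the logarithm of the likelihood instead.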
