L4 PDF - Northwestern Lecture Notes - ML
Summary
These notes cover topics in machine learning, including multi-class classification, softmax regression, and logistic regression. The document also includes questions and examples for classification problems.
Full Transcript
Things to note
- Call for student volunteers for the final presentation (2 bonus points). Send me an email; first come, first served.

Last Time
Lecture:
- Vectorized Gradient Descent for MLR
- Regularized Gradient Descent for MLR
- Linear regression in Sklearn
Quiz 3:
- Implementing Vectorized GD for MLR
- Implementing Regularized GD for MLR
- Comparing the performance of your implementation with linear models from Sklearn

Last time: Using Matrix Format in GD for MLR
- Cost function, gradient of the cost function, and gradient descent update rule, all in matrix form (a NumPy sketch appears below).
- Benefits: faster code (using optimized matrix libraries) and more compact equations.

Last time: Matrix-Based Regularized GD
- Idea: penalize large values of w_j by incorporating a penalty into the cost function.
- L2 regularization and L1 regularization.

Last Time: Linear models from Sklearn
- from sklearn.linear_model import LinearRegression (closed-form solution, OLS)
- from sklearn.linear_model import SGDRegressor (stochastic gradient descent)
- from sklearn.linear_model import Lasso (L1 regularization)
- from sklearn.linear_model import Ridge (L2 regularization)
- from sklearn.linear_model import ElasticNet (L1 + L2 regularization)
- alpha: regularization term (controls the strength of regularization)

Quiz 3: Early Stopping
- Another strategy to combat overfitting.

Today's agenda
Lecture:
- Multi-class classification
- Softmax regression
- Performance metrics for multi-class classification
- Mitigating imbalanced classification
Quiz 4:
- Explore different ML classifiers for multi-class classification
- Explore different strategies for mitigating imbalanced multi-class classification

Multi-Class Classification: Extending Binary Classification
- In the sequence course, we learned binary classification using logistic regression and tree-based models.
- What if the classification task has more than two classes?

Scikit-learn Estimators for Multiclass Classification
- https://scikit-learn.org/stable/modules/multiclass.html

Multi-Class Classification Strategies
- Inherently multiclass models
- "One-vs-Rest" approach
- "One-vs-One" approach
- Softmax classifier (multinomial logistic regression)

Inherently Multiclass Algorithms
- Tree-based models
- KNN

One vs. Rest (All)
- Uses binary classification for multi-class classification.
- For each class present in the data, we build a binary classifier that outputs a probability-like score.
- Training a One-vs-Rest classifier involves creating n binary classifiers for n classes.
- The argmax of the scores is used to predict a class.
- Example with classes Green, Blue, and Red:
  - Binary classification problem 1: Green vs. [Blue, Red]
  - Binary classification problem 2: Blue vs. [Green, Red]
  - Binary classification problem 3: Red vs. [Green, Blue]
- Downside: slow (one binary classifier is trained per class, and each training involves all the data).
- https://scikit-learn.org/stable/modules/multiclass.html
- A scikit-learn sketch appears below.

One vs. One
- Uses binary classification for multi-class classification.
- For each pair of classes, we build a binary classifier, so training a One-vs-One classifier involves creating n(n-1)/2 binary classifiers for a dataset with n classes; each one predicts one of the two class labels in its pair.
- The whole dataset is split into one binary classification dataset for each pair of classes.
- The majority vote (argmax of scores) across the pairwise classifiers is used to predict a class (a scikit-learn sketch appears below).
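To make the matrix-form GD review above concrete, here is a minimal NumPy sketch of vectorized, L2-regularized gradient descent for MLR. It assumes the common 1/(2m) squared-error convention with an unregularized bias term; the function name `ridge_gd` and the hyperparameter values are illustrative, not taken from the slides.

```python
import numpy as np

def ridge_gd(X, y, lr=0.1, lam=0.01, n_iters=1000):
    """Vectorized gradient descent for MLR with an L2 (ridge) penalty."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend a bias column of ones
    w = np.zeros(n + 1)
    for _ in range(n_iters):
        err = Xb @ w - y                    # residuals, shape (m,)
        grad = (Xb.T @ err) / m             # gradient of 1/(2m) * ||Xb w - y||^2
        grad[1:] += (lam / m) * w[1:]       # L2 penalty gradient; bias is not regularized
        w -= lr * grad                      # gradient descent update rule
    return w
```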
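A minimal sketch of the One-vs-Rest strategy in scikit-learn, wrapping a binary logistic regression with `OneVsRestClassifier`; the iris data is used only as a stand-in three-class example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One binary classifier is trained per class (class k vs. the rest)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))   # 3 classes -> 3 binary classifiers
print(ovr.predict(X[:5]))     # argmax over the per-class scores
```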
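Likewise, a sketch of the One-vs-One strategy with `OneVsOneClassifier`: for 3 classes it trains n(n-1)/2 = 3 pairwise classifiers and predicts by majority vote over their outputs.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

X, y = load_iris(return_X_y=True)

# One binary classifier is trained per pair of classes
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovo.estimators_))   # n(n-1)/2 = 3 pairwise classifiers for 3 classes
print(ovo.predict(X[:5]))     # majority vote across the pairwise classifiers
```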
One vs. One: Downside and Upside
- Downside: computationally expensive, requiring far more binary classifiers to be trained.
- Upside: if the classifier being used scales poorly with the amount of training data, and the dataset is sufficiently large, training this many two-class classifiers may be faster or give better results than the one-versus-rest scheme, in which every classifier considers every point.
- Rarely used in practice due to limited improvements in accuracy and higher computational cost.

Softmax Regression
- A generalization of (multinomial) logistic regression.
- Logistic regression from the sequence course outputs a single probability p(y=1|x), with p(y=0|x) = 1 - p(y=1|x).

Binary vs. Multiclass Regression
- Output of logistic regression (a yes-or-no question): a single probability value for one class; the other class's probability is 1 - P(y=1|x).
- Output of softmax regression (a multiple-choice question): K probabilities, one for each class, and these probabilities sum to 1.
- Binary classification models p(y = 1 | x); multiclass classification models p(y = k | x) for each class k, with p(y = k | x) = exp(z_k) / sum_j exp(z_j), where z_k is the score for class k.

Softmax Regression with N Classes
- If there are 4 features and 3 classes (say class A, class B, and class C), the weight matrix W is 4 x 3 and the bias vector b has 3 entries.
- Each column W_j corresponds to a class, each entry W_ij corresponds to feature i's weight for class j, and each b_j is the bias for class j.
- Parameter structure comparison: logistic regression learns a single weight vector and bias, while softmax regression learns one weight column and one bias per class (a NumPy sketch appears below).

Cost Function for Logistic Regression
- Binary cross-entropy: L = -[y log(p) + (1 - y) log(1 - p)], where p = p(y=1|x).
- Why is negative log loss a natural choice for training classifiers?
  - When p = 1 for the true class (perfect prediction), the loss is 0.
  - When p approaches 0 for the true class (poor prediction), the loss becomes very large.
  - It penalizes wrong predictions strongly, encouraging the model to assign high probability to the correct class.

Cost Function for Softmax Regression
- Cross-entropy: L = -sum_k y_k log(p_k), where y is the one-hot label vector and p is the vector of softmax probabilities.

Softmax Regression Example
- Example with classes Cat, Car, and Frog.
- Review the "Multiclass_Classification_in_Python" notebook for the implementation.

Performance Metrics for Multiclass Classification

Classification Metrics for Binary Classification
- Error rate = (C_FP + C_FN) / N
- Accuracy = (C_TP + C_TN) / N
- Recall = C_TP / N_P
- Precision = C_TP / (C_TP + C_FP)
- F1-score = 2 * (Precision * Recall) / (Precision + Recall)

Confusion Matrix in Multi-class
- The same quantities (error rate, accuracy, recall, precision, F1-score) can be read off a multi-class confusion matrix. Can they still work for multi-class classification?

Accuracy in Multi-class
- Accuracy = (C_TP + C_TN) / N
- It disregards class balance and the cost of different errors.

Precision, Recall, and F1-score in Multi-class
- Recall = C_TP / N_P, Precision = C_TP / (C_TP + C_FP), F1-score = 2 * (Precision * Recall) / (Precision + Recall)
- They can be used to evaluate the capability of a classifier to detect a particular class.
- They can be reported per class or aggregated for the whole classifier.

Macro-averaging in Multi-class
- Average the precision, recall, and F1-score across all classes, treating every class equally.

Weighted-averaging in Multi-class
- Weighted average of the precision, recall, or F1-score across all classes; it gives more weight to classes with more instances (a scikit-learn example appears below).

Imbalanced Classification (Multiclass)

Dealing with Imbalanced Classification
- Adjusting class weights during training (cost-sensitive learning)
- Oversampling the minority class
- Undersampling the majority class
- SMOTE (Synthetic Minority Oversampling Technique)
- Stratified splitting is required for an imbalanced dataset.
- In Python, you can use the `imbalanced-learn` library to handle imbalanced datasets. It provides various techniques for resampling the dataset, such as under-sampling, over-sampling, and combined methods. Install `imbalanced-learn` if you haven't already (a sketch appears below).
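A small NumPy sketch of the softmax forward pass and cross-entropy loss with the shapes assumed on the slides (4 features, 3 classes). The random weights and the single example are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # one weight column per class (A, B, C)
b = np.zeros(3)                  # one bias per class
x = rng.normal(size=4)           # a single example with 4 features

z = x @ W + b                    # one score (logit) per class
z -= z.max()                     # shift logits for numerical stability
p = np.exp(z) / np.exp(z).sum()  # softmax: K probabilities that sum to 1
print(p, p.sum())

y_onehot = np.array([0.0, 1.0, 0.0])    # suppose the true class is B
loss = -np.sum(y_onehot * np.log(p))    # cross-entropy: 0 if p[B] = 1, large as p[B] -> 0
print(loss)
```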
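A sketch of softmax (multinomial logistic) regression in scikit-learn; assuming a recent scikit-learn version, `LogisticRegression` with the default lbfgs solver fits a single multinomial model for multiclass targets. This is a stand-alone example, not the course's Multiclass_Classification_in_Python notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# With the default lbfgs solver, multiclass targets are fit with a softmax model
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print(clf.coef_.shape)              # (3 classes, 4 features): one weight row per class
print(clf.predict_proba(X_te[:3]))  # K probabilities per row, summing to 1
print(clf.score(X_te, y_te))        # accuracy on the held-out split
```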
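A sketch of the multi-class evaluation metrics discussed above, using scikit-learn: the confusion matrix, per-class precision/recall/F1, and the macro and weighted averages.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
y_pred = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)

print(confusion_matrix(y_te, y_pred))             # K x K confusion matrix
print(classification_report(y_te, y_pred))        # per-class precision/recall/F1 plus macro and weighted averages
print(f1_score(y_te, y_pred, average="macro"))    # every class counts equally
print(f1_score(y_te, y_pred, average="weighted")) # classes weighted by their number of instances
```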
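A sketch of the imbalance-mitigation strategies listed above (class weights, random over-/under-sampling, and SMOTE), assuming the `imbalanced-learn` package is installed and importable as `imblearn`. The synthetic three-class dataset and the choice of logistic regression are illustrative only; the techniques themselves are detailed in the next section.

```python
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

# Hypothetical imbalanced three-class dataset (80% / 15% / 5%)
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           weights=[0.80, 0.15, 0.05], random_state=0)
print(Counter(y))

# Cost-sensitive learning: higher weight on minority-class errors
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# Resampling with imbalanced-learn
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)                # synthetic minority samples
X_ov, y_ov = RandomOverSampler(random_state=0).fit_resample(X, y)    # duplicate minority samples
X_un, y_un = RandomUnderSampler(random_state=0).fit_resample(X, y)   # drop majority samples
print(Counter(y_sm), Counter(y_un))
```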
Adjusting Class Weights During Training
- Adjust the relative importance of each class during the training of a machine learning model to account for the class imbalance.
- It penalizes misclassifications of the minority class more by giving that class a higher weight, while reducing the weight of the majority class.
- In scikit-learn, many classifiers have a `class_weight` parameter that you can set to automatically adjust the class weights during training.

Sampling Techniques
- Over-sampling:
  - Random over-sampling: randomly duplicate samples from the minority class to balance the class distribution.
  - SMOTE (Synthetic Minority Over-sampling Technique): generate synthetic samples for the minority class by interpolating between existing samples.
- Under-sampling:
  - Random under-sampling: randomly remove samples from the majority class to balance the class distribution.

Which Average Should We Choose for Imbalanced Classification?
- For an imbalanced dataset where equal importance is attributed to all classes, the macro average is a sound choice since it treats each class with equal significance. For instance, in our scenario involving the classification of airplanes, boats, and cars, the macro-F1 score would be appropriate.
- When dealing with an imbalanced dataset and aiming to give more weight to classes with more examples, the weighted average is preferable. It adjusts the contribution of each class to the F1 average based on its size, offering a more balanced perspective.

Quiz 4
- Explore different ML classifiers for multiclass classification.
- Explore different strategies to deal with imbalanced classification.

Next Time
Lecture:
- Unsupervised learning
- K-means and other clustering algorithms
In-class Assignment:
- Clustering

Thank you