Introduction to Machine Learning: Lecture 1

Summary

These lecture notes cover introductory machine learning concepts, including different types of machine learning, supervised learning, narrow and general AI, discriminative and generative AI, and a detailed case study of linear regression. The notes were prepared by Xiaoxiao Miao in January 2025.

Full Transcript

Introduction to Machine Learning: Linear Regression
Lecture 1, Xiaoxiao Miao, January 2025

Learning Objectives

- Machine Learning Basics
  - Different types of machine learning
  - Different types of supervised learning
  - Narrow and general AI
  - Discriminative and generative AI
- A Case Study of Linear Regression

Machine Learning Basics

- Artificial Intelligence is a scientific field concerned with the development of algorithms that allow computers to learn without being explicitly programmed.
- Machine Learning is a branch of Artificial Intelligence that turns things (data) into numbers and finds patterns in those numbers.
- Deep Learning is a subfield of machine learning that uses multi-layer neural networks to find patterns. Conceptually, Deep Learning sits inside Machine Learning, which sits inside Artificial Intelligence.
- This course covers many classic ML algorithms. Classic ML is still widely used in various fields:
  - In some cases it performs better than deep neural networks in terms of efficiency, robustness, and data or computational demands.
  - It provides interpretability and covers the fundamental concepts.
  - The intuition gained from classic ML carries over to, and integrates with, deep learning, which is quite helpful.
Different Types of Machine Learning

- Supervised learning: learns from labeled data ("learning from being given the right answers"); training is followed by testing/inference/prediction on new inputs.
- Unsupervised learning: discovers patterns in unlabeled data.
- Semi-supervised learning: learns with incomplete or inaccurate labels.
- Self-supervised learning: generates its own labels from unlabeled data. Examples:
  - Image-based SSL: masking parts of an image and having the model reconstruct the missing parts.
  - Text-based SSL: predicting the next word in a sentence or filling in missing words.
  - Audio-based SSL: predicting the next segment of an audio clip or reconstructing a masked portion of it.
- Reinforcement learning: learns to act based on feedback/reward. Example: a GPT model takes user inputs/questions and is rewarded with a score based on the quality of its responses.

Check your understanding: unsupervised or supervised learning algorithm?

- Given email labeled as spam/not spam, learn a spam filter.
- Given a set of news articles found on the web, group them into sets of articles about the same story.
- Given a database of customer data, automatically discover market segments and group customers into different market segments.
- Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.

Different Types of Supervised Learning

Supervised learning means learning from being given the "right answers". Examples of supervised learning tasks (reference: https://d2l.ai/chapter_introduction/index.html#supervised-learning):

Task | Input (x) | Output label (y) | Application
Basic regression | House size | Selling price | Price prediction
Basic classification | Email | Spam (0/1) | Spam filtering
Search and ranking | User query | Related content | Search engines
Recommendation system | Ad & user info | Click (0/1) | Online advertising
Sequence learning | Audio | Text transcripts | Speech recognition

Two common types are regression and classification.

- Regression: the function/model outputs arbitrary values within a specific range, so there are infinitely many possible outputs. Regression answers "how much?" or "how many?" questions.
Example: a model f maps the amount of hours studied (e.g. 4.5 h) to the test score achieved.

- Classification: given a set of options (classes), the function outputs the correct category; there is only a small number of possible outputs. Classification answers "which category?" questions. (Classes and categories are often used interchangeably.)
  - One input: predict Pass (1) / Fail (0) from hours spent studying.
  - Two inputs: predict Pass/Fail from hours spent studying and class attendance; the model learns a decision boundary that separates the two classes.
  - Binary classification: two mutually exclusive classes, e.g. spam vs. not spam.
  - Multiclass classification: more than two mutually exclusive classes, e.g. dog, cat, horse, fish, rooster, ...
  - Multilabel classification (tagging): the classes are not mutually exclusive, so one input can receive several labels at once.

Supervised Learning Algorithms (course schedule)

Week | Regression | Classification
W1/W2 | Linear regression | Logistic regression
W3 | Neural network | Neural network
W4 | Decision tree | Decision tree
W4 | Random forest | Random forest
W4 | AdaBoost | AdaBoost
W5 | SVM | SVM
W5 | KNN | Naive Bayes

Check your understanding (learning from being given "right answers"):

- Regression: does it predict a number or a category? Infinitely many or a small number of possible outputs?
- Classification: does it predict a number or a category? Infinitely many or a small number of possible outputs?

Narrow and General AI

(This slide contrasts narrow and general AI with a figure; see the image source.)

Discriminative and Generative AI

- Discriminative/predictive AI:
  - classifies or differentiates between existing data points
  - forecasts future outcomes based on historical data
- Generative AI: learns from existing data and generates new instances that mimic the training data distribution.

The Overall Process of Machine Learning

Data collection → Model training → Model evaluation → Model deployment and integration

Key components for machine learning:

- Data: the data we can learn from.
- Model: a model of how to transform the data, i.e. a function with parameters.
- Loss function: a function that quantifies the badness of our model.
- Optimization algorithm: an algorithm that adjusts the model's parameters to minimize the loss.

The training loop: design a model (function), check whether it is good enough, and update the model (function parameters) until it is.

A Case Study: Simple Linear Regression

Recall: regression outputs arbitrary values within a specific range, answering "how much?" or "how many?" questions, e.g. mapping the amount of hours studied to the test score achieved.

Training set (supervised learning: learning from being given "right answers"):

Hours of study (x) | Score (y)
32.502345269453 | 31.7070058465699
53.426804033275 | 68.7775959816389
61.5303580256364 | 62.5623822979458
47.4756396347861 | 71.5466322335678
... | ...

Terminology:

- x = "input" variable (feature)
- y = "output" variable ("target" variable)
- N = number of training examples
- (x, y) = a single training example
- (x^n, y^n) = the n-th training example (1st, 2nd, 3rd, ...). Note that x^2 is not "x to the power 2" here; the superscript just refers to the 2nd training example.

Model

How do we represent the model f? Stick with a straight line first: linear regression.

ŷ = f_{w,b}(x) = wx + b

The same model is often written ŷ = f(x) = wx + b, omitting the subscripts w and b. The learnable parameters are w (weight) and b (bias), and they can take any value. What do w and b do?
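The straight-line model can be written directly as code. A minimal sketch (the function name and the example values of w and b below are illustrative choices, not from the notes):

```python
def predict(x, w, b):
    """Linear regression model: y_hat = f_{w,b}(x) = w*x + b."""
    return w * x + b

# Arbitrary (not yet learned) parameter values, just for illustration.
w, b = 0.5, 1.0
print(predict(3.0, w, b))  # 0.5*3.0 + 1.0 = 2.5
```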
What do w and b do in linear regression?

- f(x) = 0 ⋅ x + 1.5 (w = 0, b = 1.5): a horizontal line at height 1.5.
- f(x) = 0.5 ⋅ x + 0 (w = 0.5, b = 0): slope 0.5, passing through the origin.
- f(x) = 0.5 ⋅ x + 1 (w = 0.5, b = 1): the same slope, shifted up by 1.

The weight w defines the orientation (slope) of the line; the bias b defines its position (intercept).

Loss Function

Since w and b can be any values (f1: ŷ = 10 ⋅ x + 9, f2: ŷ = 7 ⋅ x + 8, ..., infinitely many candidates), how do we effectively find the best w and b? We need a cost (loss) function that quantifies the badness of the model for different w, b. The most common one for regression problems is the mean squared error (MSE). Writing ŷ^n = f_{w,b}(x^n) = wx^n + b:

- Estimation error: ŷ^n − y^n
- Square of the estimation error: (ŷ^n − y^n)²
- Sum over all samples: Σ_{n=1}^{N} (ŷ^n − y^n)²
- Average over the N samples: (1/N) Σ_{n=1}^{N} (ŷ^n − y^n)²
- Include an extra factor of 1/2, just for computational convenience when differentiating:

MSE: L(w, b) = (1/(2N)) Σ_{n=1}^{N} (ŷ^n − y^n)² = (1/(2N)) Σ_{n=1}^{N} (f_{w,b}(x^n) − y^n)²

The loss will usually be a nonnegative number where smaller values are better and perfect predictions incur a loss of 0.
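The MSE formula translates directly to code. A minimal sketch using the same 1/(2N) convention as the notes (function and variable names are my own):

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error with the 1/(2N) convention:
    L(w, b) = (1/(2N)) * sum_n (w*x^n + b - y^n)^2."""
    N = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * N)

# Perfect predictions incur a loss of 0: these points lie on y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
print(mse_loss(2.0, 1.0, xs, ys))  # 0.0
```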
Optimization

Intuition: find w and b such that ŷ^n is close to y^n for all (x^n, y^n), i.e. such that L(w, b) is minimized.

Consider a simpler case first, dropping the bias b:

- Model (linear regression): f_w(x) = wx, with the bias b removed
- Parameters: just one parameter, w
- Loss function: L(w) = (1/(2N)) Σ_{n=1}^{N} (f_w(x^n) − y^n)² = (1/(2N)) Σ_{n=1}^{N} (wx^n − y^n)²
- Goal: minimize L(w)

To see how f_w(x) and L(w) are related, take the toy training set (1, 1), (2, 2), (3, 3), so N = 3, and vary w. For a fixed w, f_w(x) is a function of the input x, while L(w) is a function of the parameter w.

- w = 1: f_1(x) = x passes through all three points, so L(1) = (1/(2⋅3))((1 − 1)² + (2 − 2)² + (3 − 3)²) = 0.
- w = 0.5: L(0.5) = (1/(2⋅3))((0.5 − 1)² + (1 − 2)² + (1.5 − 3)²) = 3.5/6 ≈ 0.58.
- w = 0: L(0) = (1/(2⋅3))((0 − 1)² + (0 − 2)² + (0 − 3)²) = 14/6 ≈ 2.33.
- w = 1.5: L(1.5) = (1/(2⋅3))((1.5 − 1)² + (3 − 2)² + (4.5 − 3)²) = 3.5/6 ≈ 0.58.
- w = −0.5: similarly, L(−0.5) = 31.5/6 = 5.25.

Plotted against w, these values trace out a bowl-shaped curve whose minimum L(1) = 0 sits at w = 1.

Objective: minimize L(w). Trying candidates one at a time (f_w(x) = 1 ⋅ x gives L(1), f_w(x) = 1.5 ⋅ x gives L(1.5), ...) never ends, since there are infinitely many. How do we effectively find the best w?

- A very bad solution: random search.
- A better idea: follow the slope, i.e. gradient descent. Start with some w, keep changing w to reduce L(w), and stop when we settle at or near a minimum.

Gradient descent algorithm: (randomly) pick an initial value w^{t=0}, where t indexes the weight updates (t = 1 after the first update, t = 2 after the second, ...). Then compute the derivative of the loss with respect to w at the current value: ∂L(w)/∂w |_{w = w^{t=0}}.
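The worked examples above can be reproduced in a few lines of code. A minimal sketch of L(w) for the bias-free model on the toy training set (1, 1), (2, 2), (3, 3):

```python
def loss(w, xs, ys):
    """L(w) = (1/(2N)) * sum_n (w*x^n - y^n)^2 for the bias-free model f_w(x) = w*x."""
    N = len(xs)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * N)

xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]  # toy training set from the slides
for w in [0.0, 0.5, 1.0, 1.5]:
    print(f"L({w}) = {loss(w, xs, ys):.2f}")
# L(0.0) = 2.33, L(0.5) = 0.58, L(1.0) = 0.00, L(1.5) = 0.58
```

Sweeping a finer grid of w values and plotting the results reproduces the bowl-shaped loss curve from the slides.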
The sign of the derivative tells us which direction to move:

- ∂L(w)/∂w negative → increase w
- ∂L(w)/∂w positive → decrease w
- At a minimum, the derivative is 0 and w stops changing.

The update is an assignment, scaled by a learning rate η:

w^{t=1} = w^{t=0} − η ⋅ ∂L(w)/∂w |_{w = w^{t=0}}

and in general we update w iteratively until convergence:

w^{t+1} = w^t − η ⋅ ∂L(w)/∂w |_{w = w^t}

The learning rate η is a hyperparameter.

Learning Rate

- Choosing a good learning rate (step size) is important in gradient descent.
- If the step size is too large, gradient descent can overshoot, fail to converge, or even diverge.
- If the learning rate is too small, gradient descent makes little progress and converges slowly.
- Heuristically, we choose a learning rate that starts big and ends small; "just right" is tricky.

Gradient Descent for the Full Model

Consider the general case f_{w,b}(x) = wx + b and minimize L(w, b):

1. Compute the loss over the training dataset:
   L(w, b) = (1/(2N)) Σ_{n=1}^{N} (f_{w,b}(x^n) − y^n)²
2. Compute its derivatives with respect to w and b: ∂L(w, b)/∂w and ∂L(w, b)/∂b.
3. Simultaneously update the current values of w and b in the direction of the negative gradient, multiplied by the learning rate η:
   temp_w = w − η ⋅ ∂L(w, b)/∂w
   temp_b = b − η ⋅ ∂L(w, b)/∂b
   w = temp_w
   b = temp_b
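The full algorithm can be sketched as a small training loop. This is a minimal illustration (the dataset, learning rate, iteration count, and initial values are arbitrary choices, not from the slides):

```python
def gradient_descent(xs, ys, eta=0.05, steps=2000):
    """Gradient descent for f_{w,b}(x) = w*x + b with the MSE loss
    L(w, b) = (1/(2N)) * sum_n (w*x^n + b - y^n)^2.
    Gradients: dL/dw = (1/N)*sum((w*x+b-y)*x), dL/db = (1/N)*sum(w*x+b-y)."""
    N = len(xs)
    w, b = 0.0, 0.0  # arbitrary initial values
    for _ in range(steps):
        dw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / N
        db = sum((w * x + b - y) for x, y in zip(xs, ys)) / N
        # Simultaneous update: both gradients are computed before w and b change.
        w, b = w - eta * dw, b - eta * db
    return w, b

# Toy data generated from y = 2x + 1; gradient descent should recover w ≈ 2, b ≈ 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

Note the tuple assignment `w, b = ...` performs the simultaneous update from step 3 without needing explicit `temp_w`/`temp_b` variables.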
4. Iterate steps 1 to 3 until convergence.

Derivatives for Linear Regression

Linear regression model: f_{w,b}(x) = wx + b
Loss function: L(w, b) = (1/(2N)) Σ_{n=1}^{N} (f_{w,b}(x^n) − y^n)²

Gradient descent algorithm, repeat until convergence:

w = w − η ⋅ ∂L(w, b)/∂w, where ∂L(w, b)/∂w = (2/(2N)) Σ_{n=1}^{N} (wx^n + b − y^n) ⋅ x^n
b = b − η ⋅ ∂L(w, b)/∂b, where ∂L(w, b)/∂b = (2/(2N)) Σ_{n=1}^{N} (wx^n + b − y^n) ⋅ 1

Derivation via the chain rule. For one sample, let h = wx + b − y, so the squared error is L = (wx + b − y)² = h²:

∂L/∂w = (∂L/∂h)(∂h/∂w) = 2h ⋅ x = 2(wx + b − y)x
∂L/∂b = (∂L/∂h)(∂h/∂b) = 2h ⋅ 1 = 2(wx + b − y)

For N samples with the 1/(2N) factor, the 2 from differentiating the square cancels the 2 in the denominator:

∂L/∂w = (1/N) Σ_{n=1}^{N} (wx^n + b − y^n) ⋅ x^n
∂L/∂b = (1/N) Σ_{n=1}^{N} (wx^n + b − y^n)

References

- Stanford University, Machine Learning Specialization Course
- National Taiwan University, Prof. Hung-yi Lee, Machine Learning Course
- INF2008 [2023/24 T2] Course, Prof. Donny Soh
