Document Details

Uploaded by WellBalancedCombinatorics

California Institute of Technology

2012

Yaser S. Abu-Mostafa

Tags

machine learning, linear models, learning from data, algorithms

Summary

This document is a set of lecture slides from Caltech's Learning From Data course (Lecture 3: Linear Models I) by Yaser S. Abu-Mostafa. The slides cover input representation, linear classification, linear regression, and nonlinear transformations, illustrated with graphics and algorithms.

Full Transcript

Review of Lecture 2

Is learning feasible? Yes, in a probabilistic sense.

Since g has to be one of h_1, h_2, ..., h_M, we conclude that if

$$|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon$$

then

$$|E_{\text{in}}(h_1) - E_{\text{out}}(h_1)| > \epsilon \ \text{ or } \ |E_{\text{in}}(h_2) - E_{\text{out}}(h_2)| > \epsilon \ \text{ or } \ \cdots \ \text{ or } \ |E_{\text{in}}(h_M) - E_{\text{out}}(h_M)| > \epsilon$$

It follows that

$$P\big[\, |E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon \,\big] \le 2M e^{-2\epsilon^2 N}$$

This gives us an added factor of M.

[figure: E_in(h) and E_out(h) for the hypotheses in H]

Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology

Lecture 3: Linear Models I

Sponsored by Caltech's Provost Office, E&AS Division, and IST
Tuesday, April 10, 2012

Outline
- Input representation
- Linear Classification
- Linear Regression
- Nonlinear Transformation

A real data set

[image: examples of handwritten digits]

Input representation

'raw' input: x = (x_0, x_1, x_2, ..., x_256); linear model: (w_0, w_1, w_2, ..., w_256)

Features: extract useful information, e.g., intensity and symmetry

x = (x_0, x_1, x_2); linear model: (w_0, w_1, w_2)

Illustration of features

x = (x_0, x_1, x_2), where x_1 is intensity and x_2 is symmetry.

[scatter plot of the digit data in the intensity-symmetry plane]

What PLA does

[plots: evolution of E_in and E_out over 1,000 iterations, and the final perceptron boundary in the average intensity-symmetry plane]

The 'pocket' algorithm

[plots: E_in and E_out versus iteration for PLA and for the pocket algorithm]

Classification boundary - PLA versus Pocket

[plots: the final classification boundary for PLA and for the pocket algorithm in the average intensity-symmetry plane]

Outline
- Input representation
- Linear Classification
- Linear Regression (regression ≡ real-valued output)
- Nonlinear Transformation

Credit again

Classification: credit approval (yes/no)
Regression: credit line (dollar amount)

Input x, e.g.:
  age: 23 years
  annual salary: $30,000
  years in residence: 1 year
  years in job: 1 year
  current debt: $15,000
  ...

Linear regression output:

$$h(x) = \sum_{i=0}^{d} w_i x_i = w^{\mathsf T} x$$

The data set

Credit officers decide on credit lines:

$$(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)$$

y_n ∈ ℝ is the credit line for customer x_n.

Linear regression tries to replicate that.

How to measure the error

How well does h(x) = wᵀx approximate f(x)?

In linear regression, we use the squared error (h(x) - f(x))².

In-sample error:

$$E_{\text{in}}(h) = \frac{1}{N} \sum_{n=1}^{N} \big( h(x_n) - y_n \big)^2$$

Illustration of linear regression

[plots: a line fitted to (x, y) data in one dimension, and a plane fitted to (x_1, x_2, y) data in two dimensions]

The expression for E_in

$$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \big( w^{\mathsf T} x_n - y_n \big)^2 = \frac{1}{N} \, \lVert Xw - y \rVert^2$$

where

$$X = \begin{bmatrix} x_1^{\mathsf T} \\ x_2^{\mathsf T} \\ \vdots \\ x_N^{\mathsf T} \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$

Minimizing E_in

$$E_{\text{in}}(w) = \frac{1}{N} \, \lVert Xw - y \rVert^2$$

$$\nabla E_{\text{in}}(w) = \frac{2}{N} \, X^{\mathsf T} (Xw - y) = 0$$

$$X^{\mathsf T} X w = X^{\mathsf T} y \quad \Longrightarrow \quad w = X^{\dagger} y, \quad \text{where } X^{\dagger} = (X^{\mathsf T} X)^{-1} X^{\mathsf T}$$

X† is the 'pseudo-inverse' of X.

The pseudo-inverse

$$X^{\dagger} = (X^{\mathsf T} X)^{-1} X^{\mathsf T}$$

Dimensions: X is N × (d+1), so XᵀX and its inverse are (d+1) × (d+1), and X† is (d+1) × N.

The linear regression algorithm

1: Construct the matrix X and the vector y from the data set (x_1, y_1), ..., (x_N, y_N) as follows:

$$X = \underbrace{\begin{bmatrix} x_1^{\mathsf T} \\ x_2^{\mathsf T} \\ \vdots \\ x_N^{\mathsf T} \end{bmatrix}}_{\text{input data matrix}}, \qquad y = \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}}_{\text{target vector}}$$

2: Compute the pseudo-inverse $X^{\dagger} = (X^{\mathsf T} X)^{-1} X^{\mathsf T}$.

3: Return $w = X^{\dagger} y$.
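The three steps above translate into a few lines of NumPy. The following is a minimal sketch, not part of the original slides; the synthetic data set and the variable names are illustrative assumptions.

```python
import numpy as np

def linear_regression(X, y):
    """One-step linear regression: w = X† y.

    X is the N x (d+1) input data matrix (first column all 1s for the
    bias coordinate x_0); y is the length-N target vector.
    """
    # np.linalg.pinv computes the pseudo-inverse stably via SVD; it
    # coincides with (XᵀX)⁻¹Xᵀ whenever XᵀX is invertible.
    return np.linalg.pinv(X) @ y

# Illustrative usage on synthetic data (an assumption, not from the slides):
rng = np.random.default_rng(0)
N, d = 100, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])  # prepend x_0 = 1
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w + 0.1 * rng.normal(size=N)                  # noisy targets

w = linear_regression(X, y)
E_in = np.mean((X @ w - y) ** 2)   # in-sample squared error
print(w, E_in)
```

Note that, unlike PLA, this is not an iterative algorithm: the minimizer of E_in is obtained in one matrix computation.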
 2   y2  X= , y=  x      T N yN AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 14/23 Minimizing E in 1 2 Ein(w) = N kXw − yk 2 T ∇Ein(w) = N X (Xw − y) = 0 XTXw = XTy w = X†y where X† = (XTX)−1XT X is the `pseudo-inverse' of X † AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 15/23 The pseudo-inverse X† = (XTX)−1XT  −1                      | {z| {z }}  | {z }  d+1× N × d+1 d+1  d+1 × N | {z } N × d+1 | {z } d+1 × N AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 16/23 The linear regression algorithm Constru t the matrix and the ve tor y from the data set X (x , y ), · · · , (x , y ) as follows 1: 1 1 N N  xT     y   1 1  T   . y = . .  x  2  y  2 X= ,  xT      N y N | {z } | {z } input data matrix target ve tor 2: Compute the pseudo-inverse X † = (XTX)−1XT. 3: Return w = X y. † AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 17/23 Linear regression for lassi ation Linear regression learns a real-valued fun tion y = f (x) ∈ R Binary-valued fun tions are also real-valued! ±1 ∈ R Use linear regression to get w where w x ≈ y = ±1 T n n In this ase, sign(w x ) is likely to agree with y = ±1 T n n Good initial weights for lassi ation AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 18/23 0.5 0.55 Linear regression boundary -8 -7 -6 -5 -4 -3 Symmetry -2 -1 0 Average Intensity AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 18/23 Outline Input representation Linear Classi ation Linear Regression Nonlinear Transformation AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 19/23 Linear is limited Data: Hypothesis: 1 1 Sfrag repla ements 0 PSfrag repla ements 0 −1 −1 −1 0 1 −1 0 1 AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 20/23 Another example Credit line is ae ted by `years in residen e' but not in a linear way! Nonlinear [[xi < 1]] and [[xi > 5]] are better. Can we do that with linear models? AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 21/23 Linear in what? Linear regression implements d X wi xi i=0 Linear lassi ation implements d ! sign X wi xi i=0 Algorithms work be ause of linearity in the weights AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 22/23 Transform the data nonlinearly Φ (x1, x2) −→ (x , x ) 2 1 2 2 1 1 PSfrag repla ements 0 PSfrag repla ements 0.5 0 −1 0 0.5 1 −1 0 1 AM L Creator: Yaser Abu-Mostafa - LFD Le ture 3 23/23
