Lecture Notes on Learning From Data PDF
Document Details
Uploaded by DeliciousLightYear
California Institute of Technology
Yaser S. Abu-Mostafa
Summary
These lecture notes cover Lecture 2 of Learning From Data, which asks whether learning is feasible. After a brief review of supervised learning and the perceptron learning algorithm, they introduce the bin model and Hoeffding's Inequality, connect the sample frequency ν and bin frequency µ to in-sample and out-of-sample error, and extend the bound to multiple hypotheses via the union bound. The material is illustrated with examples and diagrams.
Full Transcript
Review of Lecture 1

Example: Perceptron Learning Algorithm

Learning is used when
- A pattern exists
- We cannot pin it down mathematically
- We have data on it

Focus on supervised learning
- Unknown target function y = f(x)
- Data set (x1, y1), ..., (xN, yN)
- Learning algorithm picks g ≈ f from a hypothesis set H

Learning an unknown function?
- Impossible. The function can assume any value outside the data we have.
- So what now?

Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology
Lecture 2: Is Learning Feasible?
Sponsored by Caltech's Provost Office, E&AS Division, and IST
Thursday, April 5, 2012

Feasibility of learning - Outline
- Probability to the rescue
- Connection to learning
- Connection to real learning
- A dilemma and a solution

A related experiment
- Consider a 'bin' with red and green marbles.
  P[picking a red marble] = µ
  P[picking a green marble] = 1 − µ
- The value of µ is unknown to us.
- We pick N marbles independently.
- The fraction of red marbles in the sample = ν
[Figure: a bin, where µ = probability of a red marble, and a sample of N marbles, where ν = fraction of red marbles in the sample.]

Does ν say anything about µ?
No! The sample can be mostly green while the bin is mostly red.
Yes! The sample frequency ν is likely close to the bin frequency µ.
Possible versus probable.

What does ν say about µ?
In a big sample (large N), ν is probably close to µ (within ε). Formally,
  P[|ν − µ| > ε] ≤ 2e^(−2ε²N)
This is called Hoeffding's Inequality.
In other words, the statement "µ = ν" is P.A.C. (probably approximately correct).

P[|ν − µ| > ε] ≤ 2e^(−2ε²N)
- Valid for all N and ε
- Bound does not depend on µ
- Tradeoff: N, ε, and the bound
- ν ≈ µ  =⇒  µ ≈ ν

Connection to learning
Bin: The unknown is a number µ.
Learning: The unknown is a function f: X → Y.
Each marble is a point x ∈ X:
- Green marble: the hypothesis got it right, h(x) = f(x)
- Red marble: the hypothesis got it wrong, h(x) ≠ f(x)

Back to the learning diagram
The bin analogy:
[Figure: the learning diagram - unknown target function f: X → Y, probability distribution P on X, training examples (x1, y1), ..., (xN, yN), learning algorithm A, hypothesis set H, final hypothesis g ≈ f.]

Are we done?
Not so fast! h is fixed.
For this h, ν generalizes to µ.
This is 'verification' of h, not learning.
No guarantee ν will be small.
We need to choose from multiple h's.

Multiple bins
Generalizing the bin model to more than one hypothesis:
[Figure: bins h1, h2, ..., hM with bin frequencies µ1, µ2, ..., µM and sample fractions ν1, ν2, ..., νM.]

Notation for learning
Both µ and ν depend on which hypothesis h.
- ν is 'in sample', denoted by Ein(h)
- µ is 'out of sample', denoted by Eout(h)
The Hoeffding Inequality becomes:
  P[|Ein(h) − Eout(h)| > ε] ≤ 2e^(−2ε²N)

Notation with multiple bins
[Figure: bins h1, h2, ..., hM with out-of-sample errors Eout(h1), Eout(h2), ..., Eout(hM) and in-sample errors Ein(h1), Ein(h2), ..., Ein(hM).]
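Before moving on to the multiple-bin dilemma, the single-hypothesis bound lends itself to a quick numerical check. The Python sketch below is not part of the original slides: it simulates drawing N marbles from a bin whose red-marble probability µ we pretend not to know, and compares the empirical frequency of large deviations |ν − µ| > ε with the bound 2e^(−2ε²N). The values of µ, N, ε, and the number of trials are illustrative choices, not from the lecture.

```python
import numpy as np

# Minimal simulation of the bin experiment. The bin's red-marble probability mu
# is "unknown" in the lecture; here it is fixed only so the experiment can run.
rng = np.random.default_rng(0)

mu = 0.6          # assumed bin frequency (illustrative, not from the slides)
N = 100           # sample size
eps = 0.1         # tolerance epsilon
trials = 100_000  # number of independent samples of size N

# Each row is one sample of N marbles: True = red, False = green.
samples = rng.random((trials, N)) < mu
nu = samples.mean(axis=1)                   # sample fraction of red marbles, per trial

empirical = np.mean(np.abs(nu - mu) > eps)  # empirical P[|nu - mu| > eps]
bound = 2 * np.exp(-2 * eps**2 * N)         # Hoeffding bound 2 e^(-2 eps^2 N)

print(f"empirical P[|nu - mu| > eps] = {empirical:.4f}")
print(f"Hoeffding bound              = {bound:.4f}")
```

Whatever value of µ is plugged in, the measured deviation frequency should stay below the bound, illustrating the slide's point that the bound holds for all N and ε and does not depend on µ.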
Are we done already?
Not so fast!! Hoeffding doesn't apply to multiple bins.
What?
[Figure: the learning diagram alongside the bin analogy - a single bin (ν = fraction of red marbles in the sample, µ = probability of a red marble in the bin) generalized to multiple bins h1, h2, ..., hM with Eout(h1), ..., Eout(hM) and Ein(h1), ..., Ein(hM).]

Coin analogy
Question: If you toss a fair coin 10 times, what is the probability that you will get 10 heads?
Answer: ≈ 0.1%
Question: If you toss 1000 fair coins 10 times each, what is the probability that some coin will get 10 heads?
Answer: ≈ 63%

From coins to learning
[Figure: a BINGO card.]

A simple solution
P[|Ein(g) − Eout(g)| > ε] ≤ P[ |Ein(h1) − Eout(h1)| > ε
                               or |Ein(h2) − Eout(h2)| > ε
                               ···
                               or |Ein(hM) − Eout(hM)| > ε ]
                           ≤ Σ_{m=1}^{M} P[|Ein(hm) − Eout(hm)| > ε]

The final verdict
P[|Ein(g) − Eout(g)| > ε] ≤ Σ_{m=1}^{M} P[|Ein(hm) − Eout(hm)| > ε]
                          ≤ Σ_{m=1}^{M} 2e^(−2ε²N)

P[|Ein(g) − Eout(g)| > ε] ≤ 2M e^(−2ε²N)
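The coin-analogy numbers and the effect of the factor M are easy to verify. The sketch below is not from the original slides: the first part reproduces the two answers on the coin-analogy slide, and the second part evaluates the final bound 2M e^(−2ε²N) for a few values of ε, N, and M that are my illustrative assumptions, not the lecture's.

```python
import math

# Coin analogy from the slides: 10 heads in 10 tosses of a fair coin.
p_one = 0.5 ** 10
print(f"P[one coin gets 10 heads]  = {p_one:.2%}")   # about 0.1%

# 1000 fair coins, 10 tosses each: probability that at least one gets 10 heads.
p_some = 1 - (1 - p_one) ** 1000
print(f"P[some of 1000 coins does] = {p_some:.1%}")  # about 63%

# Learning counterpart: union bound over M hypotheses,
#   P[|Ein(g) - Eout(g)| > eps] <= 2 * M * exp(-2 * eps^2 * N).
# eps, N, and the values of M below are illustrative assumptions.
eps, N = 0.1, 1000
for M in (1, 100, 10_000):
    bound = 2 * M * math.exp(-2 * eps**2 * N)
    print(f"M = {M:6d}: bound = {bound:.2e}")  # informative only while below 1
```

The bound grows linearly with M, which is the price of choosing the final hypothesis g from multiple hypotheses rather than verifying a single fixed h.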