Lecture Notes on Learning From Data PDF
Document Details
Uploaded by DeliciousLightYear
California Institute of Technology
Yaser S. Abu-Mostafa
Summary
These lecture notes cover Lecture 2 of Learning From Data, which asks whether learning is feasible. After a brief review of supervised learning and the perceptron learning algorithm, they introduce the bin model and Hoeffding's Inequality, connect the sample frequency ν and bin frequency µ to in-sample and out-of-sample error, and extend the bound to multiple hypotheses via the union bound. The material is illustrated with examples and diagrams.
Full Transcript
Review of Lecture 1

Example: Perceptron Learning Algorithm

Learning is used when
- A pattern exists
- We cannot pin it down mathematically
- We have data on it

Focus on supervised learning
- Unknown target function y = f(x)
- Data set (x1, y1), ..., (xN, yN)
- Learning algorithm picks g ≈ f from a hypothesis set H

Learning an unknown function?
- Impossible. The function can assume any value outside the data we have.
- So what now?

Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology
Lecture 2: Is Learning Feasible?
Sponsored by Caltech's Provost Office, E&AS Division, and IST
Thursday, April 5, 2012

Feasibility of learning - Outline
- Probability to the rescue
- Connection to learning
- Connection to real learning
- A dilemma and a solution

A related experiment
- Consider a 'bin' with red and green marbles.
  P[picking a red marble] = µ
  P[picking a green marble] = 1 − µ
- The value of µ is unknown to us.
- We pick N marbles independently.
- The fraction of red marbles in the sample = ν
[Figure: a bin, where µ = probability of a red marble, and a sample of N marbles, where ν = fraction of red marbles in the sample.]

Does ν say anything about µ?
No! The sample can be mostly green while the bin is mostly red.
Yes! The sample frequency ν is likely close to the bin frequency µ.
Possible versus probable.

What does ν say about µ?
In a big sample (large N), ν is probably close to µ (within ε). Formally,
  P[|ν − µ| > ε] ≤ 2e^(−2ε²N)
This is called Hoeffding's Inequality.
In other words, the statement "µ = ν" is P.A.C. (probably approximately correct).

P[|ν − µ| > ε] ≤ 2e^(−2ε²N)
- Valid for all N and ε
- Bound does not depend on µ
- Tradeoff: N, ε, and the bound
- ν ≈ µ  =⇒  µ ≈ ν

Connection to learning
Bin: The unknown is a number µ.
Learning: The unknown is a function f: X → Y.
Each marble is a point x ∈ X:
- Green marble: the hypothesis got it right, h(x) = f(x)
- Red marble: the hypothesis got it wrong, h(x) ≠ f(x)

Back to the learning diagram
The bin analogy:
[Figure: the learning diagram - unknown target function f: X → Y, probability distribution P on X, training examples (x1, y1), ..., (xN, yN), learning algorithm A, hypothesis set H, final hypothesis g ≈ f.]

Are we done?
Not so fast! h is fixed.
For this h, ν generalizes to µ.
This is 'verification' of h, not learning.
No guarantee ν will be small.
We need to choose from multiple h's.

Multiple bins
Generalizing the bin model to more than one hypothesis:
[Figure: bins h1, h2, ..., hM with bin frequencies µ1, µ2, ..., µM and sample fractions ν1, ν2, ..., νM.]

Notation for learning
Both µ and ν depend on which hypothesis h.
- ν is 'in sample', denoted by Ein(h)
- µ is 'out of sample', denoted by Eout(h)
The Hoeffding Inequality becomes:
  P[|Ein(h) − Eout(h)| > ε] ≤ 2e^(−2ε²N)

Notation with multiple bins
[Figure: bins h1, h2, ..., hM with out-of-sample errors Eout(h1), Eout(h2), ..., Eout(hM) and in-sample errors Ein(h1), Ein(h2), ..., Ein(hM).]
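Before moving on to the multiple-bin dilemma, the single-hypothesis bound lends itself to a quick numerical check. The Python sketch below is not part of the original slides: it simulates drawing N marbles from a bin whose red-marble probability µ we pretend not to know, and compares the empirical frequency of large deviations |ν − µ| > ε with the bound 2e^(−2ε²N). The values of µ, N, ε, and the number of trials are illustrative choices, not from the lecture.

```python
import numpy as np

# Minimal simulation of the bin experiment. The bin's red-marble probability mu
# is "unknown" in the lecture; here it is fixed only so the experiment can run.
rng = np.random.default_rng(0)

mu = 0.6          # assumed bin frequency (illustrative, not from the slides)
N = 100           # sample size
eps = 0.1         # tolerance epsilon
trials = 100_000  # number of independent samples of size N

# Each row is one sample of N marbles: True = red, False = green.
samples = rng.random((trials, N)) < mu
nu = samples.mean(axis=1)                   # sample fraction of red marbles, per trial

empirical = np.mean(np.abs(nu - mu) > eps)  # empirical P[|nu - mu| > eps]
bound = 2 * np.exp(-2 * eps**2 * N)         # Hoeffding bound 2 e^(-2 eps^2 N)

print(f"empirical P[|nu - mu| > eps] = {empirical:.4f}")
print(f"Hoeffding bound              = {bound:.4f}")
```

Whatever value of µ is plugged in, the measured deviation frequency should stay below the bound, illustrating the slide's point that the bound holds for all N and ε and does not depend on µ.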
Are we done already?
Not so fast!! Hoeffding doesn't apply to multiple bins.
What?
[Figure: the learning diagram alongside the bin analogy - a single bin (ν = fraction of red marbles in the sample, µ = probability of a red marble in the bin) generalized to multiple bins h1, h2, ..., hM with Eout(h1), ..., Eout(hM) and Ein(h1), ..., Ein(hM).]

Coin analogy
Question: If you toss a fair coin 10 times, what is the probability that you will get 10 heads?
Answer: ≈ 0.1%
Question: If you toss 1000 fair coins 10 times each, what is the probability that some coin will get 10 heads?
Answer: ≈ 63%

From coins to learning
[Figure: a BINGO card.]

A simple solution
P[|Ein(g) − Eout(g)| > ε] ≤ P[ |Ein(h1) − Eout(h1)| > ε
                               or |Ein(h2) − Eout(h2)| > ε
                               ···
                               or |Ein(hM) − Eout(hM)| > ε ]
                           ≤ Σ_{m=1}^{M} P[|Ein(hm) − Eout(hm)| > ε]

The final verdict
P[|Ein(g) − Eout(g)| > ε] ≤ Σ_{m=1}^{M} P[|Ein(hm) − Eout(hm)| > ε]
                          ≤ Σ_{m=1}^{M} 2e^(−2ε²N)

P[|Ein(g) − Eout(g)| > ε] ≤ 2M e^(−2ε²N)
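The coin-analogy numbers and the effect of the factor M are easy to verify. The sketch below is not from the original slides: the first part reproduces the two answers on the coin-analogy slide, and the second part evaluates the final bound 2M e^(−2ε²N) for a few values of ε, N, and M that are my illustrative assumptions, not the lecture's.

```python
import math

# Coin analogy from the slides: 10 heads in 10 tosses of a fair coin.
p_one = 0.5 ** 10
print(f"P[one coin gets 10 heads]  = {p_one:.2%}")   # about 0.1%

# 1000 fair coins, 10 tosses each: probability that at least one gets 10 heads.
p_some = 1 - (1 - p_one) ** 1000
print(f"P[some of 1000 coins does] = {p_some:.1%}")  # about 63%

# Learning counterpart: union bound over M hypotheses,
#   P[|Ein(g) - Eout(g)| > eps] <= 2 * M * exp(-2 * eps^2 * N).
# eps, N, and the values of M below are illustrative assumptions.
eps, N = 0.1, 1000
for M in (1, 100, 10_000):
    bound = 2 * M * math.exp(-2 * eps**2 * N)
    print(f"M = {M:6d}: bound = {bound:.2e}")  # informative only while below 1
```

The bound grows linearly with M, which is the price of choosing the final hypothesis g from multiple hypotheses rather than verifying a single fixed h.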