Lecture 5: Training versus Testing
California Institute of Technology
Yaser S. Abu-Mostafa
Summary
This document is a set of lecture notes from Learning From Data, covering a review of Lecture 4 (noisy targets and error measures) and Lecture 5 on training versus testing: dichotomies, the growth function, and the break point.
Full Transcript
Review of Lecture 4

- Noisy targets: y = f(x) becomes y ~ P(y | x). The unknown target distribution P(y | x) is the target function f: X → Y plus noise.
- The training examples (x_1, y_1), ..., (x_N, y_N) are generated by the joint distribution P(x, y) = P(x) P(y | x), with P(x) the unknown input distribution.
- Error measures: a user-specified pointwise error e(h(x), f(x)).
- In-sample error: E_in(h) = (1/N) Σ_{n=1}^{N} e(h(x_n), f(x_n))
- Out-of-sample error: E_out(h) = E_x[ e(h(x), f(x)) ]
- With a noisy target, E_out(h) becomes E_{x,y}[ e(h(x), y) ]

Learning From Data
Yaser S. Abu-Mostafa, California Institute of Technology
Lecture 5: Training versus Testing
Sponsored by Caltech's Provost Office, E&AS Division, and IST
Tuesday, April 17, 2012

Creator: Yaser Abu-Mostafa - LFD Lecture 5

Outline
- From training to testing
- Illustrative examples
- Key notion: break point
- Puzzle

The final exam

Testing:  P[ |E_in − E_out| > ε ] ≤ 2 e^(−2ε²N)
Training: P[ |E_in − E_out| > ε ] ≤ 2M e^(−2ε²N)

Where did the M come from?

The "bad" events B_m are |E_in(h_m) − E_out(h_m)| > ε. The union bound, which assumes no overlaps, gives

P[ B_1 or B_2 or ··· or B_M ] ≤ P[B_1] + P[B_2] + ··· + P[B_M]    (M terms)

Can we improve on M?

Yes: the bad events are in fact very overlapping. For two similar hypotheses h_1 and h_2 (e.g. two slightly different perceptrons):
- ΔE_out: a small change in the +1 and −1 areas
- ΔE_in: a small change in the labels of the data points
so |E_in(h_1) − E_out(h_1)| ≈ |E_in(h_2) − E_out(h_2)|, and summing P[B_1] and P[B_2] double-counts nearly the same event.

What can we replace M with?
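The two bounds above differ only by the factor M. A minimal numeric sketch (the helper name `hoeffding_bound` and the sample values are illustrative, not from the lecture) shows how much the union bound loosens things:

```python
import math

def hoeffding_bound(eps, n, m=1):
    """Upper bound on P[|E_in - E_out| > eps]: 2 * m * exp(-2 * eps^2 * n).

    m = 1 is the testing bound (one fixed hypothesis); m = M is the
    training bound after the union bound over M hypotheses.
    """
    return 2 * m * math.exp(-2 * eps ** 2 * n)

# Testing: one hypothesis, N = 1000 samples, tolerance eps = 0.05
print(hoeffding_bound(0.05, 1000))         # 2 * e^(-5), about 0.013
# Training: same data, but a union bound over M = 100 hypotheses
print(hoeffding_bound(0.05, 1000, m=100))  # 100x looser -- above 1, vacuous
```

With a finite, modest M the extra factor is harmless for large N; the problem the lecture addresses is that M is infinite for hypothesis sets like perceptrons.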
Instead of the whole input space, we consider a finite set of input points, and count the number of dichotomies.

Dichotomies: mini-hypotheses

- A hypothesis:  h: X → {−1, +1}
- A dichotomy:   h: {x_1, x_2, ..., x_N} → {−1, +1}
- The number of hypotheses |H| can be infinite.
- The number of dichotomies |H(x_1, x_2, ..., x_N)| is at most 2^N.
- This makes the number of dichotomies a candidate for replacing M.

The growth function

The growth function counts the most dichotomies on any N points:

m_H(N) = max over x_1, ..., x_N ∈ X of |H(x_1, ..., x_N)|

The growth function satisfies m_H(N) ≤ 2^N. Let's apply the definition.

Applying the definition - perceptrons

[Figure: dichotomies generated by 2D perceptrons on 3 and 4 points]

For N = 3, all 8 dichotomies are realizable: m_H(3) = 8. For N = 4, only 14 of the 16 are: m_H(4) = 14.

Example 1: positive rays

[Figure: points x_1, x_2, x_3, ..., x_N on the real line; h(x) = −1 to the left of a, h(x) = +1 to the right]

H is the set of h: R → {−1, +1} of the form h(x) = sign(x − a).

m_H(N) = N + 1

(The threshold a falls in one of the N + 1 gaps defined by the points, and each gap gives a distinct dichotomy.)
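The count m_H(N) = N + 1 for positive rays can be checked by brute force: slide the threshold a across the gaps between points and collect the distinct label patterns. A minimal sketch (the helper name `dichotomies_positive_rays` and the sample points are mine):

```python
def dichotomies_positive_rays(points):
    """Enumerate the dichotomies realizable on `points` by h(x) = sign(x - a)."""
    xs = sorted(points)
    # One threshold candidate per gap: below all points, between each
    # consecutive pair, and above all points -- N + 1 candidates in total.
    cuts = [xs[0] - 1.0]
    cuts += [(lo + hi) / 2.0 for lo, hi in zip(xs, xs[1:])]
    cuts += [xs[-1] + 1.0]
    return {tuple(+1 if x > a else -1 for x in points) for a in cuts}

pts = [0.2, 0.5, 0.9, 1.4]
print(len(dichotomies_positive_rays(pts)))  # N + 1 = 5
```

The same enumeration idea works for the positive-interval and convex-set examples below, just with more threshold (or region) candidates.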
Example 2: positive intervals

[Figure: h(x) = −1, then +1 inside an interval, then −1 again, over points x_1, x_2, ..., x_N]

H is the set of h: R → {−1, +1} that are +1 on an interval and −1 elsewhere. Place the two interval ends in two of the N + 1 spots:

m_H(N) = (N+1 choose 2) + 1 = ½N² + ½N + 1

(The +1 counts the all-−1 dichotomy, from an interval containing no points.)

Example 3: convex sets

H is the set of h: R² → {−1, +1} where the region h(x) = +1 is convex.

m_H(N) = 2^N

The N points are "shattered" by convex sets: placing them on a circle, any subset can be carved out by a convex region.

The 3 growth functions

- Positive rays:      m_H(N) = N + 1
- Positive intervals: m_H(N) = ½N² + ½N + 1
- Convex sets:        m_H(N) = 2^N

Back to the big picture

Remember this inequality?

P[ |E_in − E_out| > ε ] ≤ 2M e^(−2ε²N)

What happens if m_H(N) replaces M? m_H(N) polynomial ⟹ good, since the bound still goes to zero. So: just prove that m_H(N) is polynomial?

Break point of H

Definition: if no data set of size k can be shattered by H, then k is a break point for H:

m_H(k) < 2^k

For 2D perceptrons, k = 4 (we saw m_H(4) = 14 < 16). A bigger data set cannot be shattered either.

Break point - the 3 examples

- Positive rays:      m_H(N) = N + 1           break point k = 2
- Positive intervals: m_H(N) = ½N² + ½N + 1    break point k = 3
- Convex sets:        m_H(N) = 2^N             break point k = "∞"

Main result

- No break point  ⟹ m_H(N) = 2^N
- Any break point ⟹ m_H(N) is polynomial in N

Puzzle

[Figure: a table of dichotomies, shown as filled and empty circles, on three points x_1, x_2, x_3]
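The break points quoted for the three examples follow mechanically from their growth functions: k is the smallest size with m_H(k) < 2^k. A small sketch using the closed forms from the lecture (the helper `break_point` and the search cap are my own):

```python
def break_point(m_h, max_k=25):
    """Smallest k with m_H(k) < 2^k, or None if every k up to max_k is shattered."""
    for k in range(1, max_k + 1):
        if m_h(k) < 2 ** k:
            return k
    return None

growth = {
    "positive rays":      lambda n: n + 1,
    "positive intervals": lambda n: n * (n + 1) // 2 + 1,  # = 1/2 n^2 + 1/2 n + 1
    "convex sets":        lambda n: 2 ** n,
}

for name, m_h in growth.items():
    print(name, break_point(m_h))
# positive rays 2, positive intervals 3, convex sets None (k = "infinity")
```

Note that for convex sets no cap on k would ever find a break point, matching m_H(N) = 2^N for all N; this is the "no break point" side of the main result.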