Document Details


Uploaded by ConsiderateChrysoberyl

California Institute of Technology

Yaser S. Abu-Mostafa

Tags

VC dimension, machine learning, learning from data, statistics

Summary

These lecture notes cover the VC dimension, a crucial concept in machine learning. They give its definition, compute it for perceptrons, interpret it as effective degrees of freedom, and use it to derive a generalization bound. The notes also review the growth function and the VC inequality.

Full Transcript


Review of Lecture 6

The VC inequality: if H has a break point k, then m_H(N) is polynomial.

Table bounding the growth function when k is a break point (rows N, columns k):

    N\k |  1   2   3    4   5   6
    ----+-------------------------
     1  |  1   2   2    2   2   2
     2  |  1   3   4    4   4   4
     3  |  1   4   7    8   8   8
     4  |  1   5  11  ...
     5  |  1   6  ...
     6  |  1   7  ...

(Figure: the space of data sets under (a) the Hoeffding inequality, (b) the union bound, and (c) the VC bound.)

From the union bound to the VC bound:

P[|E_in(g) - E_out(g)| > \epsilon] \le 2M e^{-2\epsilon^2 N}
P[|E_in(g) - E_out(g)| > \epsilon] \le 4\, m_H(2N)\, e^{-\frac{1}{8}\epsilon^2 N}

with m_H(N) \le \sum_{i=0}^{k-1} \binom{N}{i}, whose maximum power is N^{k-1}.

Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology

Lecture 7: The VC Dimension

Sponsored by Caltech's Provost Office, E&AS Division, and IST
Tuesday, April 24, 2012

AML Creator: Yaser Abu-Mostafa - LFD Lecture 7

Outline
- The definition
- VC dimension of perceptrons
- Interpreting the VC dimension
- Generalization bounds

Definition of VC dimension

The VC dimension of a hypothesis set H, denoted by d_vc(H), is the largest value of N for which m_H(N) = 2^N, i.e., the most points H can shatter.

- N \le d_vc(H) \Longrightarrow H can shatter N points
- k > d_vc(H) \Longrightarrow k is a break point for H

The growth function

In terms of a break point k: m_H(N) \le \sum_{i=0}^{k-1} \binom{N}{i}

In terms of the VC dimension d_vc: m_H(N) \le \sum_{i=0}^{d_vc} \binom{N}{i}, whose maximum power is N^{d_vc}.

Examples
- H is positive rays: d_vc = 1
- H is 2D perceptrons: d_vc = 3
- H is convex sets: d_vc = \infty

VC dimension and learning

If d_vc(H) is finite, then g \in H will generalize. This conclusion is independent of the learning algorithm.

(Figure: the learning diagram: an unknown target function f: X \to Y, a probability distribution P on X, training examples (x_1, y_1), \ldots, (x_N, y_N), and a learning algorithm A that selects the final hypothesis g \approx f from the hypothesis set H.)
The conclusion is also independent of the input distribution and independent of the target function.

VC dimension of perceptrons

For d = 2, d_vc = 3. In general, d_vc = d + 1.

We will prove the two directions: d_vc \le d + 1 and d_vc \ge d + 1.

Here is one direction

A set of N = d + 1 points in R^d shattered by the perceptron:

X = \begin{pmatrix} x_1^T \\ x_2^T \\ x_3^T \\ \vdots \\ x_{d+1}^T \end{pmatrix}
  = \begin{pmatrix}
      1 & 0 & 0 & \cdots & 0 \\
      1 & 1 & 0 & \cdots & 0 \\
      1 & 0 & 1 & \cdots & 0 \\
      \vdots & & & \ddots & \vdots \\
      1 & 0 & \cdots & 0 & 1
    \end{pmatrix}

X is invertible.

Can we shatter this data set?

For any y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{d+1} \end{pmatrix} = \begin{pmatrix} \pm 1 \\ \pm 1 \\ \vdots \\ \pm 1 \end{pmatrix}, can we find a vector w satisfying sign(Xw) = y?

Easy! Just make Xw = y, which means w = X^{-1} y.

We can shatter these d + 1 points. This implies what?
[a] d_vc = d + 1
[b] d_vc \ge d + 1   ✓
[c] d_vc \le d + 1
[d] No conclusion

Now, to show that d_vc \le d + 1, we need to show that:
[a] There are d + 1 points we cannot shatter
[b] There are d + 2 points we cannot shatter
[c] We cannot shatter any set of d + 1 points
[d] We cannot shatter any set of d + 2 points   ✓

Take any d + 2 points

For any d + 2 points x_1, \ldots, x_{d+1}, x_{d+2}: more points than dimensions \Longrightarrow we must have

x_j = \sum_{i \ne j} a_i x_i

where not all of the a_i's are zero.

So?

Consider the following dichotomy: the x_i's with non-zero a_i get y_i = sign(a_i), and x_j gets y_j = -1.

No perceptron can implement such a dichotomy!
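The shattering construction for d + 1 points can be checked numerically. A minimal sketch, assuming NumPy; it builds the slide's matrix X (a column of ones next to the origin-plus-axes pattern) and solves w = X^{-1} y for every dichotomy:

```python
import numpy as np
from itertools import product

d = 3                       # input dimension (illustrative choice)
N = d + 1                   # number of points to shatter

# Rows are the points from the slide: a constant-1 first coordinate,
# then x_1 at the origin and x_2, ..., x_{d+1} on the coordinate axes.
X = np.hstack([np.ones((N, 1)), np.vstack([np.zeros(d), np.eye(d)])])

# For every dichotomy y in {-1, +1}^{d+1}, solve Xw = y, then verify sign(Xw) = y.
shattered = True
for y in product([-1.0, 1.0], repeat=N):
    y = np.array(y)
    w = np.linalg.solve(X, y)          # w = X^{-1} y
    if not np.array_equal(np.sign(X @ w), y):
        shattered = False
print(shattered)  # True: the perceptron realizes all 2^(d+1) dichotomies
```

Because X is invertible, Xw = y holds exactly, so the sign check succeeds for all 2^{d+1} dichotomies, which is precisely what "shattering" means.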
Why?

x_j = \sum_{i \ne j} a_i x_i \Longrightarrow w^T x_j = \sum_{i \ne j} a_i w^T x_i

If y_i = sign(w^T x_i) = sign(a_i), then a_i w^T x_i > 0.

This forces w^T x_j = \sum_{i \ne j} a_i w^T x_i > 0.

Therefore y_j = sign(w^T x_j) = +1, which contradicts the required y_j = -1.

Putting it together

We proved d_vc \le d + 1 and d_vc \ge d + 1, so d_vc = d + 1.

What is d + 1 in the perceptron? It is the number of parameters: w_0, w_1, \ldots, w_d.

Outline
- The definition
- VC dimension of perceptrons
- Interpreting the VC dimension
- Generalization bounds

1. Degrees of freedom

Parameters create degrees of freedom.
- # of parameters: analog degrees of freedom
- d_vc: equivalent 'binary' degrees of freedom

The usual suspects
- Positive rays (d_vc = 1): h(x) = -1 to the left of the threshold a and h(x) = +1 to the right, over points x_1, x_2, x_3, \ldots, x_N.
- Positive intervals (d_vc = 2): h(x) = -1, then h(x) = +1 inside the interval, then h(x) = -1 again, over points x_1, x_2, x_3, \ldots, x_N.

Not just parameters

Parameters may not contribute degrees of freedom. (Figure: a model mapping x to y through a chain of redundant parameters.)

d_vc measures the effective number of parameters.

2. Number of data points needed

Two small quantities appear in the VC inequality, the tolerance \epsilon and the probability bound \delta:

P[|E_in(g) - E_out(g)| > \epsilon] \le 4\, m_H(2N)\, e^{-\frac{1}{8}\epsilon^2 N} = \delta

If we want a certain \epsilon and \delta, how does N depend on d_vc? Let us look at N^d e^{-N}.

N^d e^{-N}

Fix N^d e^{-N} = small value. How does N change with d? (Figure: semilog plot of N^d e^{-N} versus N, e.g. the curve N^{30} e^{-N}.)
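The behavior in the plot can be reproduced in a few lines. A rough sketch, assuming only the standard library; the helper name smallest_N and the threshold value 1e-9 are illustrative choices, not from the lecture. For each d it finds the smallest N with N^d e^{-N} below the threshold, and the results grow roughly in proportion to d:

```python
from math import log

def smallest_N(d, threshold=1e-9):
    """Smallest integer N with N**d * exp(-N) < threshold.

    Works in log space (d*log(N) - N) to avoid overflow for large N and d.
    The threshold is an arbitrary small value chosen for illustration.
    """
    N = 1
    while d * log(N) - N >= log(threshold):
        N += 1
    return N

# How many examples N are needed as the dimension d grows?
for d in (5, 10, 20, 40):
    print(d, smallest_N(d))
```

The roughly linear growth of N with d is what motivates the rule of thumb on the next slide.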
Rule of thumb: N \ge 10\, d_vc

Outline
- The definition
- VC dimension of perceptrons
- Interpreting the VC dimension
- Generalization bounds

Rearranging things

Start from the VC inequality:

P[|E_out - E_in| > \epsilon] \le 4\, m_H(2N)\, e^{-\frac{1}{8}\epsilon^2 N} = \delta

Get \epsilon in terms of \delta:

\delta = 4\, m_H(2N)\, e^{-\frac{1}{8}\epsilon^2 N} \Longrightarrow \epsilon = \sqrt{\frac{8}{N} \ln \frac{4\, m_H(2N)}{\delta}} \equiv \Omega

With probability \ge 1 - \delta: |E_out - E_in| \le \Omega(N, H, \delta)

Generalization bound

With probability \ge 1 - \delta: |E_out - E_in| \le \Omega(N, H, \delta)

\Longrightarrow With probability \ge 1 - \delta: E_out \le E_in + \Omega
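The error bar \Omega can be evaluated once m_H(2N) is replaced by its polynomial bound. A small sketch, assuming only the standard library; the helper names growth_bound and omega are illustrative, and m_H(2N) is bounded by \sum_{i=0}^{d_vc} \binom{2N}{i} exactly as in the growth-function slide:

```python
from math import comb, log, sqrt

def growth_bound(N, dvc):
    """Polynomial upper bound on the growth function: sum_{i=0}^{dvc} C(N, i)."""
    return sum(comb(N, i) for i in range(dvc + 1))

def omega(N, dvc, delta):
    """Error bar Omega = sqrt((8/N) * ln(4 * m_H(2N) / delta)), with m_H bounded above."""
    return sqrt(8.0 / N * log(4.0 * growth_bound(2 * N, dvc) / delta))

# With probability >= 1 - delta:  E_out <= E_in + omega(N, dvc, delta).
# The bar shrinks as N grows (here for the 2D perceptron, d_vc = 3).
for N in (100, 1000, 10000):
    print(N, omega(N, dvc=3, delta=0.05))
```

Because the bound on m_H(2N) grows only polynomially in N while the exponent decays, \Omega goes to zero as N increases, which is the content of the generalization bound.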
