Introduction to Optimization Approximation and fitting PDF
Document Details
Uploaded by SoulfulConcreteArt
Stanford University
2004
Abdelwadood Mesleh
Tags
Summary
These notes cover introduction to optimization, approximation and fitting. They are based on Convex Optimization, Chapter 6 by Stephen Boyd and Lieven Vandenberghe, published in 2004 by Cambridge University Press.
Full Transcript
Introduction to Optimization Approximation and fitting Prof. Abdelwadood Mesleh This set of notes is mainly based on Convex Optimization at Stanford University: Chapter 6 Stephen Boyd and 1 Lieven Vandenberghe, Covex Optimization, Cambridge, U.K, Cambridg...
Introduction to Optimization Approximation and fitting Prof. Abdelwadood Mesleh This set of notes is mainly based on Convex Optimization at Stanford University: Chapter 6 Stephen Boyd and 1 Lieven Vandenberghe, Covex Optimization, Cambridge, U.K, Cambridge University Press 2004 6. Approximation and fitting norm approximation least-norm problems regularized approximation robust approximation Prof A MESLEH Introduction to Optimization ٢ Norm approximation minimize kAx − bk (A ∈ Rm×n with m ≥ n, k · k is a norm on Rm) interpretations of solution x⋆ = argminx kAx − bk: geometric: Ax⋆ is point in R(A) closest to b estimation: linear measurement model y = Ax + v y are measurements, x is unknown, v is measurement error given y = b, best guess of x is x⋆ optimal design: x are design variables (input), Ax is result (output) x⋆ is design that best approximates desired result b Prof A MESLEH Introduction to Optimization ٣ examples least-squares approximation (k · k2): solution satisfies normal equations AT Ax = AT b (x⋆ = (AT A)−1AT b if rank A = n) Chebyshev approximation (k · k∞): can be solved as an LP minimize t subject to −t1 Ax − b t1 sum of absolute residuals approximation (k · k1): can be solved as an LP minimize 1T y subject to −y Ax − b y Prof A MESLEH Introduction to Optimization ٤ Penalty function approximation minimize φ(r1) + · · · + φ(rm) subject to r = Ax − b (A ∈ Rm×n, φ : R → R is a convex penalty function) examples 2 2 quadratic: φ(u) = u log barrier 1.5 quadratic deadzone-linear with width a: φ(u) 1 deadzone-linear φ(u) = max{0, |u| − a} 0.5 log-barrier with limit a: 0 −1.5 −1 −0.5 0 0.5 1 1.5 u −a2 log(1 − (u/a)2) |u| < a φ(u) = ∞ otherwise Prof A MESLEH Introduction to Optimization ٥ example (m = 100, n = 30): histogram of residuals for penalties φ(u) = |u|, φ(u) = u2, φ(u) = max{0, |u|−a}, φ(u) = − log(1−u2) 40 p=1 0 −2 −1 0 1 2 10 p=2 0 −2 −1 0 1 2 Deadzone 20 0 −2 −1 0 1 2 Log barrier 10 0 −2 −1 0 1 2 r shape of penalty function has large effect on distribution of residuals Prof A MESLEH Introduction to Optimization ٦ Huber replacements penalty function (with parameter M ) u2 |u| ≤ M φhub(u) = M (2|u| − M ) |u| > M linear growth for large u makes approximation less sensitive to outliers 2 20 1.5 10 φhub(u) f (t) 1 0 0.5 −10 0 −20 −1.5 −1 −0.5 0 0.5 1 1.5 −10 −5 0 5 10 u t left: Huber penalty for M = 1 right: affine function f (t) = α + βt fitted to 42 points ti, yi (circles) using quadratic (dashed) and Huber (solid) penalty Prof A MESLEH Introduction to Optimization ٧ Least-norm problems minimize kxk subject to Ax = b (A ∈ Rm×n with m ≤ n, k · k is a norm on Rn) interpretations of solution x⋆ = argminAx=b kxk: geometric: x⋆ is point in affine set {x | Ax = b} with minimum distance to 0 estimation: b = Ax are (perfect) measurements of x; x⋆ is smallest (’most plausible’) estimate consistent with measurements design: x are design variables (inputs); b are required results (outputs) x⋆ is smallest (’most efficient’) design that satisfies requirements Prof A MESLEH Introduction to Optimization ٨ examples least-squares solution of linear equations (k · k2): can be solved via optimality conditions 2x + AT ν = 0, Ax = b minimum sum of absolute values (k · k1): can be solved as an LP minimize 1T y subject to −y x y, Ax = b tends to produce sparse solution x⋆ extension: least-penalty problem minimize φ(x1) + · · · + φ(xn) subject to Ax = b φ : R → R is convex penalty function Prof A MESLEH Introduction to Optimization ٩ Regularized approximation minimize (w.r.t. R2+) (kAx − bk, kxk) A ∈ Rm×n, norms on Rm and Rn can be different interpretation: find good approximation Ax ≈ b with small x estimation: linear measurement model y = Ax + v, with prior knowledge that kxk is small optimal design: small x is cheaper or more efficient, or the linear model y = Ax is only valid for small x robust approximation: good approximation Ax ≈ b with small x is less sensitive to errors in A than good approximation with large x Prof A MESLEH Introduction to Optimization ١٠ Scalarized problem minimize kAx − bk + γkxk solution for γ > 0 traces out optimal trade-off curve other common method: minimize kAx − bk2 + δkxk2 with δ > 0 Tikhonov regularization minimize kAx − bk22 + δkxk22 can be solved as a least-squares problem 2 minimize √A x− b δI 0 2 solution x⋆ = (AT A + δI)−1AT b Prof A MESLEH Introduction to Optimization ١١ Signal reconstruction minimize (w.r.t. R2+) (kx̂ − xcork2, φ(x̂)) x ∈ Rn is unknown signal xcor = x + v is (known) corrupted version of x, with additive noise v variable x̂ (reconstructed signal) is estimate of x φ : Rn → R is regularization function or smoothing objective examples: quadratic smoothing, total variation smoothing: n−1 X n−1 X φquad(x̂) = (x̂i+1 − x̂i)2, φtv (x̂) = |x̂i+1 − x̂i| i=1 i=1 Prof A MESLEH Introduction to Optimization ١٤ Robust approximation minimize kAx − bk with uncertain A two approaches: stochastic: assume A is random, minimize E kAx − bk worst-case: set A of possible values of A, minimize supA∈A kAx − bk tractable only in special cases (certain norms k · k, distributions, sets A) 12 example: A(u) = A0 + uA1 10 xnom xnom minimizes kA0x − bk22 8 xstoch minimizes E kA(u)x − bk22 r(u) 6 xstoch with u uniform on [−1, 1] xwc 4 xwc minimizes sup−1≤u≤1 kA(u)x − bk22 2 figure shows r(u) = kA(u)x − bk2 0 −2 −1 0 1 2 u Prof A MESLEH Introduction to Optimization ١٨ stochastic robust LS with A = Ā + U , U random, E U = 0, E U T U = P minimize E k(Ā + U )x − bk22 explicit expression for objective: E kAx − bk22 = E kĀx − b + U xk22 = kĀx − bk22 + E xT U T U x = kĀx − bk22 + xT P x hence, robust LS problem is equivalent to LS problem minimize kĀx − bk22 + kP 1/2xk22 for P = δI, get Tikhonov regularized problem minimize kĀx − bk22 + δkxk22 Prof A MESLEH Introduction to Optimization ١٩ worst-case robust LS with A = {Ā + u1A1 + · · · + upAp | kuk2 ≤ 1} minimize supA∈A kAx − bk22 = supkuk2≤1 kP (x)u + q(x)k22 where P (x) = A1x A2x · · · Apx , q(x) = Āx − b from page 5–14, strong duality holds between the following problems maximize kP u + qk22 minimize t+ λ subject to kuk22 ≤ 1 I P q subject to P T λI 0 0 qT 0 t hence, robust LS problem is equivalent to SDP minimize t+ λ I P (x) q(x) subject to P (x)T λI 0 0 q(x)T 0 t Prof A MESLEH Introduction to Optimization ٢٠ example: histogram of residuals r(u) = k(A0 + u1A1 + u2A2)x − bk2 with u uniformly distributed on unit disk, for three values of x 0.25 0.2 xrls frequency 0.15 0.1 xtik 0.05 xls 0 0 1 2 3 4 5 r(u) xls minimizes kA0x − bk2 xtik minimizes kA0x − bk22 + kxk22 (Tikhonov solution) xwc minimizes supkuk2≤1 kA0x − bk22 + kxk22 Prof A MESLEH Introduction to Optimization ٢١