Introduction to Optimization: Approximation and Fitting (PDF)

Summary

These notes cover approximation and fitting in convex optimization: norm approximation, least-norm problems, regularized approximation, and robust approximation. They are based on Chapter 6 of Convex Optimization by Stephen Boyd and Lieven Vandenberghe (Cambridge University Press, 2004).

Full Transcript


Introduction to Optimization: Approximation and Fitting
Prof. Abdelwadood Mesleh

This set of notes is mainly based on Chapter 6 of the Convex Optimization course at Stanford University: Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge, U.K.: Cambridge University Press, 2004.

6. Approximation and fitting
- norm approximation
- least-norm problems
- regularized approximation
- robust approximation

Norm approximation

minimize $\|Ax - b\|$

($A \in \mathbf{R}^{m \times n}$ with $m \ge n$; $\|\cdot\|$ is a norm on $\mathbf{R}^m$)

Interpretations of the solution $x^\star = \operatorname{argmin}_x \|Ax - b\|$:
- geometric: $Ax^\star$ is the point in $\mathcal{R}(A)$ closest to $b$
- estimation: linear measurement model $y = Ax + v$; $y$ are measurements, $x$ is unknown, $v$ is measurement error; given $y = b$, the best guess of $x$ is $x^\star$
- optimal design: $x$ are design variables (input), $Ax$ is the result (output); $x^\star$ is the design that best approximates the desired result $b$

Examples
- least-squares approximation ($\|\cdot\|_2$): the solution satisfies the normal equations $A^T A x = A^T b$ (so $x^\star = (A^T A)^{-1} A^T b$ if $\operatorname{rank} A = n$)
- Chebyshev approximation ($\|\cdot\|_\infty$): can be solved as an LP

  minimize $t$ subject to $-t\mathbf{1} \preceq Ax - b \preceq t\mathbf{1}$
- sum of absolute residuals approximation ($\|\cdot\|_1$): can be solved as an LP

  minimize $\mathbf{1}^T y$ subject to $-y \preceq Ax - b \preceq y$

Penalty function approximation

minimize $\phi(r_1) + \cdots + \phi(r_m)$ subject to $r = Ax - b$

($A \in \mathbf{R}^{m \times n}$; $\phi : \mathbf{R} \to \mathbf{R}$ is a convex penalty function)

Examples:
- quadratic: $\phi(u) = u^2$
- deadzone-linear with width $a$: $\phi(u) = \max\{0, |u| - a\}$
- log barrier with limit $a$: $\phi(u) = -a^2 \log(1 - (u/a)^2)$ for $|u| < a$, and $\phi(u) = \infty$ otherwise

[Figure: the quadratic, deadzone-linear, and log-barrier penalty functions plotted against $u$.]

Example ($m = 100$, $n = 30$): histograms of the residuals for the penalties $\phi(u) = |u|$, $\phi(u) = u^2$, $\phi(u) = \max\{0, |u| - a\}$, and $\phi(u) = -\log(1 - u^2)$.

[Figure: residual histograms for the $\ell_1$, $\ell_2$ (quadratic), deadzone-linear, and log-barrier penalties.]

The shape of the penalty function has a large effect on the distribution of the residuals.

Huber penalty function (with parameter $M$)

$\phi_{\mathrm{hub}}(u) = u^2$ for $|u| \le M$, and $\phi_{\mathrm{hub}}(u) = M(2|u| - M)$ for $|u| > M$

Linear growth for large $u$ makes the approximation less sensitive to outliers.

[Figure: left, the Huber penalty for $M = 1$; right, an affine function $f(t) = \alpha + \beta t$ fitted to 42 points $(t_i, y_i)$ (circles) using the quadratic (dashed) and Huber (solid) penalties.]

Least-norm problems

minimize $\|x\|$ subject to $Ax = b$

($A \in \mathbf{R}^{m \times n}$ with $m \le n$; $\|\cdot\|$ is a norm on $\mathbf{R}^n$)

Interpretations of the solution $x^\star = \operatorname{argmin}_{Ax = b} \|x\|$:
- geometric: $x^\star$ is the point in the affine set $\{x \mid Ax = b\}$ with minimum distance to $0$
- estimation: $b = Ax$ are (perfect) measurements of $x$; $x^\star$ is the smallest ('most plausible') estimate consistent with the measurements
- design: $x$ are design variables (inputs), $b$ are required results (outputs); $x^\star$ is the smallest ('most efficient') design that satisfies the requirements

Examples
- least-squares solution of linear equations ($\|\cdot\|_2$): can be solved via the optimality conditions $2x + A^T \nu = 0$, $Ax = b$
- minimum sum of absolute values ($\|\cdot\|_1$): can be solved as an LP

  minimize $\mathbf{1}^T y$ subject to $-y \preceq x \preceq y$, $Ax = b$

  tends to produce a sparse solution $x^\star$
- extension: least-penalty problem

  minimize $\phi(x_1) + \cdots + \phi(x_n)$ subject to $Ax = b$

  where $\phi : \mathbf{R} \to \mathbf{R}$ is a convex penalty function
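As a concrete illustration of the norm-approximation formulations above, here is a minimal sketch, not part of the original notes, that solves one random problem instance under the $\ell_2$, $\ell_1$, $\ell_\infty$, and Huber penalties. It assumes NumPy and CVXPY are available with a default solver; CVXPY performs the LP reformulations described above internally.

```python
# Illustrative sketch only (not from the notes): norm approximation of a
# random problem instance under different penalties.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 100, 30                         # same sizes as the histogram example
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# least squares: closed form from the normal equations A^T A x = A^T b
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

x = cp.Variable(n)
r = A @ x - b                          # residual vector r = Ax - b

# sum of absolute residuals (l1 norm), equivalent to an LP
cp.Problem(cp.Minimize(cp.norm(r, 1))).solve()
x_l1 = x.value

# Chebyshev approximation (l-infinity norm), also equivalent to an LP
cp.Problem(cp.Minimize(cp.norm(r, "inf"))).solve()
x_linf = x.value

# Huber penalty with M = 1: quadratic near zero, linear growth for large
# residuals, hence less sensitive to outliers
cp.Problem(cp.Minimize(cp.sum(cp.huber(r, M=1)))).solve()
x_hub = x.value
```

Comparing histograms of the residuals $Ax^\star - b$ for these solutions reproduces the qualitative effect noted above: the penalty shape strongly influences how the residuals are distributed.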
Regularized approximation

minimize (w.r.t. $\mathbf{R}^2_+$) $(\|Ax - b\|, \|x\|)$

$A \in \mathbf{R}^{m \times n}$; the norms on $\mathbf{R}^m$ and $\mathbf{R}^n$ can be different.

Interpretation: find a good approximation $Ax \approx b$ with small $x$
- estimation: linear measurement model $y = Ax + v$, with prior knowledge that $\|x\|$ is small
- optimal design: small $x$ is cheaper or more efficient, or the linear model $y = Ax$ is only valid for small $x$
- robust approximation: a good approximation $Ax \approx b$ with small $x$ is less sensitive to errors in $A$ than a good approximation with large $x$

Scalarized problem

minimize $\|Ax - b\| + \gamma \|x\|$

The solution for varying $\gamma > 0$ traces out the optimal trade-off curve. Another common method: minimize $\|Ax - b\|^2 + \delta \|x\|^2$ with $\delta > 0$.

Tikhonov regularization

minimize $\|Ax - b\|_2^2 + \delta \|x\|_2^2$

can be solved as a least-squares problem

minimize $\left\| \begin{bmatrix} A \\ \sqrt{\delta}\, I \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2^2$

with solution $x^\star = (A^T A + \delta I)^{-1} A^T b$.

Signal reconstruction

minimize (w.r.t. $\mathbf{R}^2_+$) $(\|\hat{x} - x_{\mathrm{cor}}\|_2, \ \phi(\hat{x}))$

$x \in \mathbf{R}^n$ is the unknown signal; $x_{\mathrm{cor}} = x + v$ is the (known) corrupted version of $x$, with additive noise $v$; the variable $\hat{x}$ (the reconstructed signal) is an estimate of $x$; $\phi : \mathbf{R}^n \to \mathbf{R}$ is a regularization function or smoothing objective.

Examples: quadratic smoothing and total variation smoothing:

$\phi_{\mathrm{quad}}(\hat{x}) = \sum_{i=1}^{n-1} (\hat{x}_{i+1} - \hat{x}_i)^2, \qquad \phi_{\mathrm{tv}}(\hat{x}) = \sum_{i=1}^{n-1} |\hat{x}_{i+1} - \hat{x}_i|$

Robust approximation

minimize $\|Ax - b\|$ with uncertain $A$

Two approaches:
- stochastic: assume $A$ is random and minimize $\mathbf{E}\,\|Ax - b\|$
- worst-case: take a set $\mathcal{A}$ of possible values of $A$ and minimize $\sup_{A \in \mathcal{A}} \|Ax - b\|$

These are tractable only in special cases (certain norms $\|\cdot\|$, distributions, sets $\mathcal{A}$).

Example: $A(u) = A_0 + u A_1$
- $x_{\mathrm{nom}}$ minimizes $\|A_0 x - b\|_2^2$
- $x_{\mathrm{stoch}}$ minimizes $\mathbf{E}\,\|A(u)x - b\|_2^2$ with $u$ uniform on $[-1, 1]$
- $x_{\mathrm{wc}}$ minimizes $\sup_{-1 \le u \le 1} \|A(u)x - b\|_2^2$

[Figure: $r(u) = \|A(u)x - b\|_2$ as a function of $u$ for the three solutions.]

Stochastic robust LS

with $A = \bar{A} + U$, $U$ random, $\mathbf{E}\,U = 0$, $\mathbf{E}\,U^T U = P$:

minimize $\mathbf{E}\,\|(\bar{A} + U)x - b\|_2^2$

Explicit expression for the objective (the cross term vanishes because $\mathbf{E}\,U = 0$):

$\mathbf{E}\,\|Ax - b\|_2^2 = \mathbf{E}\,\|\bar{A}x - b + Ux\|_2^2 = \|\bar{A}x - b\|_2^2 + \mathbf{E}\,x^T U^T U x = \|\bar{A}x - b\|_2^2 + x^T P x$

Hence the robust LS problem is equivalent to the LS problem

minimize $\|\bar{A}x - b\|_2^2 + \|P^{1/2} x\|_2^2$

For $P = \delta I$ we get the Tikhonov regularized problem: minimize $\|\bar{A}x - b\|_2^2 + \delta \|x\|_2^2$.

Worst-case robust LS

with $\mathcal{A} = \{\bar{A} + u_1 A_1 + \cdots + u_p A_p \mid \|u\|_2 \le 1\}$:

minimize $\sup_{A \in \mathcal{A}} \|Ax - b\|_2^2 = \sup_{\|u\|_2 \le 1} \|P(x)u + q(x)\|_2^2$

where $P(x) = \begin{bmatrix} A_1 x & A_2 x & \cdots & A_p x \end{bmatrix}$ and $q(x) = \bar{A}x - b$.

From page 5-14, strong duality holds between the problems

maximize $\|Pu + q\|_2^2$ subject to $\|u\|_2^2 \le 1$

and

minimize $t + \lambda$ subject to $\begin{bmatrix} I & P & q \\ P^T & \lambda I & 0 \\ q^T & 0 & t \end{bmatrix} \succeq 0$

Hence the robust LS problem is equivalent to the SDP

minimize $t + \lambda$ subject to $\begin{bmatrix} I & P(x) & q(x) \\ P(x)^T & \lambda I & 0 \\ q(x)^T & 0 & t \end{bmatrix} \succeq 0$

Example: histograms of the residuals $r(u) = \|(A_0 + u_1 A_1 + u_2 A_2)x - b\|_2$, with $u$ uniformly distributed on the unit disk, for three values of $x$:
- $x_{\mathrm{ls}}$ minimizes $\|A_0 x - b\|_2$
- $x_{\mathrm{tik}}$ minimizes $\|A_0 x - b\|_2^2 + \|x\|_2^2$ (Tikhonov solution)
- $x_{\mathrm{rls}}$ minimizes $\sup_{\|u\|_2 \le 1} \|A(u)x - b\|_2^2 + \|x\|_2^2$ (worst-case robust solution)

[Figure: residual histograms for $x_{\mathrm{ls}}$, $x_{\mathrm{tik}}$, and $x_{\mathrm{rls}}$.]
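The equivalence between the Tikhonov problem and the stacked least-squares problem above can be checked numerically. The following is a minimal sketch, not from the original notes, assuming only NumPy and random data:

```python
# Illustrative sketch only (not from the notes): Tikhonov regularization
#   minimize ||Ax - b||_2^2 + delta * ||x||_2^2
# solved two equivalent ways.
import numpy as np

rng = np.random.default_rng(1)
m, n, delta = 60, 20, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# closed-form solution x* = (A^T A + delta I)^{-1} A^T b
x_closed = np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)

# stacked least-squares form: minimize ||[A; sqrt(delta) I] x - [b; 0]||_2^2
A_stacked = np.vstack([A, np.sqrt(delta) * np.eye(n)])
b_stacked = np.concatenate([b, np.zeros(n)])
x_stacked, *_ = np.linalg.lstsq(A_stacked, b_stacked, rcond=None)

assert np.allclose(x_closed, x_stacked)   # both formulations give the same x*
```

The same code also covers stochastic robust LS with $P = \delta I$, since that case reduces to exactly this Tikhonov problem.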

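The signal reconstruction formulations above can likewise be illustrated with a scalarized version of each problem. The sketch below is not from the original notes: the synthetic piecewise-constant signal, noise level, and trade-off weight mu are made-up illustration values, and it assumes NumPy and CVXPY.

```python
# Illustrative sketch only (not from the notes): scalarized signal
# reconstruction, minimize ||xhat - xcor||_2^2 + mu * phi(xhat),
# comparing quadratic and total-variation smoothing.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n = 200
t = np.linspace(0, 1, n)
x_true = np.where(np.sin(4 * np.pi * t) >= 0, 1.0, -1.0)  # piecewise-constant test signal
x_cor = x_true + 0.2 * rng.standard_normal(n)              # corrupted measurement x + v

mu = 5.0                       # hypothetical trade-off weight
xhat = cp.Variable(n)
fit = cp.sum_squares(xhat - x_cor)

# quadratic smoothing: phi_quad(xhat) = sum_i (xhat_{i+1} - xhat_i)^2
cp.Problem(cp.Minimize(fit + mu * cp.sum_squares(cp.diff(xhat)))).solve()
x_quad = xhat.value

# total variation smoothing: phi_tv(xhat) = sum_i |xhat_{i+1} - xhat_i|
# (tends to preserve sharp transitions better than quadratic smoothing)
cp.Problem(cp.Minimize(fit + mu * cp.tv(xhat))).solve()
x_tv = xhat.value
```

On a piecewise-constant signal like this one, the total-variation reconstruction keeps the jumps sharp while the quadratic reconstruction smears them out, which is the usual motivation for the TV objective.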