Introduction to Optimization: 03 Convex Functions
Prof. Abdelwadood Mesleh
Summary
These lecture notes cover convex functions and their properties. They are based on Convex Optimization I at Stanford University and detail the properties and examples of convex and concave functions. The notes also discuss quasiconvex, log-concave, and log-convex functions.
Introduction to Optimization: 03 Convex Functions
Prof. Abdelwadood Mesleh

This set of notes is mainly based on Convex Optimization I at Stanford University: Chapter 3 of Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, U.K., 2004.

Convex functions
- basic properties and examples
- operations that preserve convexity
- the conjugate function
- quasiconvex functions
- log-concave and log-convex functions
- convexity with respect to generalized inequalities

Definition

$f : \mathbf{R}^n \to \mathbf{R}$ is convex if $\operatorname{dom} f$ is a convex set and
$$f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$$
for all $x, y \in \operatorname{dom} f$, $0 \le \theta \le 1$.

[figure: the chord between $(x, f(x))$ and $(y, f(y))$ lies above the graph of $f$]

- $f$ is concave if $-f$ is convex
- $f$ is strictly convex if $\operatorname{dom} f$ is convex and $f(\theta x + (1-\theta)y) < \theta f(x) + (1-\theta)f(y)$ for $x, y \in \operatorname{dom} f$, $x \ne y$, $0 < \theta < 1$

Examples on R

convex:
- affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
- exponential: $e^{ax}$, for any $a \in \mathbf{R}$
- powers: $x^\alpha$ on $\mathbf{R}_{++}$, for $\alpha \ge 1$ or $\alpha \le 0$
- powers of absolute value: $|x|^p$ on $\mathbf{R}$, for $p \ge 1$
- negative entropy: $x \log x$ on $\mathbf{R}_{++}$

concave:
- affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
- powers: $x^\alpha$ on $\mathbf{R}_{++}$, for $0 \le \alpha \le 1$
- logarithm: $\log x$ on $\mathbf{R}_{++}$

Examples on R^n and R^(m x n)

affine functions are convex and concave; all norms are convex

examples on $\mathbf{R}^n$:
- affine function $f(x) = a^T x + b$
- norms: $\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$ for $p \ge 1$; $\|x\|_\infty = \max_k |x_k|$

examples on $\mathbf{R}^{m \times n}$ ($m \times n$ matrices):
- affine function
$$f(X) = \operatorname{tr}(A^T X) + b = \sum_{i=1}^m \sum_{j=1}^n A_{ij} X_{ij} + b$$
- spectral (maximum singular value) norm
$$f(X) = \|X\|_2 = \sigma_{\max}(X) = (\lambda_{\max}(X^T X))^{1/2}$$

Restriction of a convex function to a line

$f : \mathbf{R}^n \to \mathbf{R}$ is convex if and only if the function $g : \mathbf{R} \to \mathbf{R}$,
$$g(t) = f(x + tv), \qquad \operatorname{dom} g = \{t \mid x + tv \in \operatorname{dom} f\}$$
is convex (in $t$) for any $x \in \operatorname{dom} f$, $v \in \mathbf{R}^n$

can check convexity of $f$ by checking convexity of functions of one variable

example: $f : \mathbf{S}^n \to \mathbf{R}$ with $f(X) = \log\det X$, $\operatorname{dom} f = \mathbf{S}^n_{++}$
$$g(t) = \log\det(X + tV) = \log\det X + \log\det(I + tX^{-1/2}VX^{-1/2}) = \log\det X + \sum_{i=1}^n \log(1 + t\lambda_i)$$
where $\lambda_i$ are the eigenvalues of $X^{-1/2}VX^{-1/2}$

$g$ is concave in $t$ (for any choice of $X \succ 0$, $V$); hence $f$ is concave

Extended-value extension

the extended-value extension $\tilde f$ of $f$ is
$$\tilde f(x) = f(x), \quad x \in \operatorname{dom} f, \qquad \tilde f(x) = \infty, \quad x \notin \operatorname{dom} f$$

often simplifies notation; for example, the condition
$$0 \le \theta \le 1 \implies \tilde f(\theta x + (1-\theta)y) \le \theta \tilde f(x) + (1-\theta)\tilde f(y)$$
(as an inequality in $\mathbf{R} \cup \{\infty\}$) means the same as the two conditions
- $\operatorname{dom} f$ is convex
- for $x, y \in \operatorname{dom} f$, $0 \le \theta \le 1 \implies f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$

First-order condition

$f$ is differentiable if $\operatorname{dom} f$ is open and the gradient
$$\nabla f(x) = \left(\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n}\right)$$
exists at each $x \in \operatorname{dom} f$

1st-order condition: differentiable $f$ with convex domain is convex iff
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \text{for all } x, y \in \operatorname{dom} f$$

[figure: the tangent $f(x) + \nabla f(x)^T(y - x)$ at $(x, f(x))$ lies below $f(y)$]

the first-order approximation of $f$ is a global underestimator

Second-order conditions

$f$ is twice differentiable if $\operatorname{dom} f$ is open and the Hessian $\nabla^2 f(x) \in \mathbf{S}^n$,
$$\nabla^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \quad i, j = 1, \ldots, n,$$
exists at each $x \in \operatorname{dom} f$

2nd-order conditions: for twice differentiable $f$ with convex domain
- $f$ is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \operatorname{dom} f$
- if $\nabla^2 f(x) \succ 0$ for all $x \in \operatorname{dom} f$, then $f$ is strictly convex
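The second-order condition lends itself to a quick numerical spot check. The sketch below is not from the notes: it approximates $\nabla^2 f(x)$ by central differences at random points and tests whether the smallest eigenvalue is nonnegative; the helper names `numerical_hessian` and `looks_convex` are my own, and the tolerance absorbs finite-difference noise.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

def looks_convex(f, dim, trials=200, seed=0):
    """Spot-check the 2nd-order condition: the smallest Hessian
    eigenvalue should be (numerically) nonnegative at random points."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.standard_normal(dim)
        if np.linalg.eigvalsh(numerical_hessian(f, x)).min() < -1e-6:
            return False
    return True

# log-sum-exp is convex; x -> -x^T x is concave, so it fails the check
print(looks_convex(lambda x: np.log(np.sum(np.exp(x))), dim=3))  # True
print(looks_convex(lambda x: -np.dot(x, x), dim=3))              # False
```

This only samples points, so a `True` is evidence rather than a proof; the conditions on the slides are what actually certify convexity.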
Examples

quadratic function: $f(x) = (1/2)x^T P x + q^T x + r$ (with $P \in \mathbf{S}^n$)
$$\nabla f(x) = Px + q, \qquad \nabla^2 f(x) = P$$
convex if $P \succeq 0$

least-squares objective: $f(x) = \|Ax - b\|_2^2$
$$\nabla f(x) = 2A^T(Ax - b), \qquad \nabla^2 f(x) = 2A^T A$$
convex (for any $A$)

log-sum-exp: $f(x) = \log \sum_{k=1}^n \exp x_k$ is convex
$$\nabla^2 f(x) = \frac{1}{\mathbf{1}^T z}\operatorname{diag}(z) - \frac{1}{(\mathbf{1}^T z)^2} z z^T \qquad (z_k = \exp x_k)$$
to show $\nabla^2 f(x) \succeq 0$, we must verify that $v^T \nabla^2 f(x) v \ge 0$ for all $v$:
$$v^T \nabla^2 f(x) v = \frac{\left(\sum_k z_k v_k^2\right)\left(\sum_k z_k\right) - \left(\sum_k v_k z_k\right)^2}{\left(\sum_k z_k\right)^2} \ge 0$$
since $\left(\sum_k v_k z_k\right)^2 \le \left(\sum_k z_k v_k^2\right)\left(\sum_k z_k\right)$ (from the Cauchy-Schwarz inequality)

Epigraph and sublevel set

$\alpha$-sublevel set of $f : \mathbf{R}^n \to \mathbf{R}$:
$$C_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$
sublevel sets of convex functions are convex (converse is false)

epigraph of $f : \mathbf{R}^n \to \mathbf{R}$:
$$\operatorname{epi} f = \{(x, t) \in \mathbf{R}^{n+1} \mid x \in \operatorname{dom} f,\ f(x) \le t\}$$

[figure: the epigraph of $f$, the region on and above the graph]

$f$ is convex if and only if $\operatorname{epi} f$ is a convex set

Jensen's inequality

basic inequality: if $f$ is convex, then for $0 \le \theta \le 1$,
$$f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$$

extension: if $f$ is convex, then
$$f(\mathbf{E}\, z) \le \mathbf{E}\, f(z)$$
for any random variable $z$

the basic inequality is the special case with discrete distribution
$$\operatorname{prob}(z = x) = \theta, \qquad \operatorname{prob}(z = y) = 1 - \theta$$

Operations that preserve convexity

practical methods for establishing convexity of a function
1. verify the definition (often simplified by restricting to a line)
2. for twice differentiable functions, show $\nabla^2 f(x) \succeq 0$
3. show that $f$ is obtained from simple convex functions by operations that preserve convexity
   - nonnegative weighted sum
   - composition with affine function
   - pointwise maximum and supremum
   - composition
   - minimization
   - perspective

Positive weighted sum & composition with affine function

nonnegative multiple: $\alpha f$ is convex if $f$ is convex, $\alpha \ge 0$

sum: $f_1 + f_2$ is convex if $f_1$, $f_2$ are convex (extends to infinite sums, integrals)

composition with affine function: $f(Ax + b)$ is convex if $f$ is convex

examples
- log barrier for linear inequalities
$$f(x) = -\sum_{i=1}^m \log(b_i - a_i^T x), \qquad \operatorname{dom} f = \{x \mid a_i^T x < b_i,\ i = 1, \ldots, m\}$$
- (any) norm of affine function: $f(x) = \|Ax + b\|$

Pointwise maximum

if $f_1, \ldots, f_m$ are convex, then $f(x) = \max\{f_1(x), \ldots, f_m(x)\}$ is convex

examples
- piecewise-linear function: $f(x) = \max_{i=1,\ldots,m}(a_i^T x + b_i)$ is convex
- sum of $r$ largest components of $x \in \mathbf{R}^n$: $f(x) = x_{[1]} + x_{[2]} + \cdots + x_{[r]}$ is convex ($x_{[i]}$ is the $i$th largest component of $x$)
  proof: $f(x) = \max\{x_{i_1} + x_{i_2} + \cdots + x_{i_r} \mid 1 \le i_1 < i_2 < \cdots < i_r \le n\}$

Pointwise supremum

if $f(x, y)$ is convex in $x$ for each $y \in \mathcal{A}$, then
$$g(x) = \sup_{y \in \mathcal{A}} f(x, y)$$
is convex

examples
- support function of a set $C$: $S_C(x) = \sup_{y \in C} y^T x$ is convex
- distance to farthest point in a set $C$: $f(x) = \sup_{y \in C} \|x - y\|$
- maximum eigenvalue of a symmetric matrix: for $X \in \mathbf{S}^n$, $\lambda_{\max}(X) = \sup_{\|y\|_2 = 1} y^T X y$

Composition with scalar functions

composition of $g : \mathbf{R}^n \to \mathbf{R}$ and $h : \mathbf{R} \to \mathbf{R}$: $f(x) = h(g(x))$

$f$ is convex if
- $g$ convex, $h$ convex, $\tilde h$ nondecreasing
- $g$ concave, $h$ convex, $\tilde h$ nonincreasing

proof (for $n = 1$, differentiable $g$, $h$):
$$f''(x) = h''(g(x))\,g'(x)^2 + h'(g(x))\,g''(x)$$
note: monotonicity must hold for the extended-value extension $\tilde h$

examples
- $\exp g(x)$ is convex if $g$ is convex
- $1/g(x)$ is convex if $g$ is concave and positive
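The log-sum-exp Hessian and its Cauchy-Schwarz argument above can also be checked numerically. A small sketch of my own (not from the notes; it assumes only numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
z = np.exp(x)            # z_k = exp(x_k)
s = z.sum()              # 1^T z

# Hessian of log-sum-exp from the slide:
H = np.diag(z) / s - np.outer(z, z) / s**2

# v^T H v equals ((sum z_k v_k^2)(sum z_k) - (sum v_k z_k)^2) / (sum z_k)^2
v = rng.standard_normal(5)
quad = v @ H @ v
gap = (z * v**2).sum() * s - ((z * v).sum()) ** 2   # Cauchy-Schwarz gap >= 0
assert quad >= -1e-12 and np.isclose(quad, gap / s**2)

# the all-ones vector is in the nullspace of H, so the smallest
# eigenvalue is 0 up to roundoff; H is positive semidefinite
print(np.linalg.eigvalsh(H).min())
```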
Vector composition

composition of $g : \mathbf{R}^n \to \mathbf{R}^k$ and $h : \mathbf{R}^k \to \mathbf{R}$:
$$f(x) = h(g(x)) = h(g_1(x), g_2(x), \ldots, g_k(x))$$

$f$ is convex if
- $g_i$ convex, $h$ convex, $\tilde h$ nondecreasing in each argument
- $g_i$ concave, $h$ convex, $\tilde h$ nonincreasing in each argument

proof (for $n = 1$, differentiable $g$, $h$):
$$f''(x) = g'(x)^T \nabla^2 h(g(x))\, g'(x) + \nabla h(g(x))^T g''(x)$$

examples
- $\sum_{i=1}^m \log g_i(x)$ is concave if $g_i$ are concave and positive
- $\log \sum_{i=1}^m \exp g_i(x)$ is convex if $g_i$ are convex

Minimization

if $f(x, y)$ is convex in $(x, y)$ and $C$ is a convex set, then
$$g(x) = \inf_{y \in C} f(x, y)$$
is convex

examples
- $f(x, y) = x^T A x + 2x^T B y + y^T C y$ with
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succeq 0, \qquad C \succ 0$$
  minimizing over $y$ gives $g(x) = \inf_y f(x, y) = x^T (A - BC^{-1}B^T) x$;
  $g$ is convex, hence the Schur complement $A - BC^{-1}B^T \succeq 0$
- distance to a set: $\operatorname{dist}(x, S) = \inf_{y \in S} \|x - y\|$ is convex if $S$ is convex

Perspective

the perspective of a function $f : \mathbf{R}^n \to \mathbf{R}$ is the function $g : \mathbf{R}^n \times \mathbf{R} \to \mathbf{R}$,
$$g(x, t) = t f(x/t), \qquad \operatorname{dom} g = \{(x, t) \mid x/t \in \operatorname{dom} f,\ t > 0\}$$
$g$ is convex if $f$ is convex

examples
- $f(x) = x^T x$ is convex; hence $g(x, t) = x^T x / t$ is convex for $t > 0$
- negative logarithm $f(x) = -\log x$ is convex; hence relative entropy $g(x, t) = t \log t - t \log x$ is convex on $\mathbf{R}^2_{++}$
- if $f$ is convex, then
$$g(x) = (c^T x + d)\, f\!\left((Ax + b)/(c^T x + d)\right)$$
  is convex on $\{x \mid c^T x + d > 0,\ (Ax + b)/(c^T x + d) \in \operatorname{dom} f\}$

The conjugate function

the conjugate of a function $f$ is
$$f^*(y) = \sup_{x \in \operatorname{dom} f} (y^T x - f(x))$$

[figure: $f^*(y)$ is the maximum gap between the line $xy$ and $f(x)$; the supporting line of slope $y$ has intercept $(0, -f^*(y))$]

- $f^*$ is convex (even if $f$ is not)
- will be useful in Chapter 5 (duality)

Examples

negative logarithm $f(x) = -\log x$:
$$f^*(y) = \sup_{x > 0}(xy + \log x) = \begin{cases} -1 - \log(-y) & y < 0 \\ \infty & \text{otherwise} \end{cases}$$

strictly convex quadratic $f(x) = (1/2)x^T Q x$ with $Q \in \mathbf{S}^n_{++}$:
$$f^*(y) = \sup_x \left(y^T x - (1/2)x^T Q x\right) = \frac{1}{2} y^T Q^{-1} y$$

Quasiconvex functions

$f : \mathbf{R}^n \to \mathbf{R}$ is quasiconvex if $\operatorname{dom} f$ is convex and the sublevel sets
$$S_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$
are convex for all $\alpha$

[figure: a quasiconvex function on $\mathbf{R}$ with sublevel sets at levels $\alpha$ and $\beta$ given by intervals with endpoints $a$, $b$, $c$]

- $f$ is quasiconcave if $-f$ is quasiconvex
- $f$ is quasilinear if it is quasiconvex and quasiconcave

Examples

- $\sqrt{|x|}$ is quasiconvex on $\mathbf{R}$
- $\operatorname{ceil}(x) = \inf\{z \in \mathbf{Z} \mid z \ge x\}$ is quasilinear
- $\log x$ is quasilinear on $\mathbf{R}_{++}$
- $f(x_1, x_2) = x_1 x_2$ is quasiconcave on $\mathbf{R}^2_{++}$
- linear-fractional function
$$f(x) = \frac{a^T x + b}{c^T x + d}, \qquad \operatorname{dom} f = \{x \mid c^T x + d > 0\}$$
  is quasilinear
- distance ratio
$$f(x) = \frac{\|x - a\|_2}{\|x - b\|_2}, \qquad \operatorname{dom} f = \{x \mid \|x - a\|_2 \le \|x - b\|_2\}$$
  is quasiconvex

Properties

modified Jensen inequality: for quasiconvex $f$,
$$0 \le \theta \le 1 \implies f(\theta x + (1-\theta)y) \le \max\{f(x), f(y)\}$$

first-order condition: differentiable $f$ with convex domain is quasiconvex iff
$$f(y) \le f(x) \implies \nabla f(x)^T (y - x) \le 0$$

[figure: $\nabla f(x)$ defines a supporting hyperplane to the sublevel set $\{y \mid f(y) \le f(x)\}$ at $x$]

sums of quasiconvex functions are not necessarily quasiconvex

Log-concave and log-convex functions

a positive function $f$ is log-concave if $\log f$ is concave:
$$f(\theta x + (1-\theta)y) \ge f(x)^\theta f(y)^{1-\theta} \quad \text{for } 0 \le \theta \le 1$$
$f$ is log-convex if $\log f$ is convex

- powers: $x^a$ on $\mathbf{R}_{++}$ is log-convex for $a \le 0$, log-concave for $a \ge 0$
- many common probability densities are log-concave, e.g., normal:
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}}\, e^{-\frac{1}{2}(x - \bar x)^T \Sigma^{-1} (x - \bar x)}$$
- the cumulative Gaussian distribution function $\Phi$ is log-concave:
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-u^2/2}\, du$$
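The conjugate examples above are easy to confirm numerically: maximize $y^T x - f(x)$ over $\operatorname{dom} f$ and compare with the closed form. A sketch of my own for $f(x) = -\log x$ (not from the notes; `conjugate_neg_log` is a hypothetical helper, and the finite bounds stand in for $x > 0$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_neg_log(y):
    """f*(y) = sup_{x>0} (x*y + log x) for f(x) = -log x, evaluated numerically."""
    res = minimize_scalar(lambda x: -(x * y + np.log(x)),
                          bounds=(1e-9, 1e6), method="bounded")
    return -res.fun

for y in [-0.5, -1.0, -3.0]:
    closed_form = -1 - np.log(-y)   # from the slide, valid for y < 0
    print(f"y={y}: numeric {conjugate_neg_log(y):.6f}, closed form {closed_form:.6f}")
# for y >= 0 the supremum is +infinity, matching f*(y) = infinity there
```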
Properties of log-concave functions

- twice differentiable $f$ with convex domain is log-concave if and only if
$$f(x)\nabla^2 f(x) \preceq \nabla f(x)\nabla f(x)^T \quad \text{for all } x \in \operatorname{dom} f$$
- product of log-concave functions is log-concave
- sum of log-concave functions is not always log-concave
- integration: if $f : \mathbf{R}^n \times \mathbf{R}^m \to \mathbf{R}$ is log-concave, then
$$g(x) = \int f(x, y)\, dy$$
  is log-concave (not easy to show)

Consequences of integration property

convolution $f * g$ of log-concave functions $f$, $g$ is log-concave:
$$(f * g)(x) = \int f(x - y)g(y)\, dy$$

if $C \subseteq \mathbf{R}^n$ is convex and $y$ is a random variable with log-concave pdf, then
$$f(x) = \operatorname{prob}(x + y \in C)$$
is log-concave

proof: write $f(x)$ as an integral of a product of log-concave functions,
$$f(x) = \int g(x + y)\, p(y)\, dy, \qquad g(u) = \begin{cases} 1 & u \in C \\ 0 & u \notin C, \end{cases}$$
where $p$ is the pdf of $y$

Convexity with respect to generalized inequalities

$f : \mathbf{R}^n \to \mathbf{R}^m$ is $K$-convex if $\operatorname{dom} f$ is convex and
$$f(\theta x + (1-\theta)y) \preceq_K \theta f(x) + (1-\theta)f(y)$$
for $x, y \in \operatorname{dom} f$, $0 \le \theta \le 1$

example: $f : \mathbf{S}^m \to \mathbf{S}^m$, $f(X) = X^2$ is $\mathbf{S}^m_+$-convex

proof: for fixed $z \in \mathbf{R}^m$, $z^T X^2 z = \|Xz\|_2^2$ is convex in $X$, i.e.,
$$z^T(\theta X + (1-\theta)Y)^2 z \le \theta z^T X^2 z + (1-\theta) z^T Y^2 z$$
for $X, Y \in \mathbf{S}^m$, $0 \le \theta \le 1$; therefore
$$(\theta X + (1-\theta)Y)^2 \preceq \theta X^2 + (1-\theta)Y^2$$
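The $\mathbf{S}^m_+$-convexity of $f(X) = X^2$ can be spot-checked as well: the inequality in the proof says $\theta X^2 + (1-\theta)Y^2 - (\theta X + (1-\theta)Y)^2$ is always positive semidefinite (it equals $\theta(1-\theta)(X-Y)^2$). A minimal sketch of my own, assuming only numpy:

```python
import numpy as np

def random_symmetric(n, rng):
    A = rng.standard_normal((n, n))
    return (A + A.T) / 2

rng = np.random.default_rng(2)
for _ in range(1000):
    X, Y = random_symmetric(4, rng), random_symmetric(4, rng)
    theta = rng.uniform()
    M = theta * X + (1 - theta) * Y
    diff = theta * X @ X + (1 - theta) * Y @ Y - M @ M
    # S^m_+ convexity: diff must be positive semidefinite
    assert np.linalg.eigvalsh(diff).min() >= -1e-9
print("X^2 passed the S^m_+ convexity check on 1000 random pairs")
```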