Introduction to Optimization: 03 Convex Functions
Prof. Abdelwadood Mesleh
Summary
These lecture notes cover convex functions and their properties. They are based on Convex Optimization I at Stanford University and detail the properties and examples of convex and concave functions. The notes also discuss quasiconvex, log-concave, and log-convex functions.
Introduction to Optimization: 03 Convex Functions
Prof. Abdelwadood Mesleh

This set of notes is mainly based on Convex Optimization I at Stanford University: Chapter 3 of Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, U.K., 2004.

Convex functions
- basic properties and examples
- operations that preserve convexity
- the conjugate function
- quasiconvex functions
- log-concave and log-convex functions
- convexity with respect to generalized inequalities

Definition

$f : \mathbf{R}^n \to \mathbf{R}$ is convex if $\operatorname{dom} f$ is a convex set and
$$f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$$
for all $x, y \in \operatorname{dom} f$, $0 \le \theta \le 1$.

[figure: the chord between $(x, f(x))$ and $(y, f(y))$ lies above the graph of $f$]

- $f$ is concave if $-f$ is convex
- $f$ is strictly convex if $\operatorname{dom} f$ is convex and $f(\theta x + (1-\theta)y) < \theta f(x) + (1-\theta)f(y)$ for $x, y \in \operatorname{dom} f$, $x \ne y$, $0 < \theta < 1$

Examples on R

convex:
- affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
- exponential: $e^{ax}$, for any $a \in \mathbf{R}$
- powers: $x^\alpha$ on $\mathbf{R}_{++}$, for $\alpha \ge 1$ or $\alpha \le 0$
- powers of absolute value: $|x|^p$ on $\mathbf{R}$, for $p \ge 1$
- negative entropy: $x \log x$ on $\mathbf{R}_{++}$

concave:
- affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
- powers: $x^\alpha$ on $\mathbf{R}_{++}$, for $0 \le \alpha \le 1$
- logarithm: $\log x$ on $\mathbf{R}_{++}$

Examples on R^n and R^(m x n)

affine functions are convex and concave; all norms are convex

examples on $\mathbf{R}^n$:
- affine function $f(x) = a^T x + b$
- norms: $\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$ for $p \ge 1$; $\|x\|_\infty = \max_k |x_k|$

examples on $\mathbf{R}^{m \times n}$ ($m \times n$ matrices):
- affine function
$$f(X) = \operatorname{tr}(A^T X) + b = \sum_{i=1}^m \sum_{j=1}^n A_{ij} X_{ij} + b$$
- spectral (maximum singular value) norm
$$f(X) = \|X\|_2 = \sigma_{\max}(X) = (\lambda_{\max}(X^T X))^{1/2}$$

Restriction of a convex function to a line

$f : \mathbf{R}^n \to \mathbf{R}$ is convex if and only if the function $g : \mathbf{R} \to \mathbf{R}$,
$$g(t) = f(x + tv), \qquad \operatorname{dom} g = \{t \mid x + tv \in \operatorname{dom} f\}$$
is convex (in $t$) for any $x \in \operatorname{dom} f$, $v \in \mathbf{R}^n$

can check convexity of $f$ by checking convexity of functions of one variable

example: $f : \mathbf{S}^n \to \mathbf{R}$ with $f(X) = \log\det X$, $\operatorname{dom} f = \mathbf{S}^n_{++}$
$$g(t) = \log\det(X + tV) = \log\det X + \log\det(I + tX^{-1/2}VX^{-1/2}) = \log\det X + \sum_{i=1}^n \log(1 + t\lambda_i)$$
where $\lambda_i$ are the eigenvalues of $X^{-1/2}VX^{-1/2}$

$g$ is concave in $t$ (for any choice of $X \succ 0$, $V$); hence $f$ is concave

Extended-value extension

the extended-value extension $\tilde f$ of $f$ is
$$\tilde f(x) = f(x), \quad x \in \operatorname{dom} f, \qquad \tilde f(x) = \infty, \quad x \notin \operatorname{dom} f$$

often simplifies notation; for example, the condition
$$0 \le \theta \le 1 \implies \tilde f(\theta x + (1-\theta)y) \le \theta \tilde f(x) + (1-\theta)\tilde f(y)$$
(as an inequality in $\mathbf{R} \cup \{\infty\}$) means the same as the two conditions
- $\operatorname{dom} f$ is convex
- for $x, y \in \operatorname{dom} f$, $0 \le \theta \le 1 \implies f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$

First-order condition

$f$ is differentiable if $\operatorname{dom} f$ is open and the gradient
$$\nabla f(x) = \left(\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n}\right)$$
exists at each $x \in \operatorname{dom} f$

1st-order condition: differentiable $f$ with convex domain is convex iff
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \text{for all } x, y \in \operatorname{dom} f$$

[figure: the tangent $f(x) + \nabla f(x)^T(y - x)$ at $(x, f(x))$ lies below $f(y)$]

the first-order approximation of $f$ is a global underestimator

Second-order conditions

$f$ is twice differentiable if $\operatorname{dom} f$ is open and the Hessian $\nabla^2 f(x) \in \mathbf{S}^n$,
$$\nabla^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \quad i, j = 1, \ldots, n,$$
exists at each $x \in \operatorname{dom} f$

2nd-order conditions: for twice differentiable $f$ with convex domain
- $f$ is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \operatorname{dom} f$
- if $\nabla^2 f(x) \succ 0$ for all $x \in \operatorname{dom} f$, then $f$ is strictly convex
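The second-order condition lends itself to a quick numerical spot check. The sketch below is not from the notes: it approximates $\nabla^2 f(x)$ by central differences at random points and tests whether the smallest eigenvalue is nonnegative; the helper names `numerical_hessian` and `looks_convex` are my own, and the tolerance absorbs finite-difference noise.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

def looks_convex(f, dim, trials=200, seed=0):
    """Spot-check the 2nd-order condition: the smallest Hessian
    eigenvalue should be (numerically) nonnegative at random points."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.standard_normal(dim)
        if np.linalg.eigvalsh(numerical_hessian(f, x)).min() < -1e-6:
            return False
    return True

# log-sum-exp is convex; x -> -x^T x is concave, so it fails the check
print(looks_convex(lambda x: np.log(np.sum(np.exp(x))), dim=3))  # True
print(looks_convex(lambda x: -np.dot(x, x), dim=3))              # False
```

This only samples points, so a `True` is evidence rather than a proof; the conditions on the slides are what actually certify convexity.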
Examples

quadratic function: $f(x) = (1/2)x^T P x + q^T x + r$ (with $P \in \mathbf{S}^n$)
$$\nabla f(x) = Px + q, \qquad \nabla^2 f(x) = P$$
convex if $P \succeq 0$

least-squares objective: $f(x) = \|Ax - b\|_2^2$
$$\nabla f(x) = 2A^T(Ax - b), \qquad \nabla^2 f(x) = 2A^T A$$
convex (for any $A$)

log-sum-exp: $f(x) = \log \sum_{k=1}^n \exp x_k$ is convex
$$\nabla^2 f(x) = \frac{1}{\mathbf{1}^T z}\operatorname{diag}(z) - \frac{1}{(\mathbf{1}^T z)^2} z z^T \qquad (z_k = \exp x_k)$$
to show $\nabla^2 f(x) \succeq 0$, we must verify that $v^T \nabla^2 f(x) v \ge 0$ for all $v$:
$$v^T \nabla^2 f(x) v = \frac{\left(\sum_k z_k v_k^2\right)\left(\sum_k z_k\right) - \left(\sum_k v_k z_k\right)^2}{\left(\sum_k z_k\right)^2} \ge 0$$
since $\left(\sum_k v_k z_k\right)^2 \le \left(\sum_k z_k v_k^2\right)\left(\sum_k z_k\right)$ (from the Cauchy-Schwarz inequality)

Epigraph and sublevel set

$\alpha$-sublevel set of $f : \mathbf{R}^n \to \mathbf{R}$:
$$C_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$
sublevel sets of convex functions are convex (converse is false)

epigraph of $f : \mathbf{R}^n \to \mathbf{R}$:
$$\operatorname{epi} f = \{(x, t) \in \mathbf{R}^{n+1} \mid x \in \operatorname{dom} f,\ f(x) \le t\}$$

[figure: the epigraph of $f$, the region on and above the graph]

$f$ is convex if and only if $\operatorname{epi} f$ is a convex set

Jensen's inequality

basic inequality: if $f$ is convex, then for $0 \le \theta \le 1$,
$$f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$$

extension: if $f$ is convex, then
$$f(\mathbf{E}\, z) \le \mathbf{E}\, f(z)$$
for any random variable $z$

the basic inequality is the special case with discrete distribution
$$\operatorname{prob}(z = x) = \theta, \qquad \operatorname{prob}(z = y) = 1 - \theta$$

Operations that preserve convexity

practical methods for establishing convexity of a function
1. verify the definition (often simplified by restricting to a line)
2. for twice differentiable functions, show $\nabla^2 f(x) \succeq 0$
3. show that $f$ is obtained from simple convex functions by operations that preserve convexity
   - nonnegative weighted sum
   - composition with affine function
   - pointwise maximum and supremum
   - composition
   - minimization
   - perspective

Positive weighted sum & composition with affine function

nonnegative multiple: $\alpha f$ is convex if $f$ is convex, $\alpha \ge 0$

sum: $f_1 + f_2$ is convex if $f_1$, $f_2$ are convex (extends to infinite sums, integrals)

composition with affine function: $f(Ax + b)$ is convex if $f$ is convex

examples
- log barrier for linear inequalities
$$f(x) = -\sum_{i=1}^m \log(b_i - a_i^T x), \qquad \operatorname{dom} f = \{x \mid a_i^T x < b_i,\ i = 1, \ldots, m\}$$
- (any) norm of affine function: $f(x) = \|Ax + b\|$

Pointwise maximum

if $f_1, \ldots, f_m$ are convex, then $f(x) = \max\{f_1(x), \ldots, f_m(x)\}$ is convex

examples
- piecewise-linear function: $f(x) = \max_{i=1,\ldots,m}(a_i^T x + b_i)$ is convex
- sum of $r$ largest components of $x \in \mathbf{R}^n$: $f(x) = x_{[1]} + x_{[2]} + \cdots + x_{[r]}$ is convex ($x_{[i]}$ is the $i$th largest component of $x$)
  proof: $f(x) = \max\{x_{i_1} + x_{i_2} + \cdots + x_{i_r} \mid 1 \le i_1 < i_2 < \cdots < i_r \le n\}$

Pointwise supremum

if $f(x, y)$ is convex in $x$ for each $y \in \mathcal{A}$, then
$$g(x) = \sup_{y \in \mathcal{A}} f(x, y)$$
is convex

examples
- support function of a set $C$: $S_C(x) = \sup_{y \in C} y^T x$ is convex
- distance to farthest point in a set $C$: $f(x) = \sup_{y \in C} \|x - y\|$
- maximum eigenvalue of a symmetric matrix: for $X \in \mathbf{S}^n$, $\lambda_{\max}(X) = \sup_{\|y\|_2 = 1} y^T X y$

Composition with scalar functions

composition of $g : \mathbf{R}^n \to \mathbf{R}$ and $h : \mathbf{R} \to \mathbf{R}$: $f(x) = h(g(x))$

$f$ is convex if
- $g$ convex, $h$ convex, $\tilde h$ nondecreasing
- $g$ concave, $h$ convex, $\tilde h$ nonincreasing

proof (for $n = 1$, differentiable $g$, $h$):
$$f''(x) = h''(g(x))\,g'(x)^2 + h'(g(x))\,g''(x)$$
note: monotonicity must hold for the extended-value extension $\tilde h$

examples
- $\exp g(x)$ is convex if $g$ is convex
- $1/g(x)$ is convex if $g$ is concave and positive
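The log-sum-exp Hessian and its Cauchy-Schwarz argument above can also be checked numerically. A small sketch of my own (not from the notes; it assumes only numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
z = np.exp(x)            # z_k = exp(x_k)
s = z.sum()              # 1^T z

# Hessian of log-sum-exp from the slide:
H = np.diag(z) / s - np.outer(z, z) / s**2

# v^T H v equals ((sum z_k v_k^2)(sum z_k) - (sum v_k z_k)^2) / (sum z_k)^2
v = rng.standard_normal(5)
quad = v @ H @ v
gap = (z * v**2).sum() * s - ((z * v).sum()) ** 2   # Cauchy-Schwarz gap >= 0
assert quad >= -1e-12 and np.isclose(quad, gap / s**2)

# the all-ones vector is in the nullspace of H, so the smallest
# eigenvalue is 0 up to roundoff; H is positive semidefinite
print(np.linalg.eigvalsh(H).min())
```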
Vector composition

composition of $g : \mathbf{R}^n \to \mathbf{R}^k$ and $h : \mathbf{R}^k \to \mathbf{R}$:
$$f(x) = h(g(x)) = h(g_1(x), g_2(x), \ldots, g_k(x))$$

$f$ is convex if
- $g_i$ convex, $h$ convex, $\tilde h$ nondecreasing in each argument
- $g_i$ concave, $h$ convex, $\tilde h$ nonincreasing in each argument

proof (for $n = 1$, differentiable $g$, $h$):
$$f''(x) = g'(x)^T \nabla^2 h(g(x))\, g'(x) + \nabla h(g(x))^T g''(x)$$

examples
- $\sum_{i=1}^m \log g_i(x)$ is concave if $g_i$ are concave and positive
- $\log \sum_{i=1}^m \exp g_i(x)$ is convex if $g_i$ are convex

Minimization

if $f(x, y)$ is convex in $(x, y)$ and $C$ is a convex set, then
$$g(x) = \inf_{y \in C} f(x, y)$$
is convex

examples
- $f(x, y) = x^T A x + 2x^T B y + y^T C y$ with
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succeq 0, \qquad C \succ 0$$
  minimizing over $y$ gives $g(x) = \inf_y f(x, y) = x^T (A - BC^{-1}B^T) x$;
  $g$ is convex, hence the Schur complement $A - BC^{-1}B^T \succeq 0$
- distance to a set: $\operatorname{dist}(x, S) = \inf_{y \in S} \|x - y\|$ is convex if $S$ is convex

Perspective

the perspective of a function $f : \mathbf{R}^n \to \mathbf{R}$ is the function $g : \mathbf{R}^n \times \mathbf{R} \to \mathbf{R}$,
$$g(x, t) = t f(x/t), \qquad \operatorname{dom} g = \{(x, t) \mid x/t \in \operatorname{dom} f,\ t > 0\}$$
$g$ is convex if $f$ is convex

examples
- $f(x) = x^T x$ is convex; hence $g(x, t) = x^T x / t$ is convex for $t > 0$
- negative logarithm $f(x) = -\log x$ is convex; hence relative entropy $g(x, t) = t \log t - t \log x$ is convex on $\mathbf{R}^2_{++}$
- if $f$ is convex, then
$$g(x) = (c^T x + d)\, f\!\left((Ax + b)/(c^T x + d)\right)$$
  is convex on $\{x \mid c^T x + d > 0,\ (Ax + b)/(c^T x + d) \in \operatorname{dom} f\}$

The conjugate function

the conjugate of a function $f$ is
$$f^*(y) = \sup_{x \in \operatorname{dom} f} (y^T x - f(x))$$

[figure: $f^*(y)$ is the maximum gap between the line $xy$ and $f(x)$; the supporting line of slope $y$ has intercept $(0, -f^*(y))$]

- $f^*$ is convex (even if $f$ is not)
- will be useful in Chapter 5 (duality)

Examples

negative logarithm $f(x) = -\log x$:
$$f^*(y) = \sup_{x > 0}(xy + \log x) = \begin{cases} -1 - \log(-y) & y < 0 \\ \infty & \text{otherwise} \end{cases}$$

strictly convex quadratic $f(x) = (1/2)x^T Q x$ with $Q \in \mathbf{S}^n_{++}$:
$$f^*(y) = \sup_x \left(y^T x - (1/2)x^T Q x\right) = \frac{1}{2} y^T Q^{-1} y$$

Quasiconvex functions

$f : \mathbf{R}^n \to \mathbf{R}$ is quasiconvex if $\operatorname{dom} f$ is convex and the sublevel sets
$$S_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$
are convex for all $\alpha$

[figure: a quasiconvex function on $\mathbf{R}$ with sublevel sets at levels $\alpha$ and $\beta$ given by intervals with endpoints $a$, $b$, $c$]

- $f$ is quasiconcave if $-f$ is quasiconvex
- $f$ is quasilinear if it is quasiconvex and quasiconcave

Examples

- $\sqrt{|x|}$ is quasiconvex on $\mathbf{R}$
- $\operatorname{ceil}(x) = \inf\{z \in \mathbf{Z} \mid z \ge x\}$ is quasilinear
- $\log x$ is quasilinear on $\mathbf{R}_{++}$
- $f(x_1, x_2) = x_1 x_2$ is quasiconcave on $\mathbf{R}^2_{++}$
- linear-fractional function
$$f(x) = \frac{a^T x + b}{c^T x + d}, \qquad \operatorname{dom} f = \{x \mid c^T x + d > 0\}$$
  is quasilinear
- distance ratio
$$f(x) = \frac{\|x - a\|_2}{\|x - b\|_2}, \qquad \operatorname{dom} f = \{x \mid \|x - a\|_2 \le \|x - b\|_2\}$$
  is quasiconvex

Properties

modified Jensen inequality: for quasiconvex $f$,
$$0 \le \theta \le 1 \implies f(\theta x + (1-\theta)y) \le \max\{f(x), f(y)\}$$

first-order condition: differentiable $f$ with convex domain is quasiconvex iff
$$f(y) \le f(x) \implies \nabla f(x)^T (y - x) \le 0$$

[figure: $\nabla f(x)$ defines a supporting hyperplane to the sublevel set $\{y \mid f(y) \le f(x)\}$ at $x$]

sums of quasiconvex functions are not necessarily quasiconvex

Log-concave and log-convex functions

a positive function $f$ is log-concave if $\log f$ is concave:
$$f(\theta x + (1-\theta)y) \ge f(x)^\theta f(y)^{1-\theta} \quad \text{for } 0 \le \theta \le 1$$
$f$ is log-convex if $\log f$ is convex

- powers: $x^a$ on $\mathbf{R}_{++}$ is log-convex for $a \le 0$, log-concave for $a \ge 0$
- many common probability densities are log-concave, e.g., normal:
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}}\, e^{-\frac{1}{2}(x - \bar x)^T \Sigma^{-1} (x - \bar x)}$$
- the cumulative Gaussian distribution function $\Phi$ is log-concave:
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-u^2/2}\, du$$
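The conjugate examples above are easy to confirm numerically: maximize $y^T x - f(x)$ over $\operatorname{dom} f$ and compare with the closed form. A sketch of my own for $f(x) = -\log x$ (not from the notes; `conjugate_neg_log` is a hypothetical helper, and the finite bounds stand in for $x > 0$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_neg_log(y):
    """f*(y) = sup_{x>0} (x*y + log x) for f(x) = -log x, evaluated numerically."""
    res = minimize_scalar(lambda x: -(x * y + np.log(x)),
                          bounds=(1e-9, 1e6), method="bounded")
    return -res.fun

for y in [-0.5, -1.0, -3.0]:
    closed_form = -1 - np.log(-y)   # from the slide, valid for y < 0
    print(f"y={y}: numeric {conjugate_neg_log(y):.6f}, closed form {closed_form:.6f}")
# for y >= 0 the supremum is +infinity, matching f*(y) = infinity there
```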
Properties of log-concave functions

- twice differentiable $f$ with convex domain is log-concave if and only if
$$f(x)\nabla^2 f(x) \preceq \nabla f(x)\nabla f(x)^T \quad \text{for all } x \in \operatorname{dom} f$$
- product of log-concave functions is log-concave
- sum of log-concave functions is not always log-concave
- integration: if $f : \mathbf{R}^n \times \mathbf{R}^m \to \mathbf{R}$ is log-concave, then
$$g(x) = \int f(x, y)\, dy$$
  is log-concave (not easy to show)

Consequences of integration property

convolution $f * g$ of log-concave functions $f$, $g$ is log-concave:
$$(f * g)(x) = \int f(x - y)g(y)\, dy$$

if $C \subseteq \mathbf{R}^n$ is convex and $y$ is a random variable with log-concave pdf, then
$$f(x) = \operatorname{prob}(x + y \in C)$$
is log-concave

proof: write $f(x)$ as an integral of a product of log-concave functions,
$$f(x) = \int g(x + y)\, p(y)\, dy, \qquad g(u) = \begin{cases} 1 & u \in C \\ 0 & u \notin C, \end{cases}$$
where $p$ is the pdf of $y$

Convexity with respect to generalized inequalities

$f : \mathbf{R}^n \to \mathbf{R}^m$ is $K$-convex if $\operatorname{dom} f$ is convex and
$$f(\theta x + (1-\theta)y) \preceq_K \theta f(x) + (1-\theta)f(y)$$
for $x, y \in \operatorname{dom} f$, $0 \le \theta \le 1$

example: $f : \mathbf{S}^m \to \mathbf{S}^m$, $f(X) = X^2$ is $\mathbf{S}^m_+$-convex

proof: for fixed $z \in \mathbf{R}^m$, $z^T X^2 z = \|Xz\|_2^2$ is convex in $X$, i.e.,
$$z^T(\theta X + (1-\theta)Y)^2 z \le \theta z^T X^2 z + (1-\theta) z^T Y^2 z$$
for $X, Y \in \mathbf{S}^m$, $0 \le \theta \le 1$; therefore
$$(\theta X + (1-\theta)Y)^2 \preceq \theta X^2 + (1-\theta)Y^2$$
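The $\mathbf{S}^m_+$-convexity of $f(X) = X^2$ can be spot-checked as well: the inequality in the proof says $\theta X^2 + (1-\theta)Y^2 - (\theta X + (1-\theta)Y)^2$ is always positive semidefinite (it equals $\theta(1-\theta)(X-Y)^2$). A minimal sketch of my own, assuming only numpy:

```python
import numpy as np

def random_symmetric(n, rng):
    A = rng.standard_normal((n, n))
    return (A + A.T) / 2

rng = np.random.default_rng(2)
for _ in range(1000):
    X, Y = random_symmetric(4, rng), random_symmetric(4, rng)
    theta = rng.uniform()
    M = theta * X + (1 - theta) * Y
    diff = theta * X @ X + (1 - theta) * Y @ Y - M @ M
    # S^m_+ convexity: diff must be positive semidefinite
    assert np.linalg.eigvalsh(diff).min() >= -1e-9
print("X^2 passed the S^m_+ convexity check on 1000 random pairs")
```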