Optimization in Engineering


Tags

optimization algorithms, optimization, computational processes, engineering design

Summary

This document provides an introduction to optimization techniques in engineering. It discusses the optimization algorithms and computational processes used to find the best system design subject to constraints. The book covers optimization from an engineering perspective, focusing on computational approaches for handling challenges such as high-dimensional search spaces and uncertainty.

Full Transcript


1 Introduction

Many disciplines involve optimization at their core. In physics, systems are driven to their lowest energy state subject to physical laws. In business, corporations aim to maximize shareholder value. In biology, fitter organisms are more likely to survive. This book will focus on optimization from an engineering perspective, where the objective is to design a system that optimizes a set of metrics subject to constraints. The system could be a complex physical system like an aircraft, or it could be a simple structure such as a bicycle frame. The system might not even be physical; for example, we might be interested in designing a control system for an automated vehicle or a computer vision system that detects whether an image of a tumor biopsy is cancerous. We want these systems to perform as well as possible. Depending on the application, relevant metrics might include efficiency, safety, and accuracy. Constraints on the design might include cost, weight, and structural soundness.

This book is about the algorithms, or computational processes, for optimization. Given some representation of the system design, such as a set of numbers encoding the geometry of an airfoil, these algorithms will tell us how to search the space of possible designs with the aim of finding the best one. Depending on the application, this search may involve running physical experiments, such as wind tunnel tests, or it might involve evaluating an analytical expression or running computer simulations. We will discuss computational approaches for addressing a variety of challenges, such as how to search high-dimensional spaces, how to handle problems with multiple competing objectives, and how to accommodate uncertainty in the metrics.

1.1 A History

We will begin our discussion of the history of algorithms for optimization with the ancient Greek philosophers. (This discussion is not meant to be comprehensive. A more detailed history is provided by X.-S. Yang, "A Brief History of Optimization," in Engineering Optimization, Wiley, 2010, pp. 1–13.) Pythagoras of Samos (569–475 BCE), the developer of the Pythagorean theorem, claimed that "the principles of mathematics were the principles of all things" (Aristotle, Metaphysics, trans. by W. D. Ross, 350 BCE, Book I, Part 5), popularizing the idea that mathematics could model the world. Both Plato (427–347 BCE) and Aristotle (384–322 BCE) used reasoning for the purpose of societal optimization (see the discussion by S. Kiranyaz, T. Ince, and M. Gabbouj, Multidimensional Particle Swarm Optimization for Machine Learning and Pattern Recognition, Springer, 2014, Section 2.1). They contemplated the best style of human life, which involves the optimization of both individual lifestyle and functioning of the state. Aristotelian logic was an early formal process, an algorithm, by which deductions can be made.

Optimization of mathematical abstractions also dates back millennia. Euclid of Alexandria (325–265 BCE) solved early optimization problems in geometry, including how to find the shortest and longest lines from a point to the circumference of a circle. He also showed that a square is the rectangle with the maximum area for a fixed perimeter (see books III and VI of Euclid, The Elements, trans. by D. E. Joyce, 300 BCE). The Greek mathematician Zenodorus (200–140 BCE) studied Dido's problem, shown in figure 1.1. Others demonstrated that nature seems to optimize.
Heron of Alexandria (10–75 CE) showed that light travels between points through the path of shortest length. Pappus of Alexandria (290–350 CE), among his many contributions to optimization, argued that the hexagon repeated in honeycomb is the optimal regular polygon for storing honey; its hexagonal structure uses the least material to create a lattice of cells over a plane (T. C. Hales, "The Honeycomb Conjecture," Discrete & Computational Geometry, vol. 25, pp. 1–22, 2001).

(Figure 1.1. Queen Dido, founder of Carthage, was granted as much land as she could enclose with a bullhide thong. She made a semicircle with each end of the thong against the Mediterranean Sea, thus enclosing the maximum possible area. This problem is mentioned in Virgil's Aeneid, 19 BCE.)

Central to the study of optimization is the use of algebra, which is the study of the rules for manipulating mathematical symbols. Algebra is credited to the Persian mathematician al-Khwārizmī (790–850 CE) with the treatise "al-Kitāb al-jabr wal-muqābala," or "The Compendious Book on Calculation by Completion and Balancing." Algebra had the advantage of using Hindu-Arabic numerals, including the use of zero in base notation. The word al-jabr is Arabic for restoration and is the source for the Western word algebra. The term algorithm comes from algoritmi, the Latin translation and pronunciation of al-Khwārizmī's name.

Optimization problems are often posed as a search in a space defined by a set of coordinates. Use of coordinates comes from René Descartes (1596–1650), who used two numbers to describe a point on a two-dimensional plane. His insight linked algebra, with its analytic equations, to the descriptive and visual field of geometry (R. Descartes, "La Géométrie," in Discours de la Méthode, 1637). His work also included a method for finding the tangent to any curve whose equation is known. Tangents are useful in identifying the minima and maxima of functions. Pierre de Fermat (1601–1665) began solving for where the derivative is zero to identify potentially optimal points.

The concept of calculus, or the study of continuous change, plays an important role in our discussion of optimization. Modern calculus stems from the developments of Gottfried Wilhelm Leibniz (1646–1716) and Sir Isaac Newton (1642–1727). Both differential and integral calculus make use of the notion of convergence of infinite series to a well-defined limit.

The mid-twentieth century saw the rise of the electronic computer, spurring interest in numerical algorithms for optimization. The ease of calculations allowed optimization to be applied to much larger problems in a variety of domains. One of the major breakthroughs came with the introduction of linear programming, which is an optimization problem with a linear objective function and linear constraints. Leonid Kantorovich (1912–1986) presented a formulation for linear programming and an algorithm to solve it (L. V. Kantorovich, "A New Method of Solving Some Classes of Extremal Problems," in Proceedings of the USSR Academy of Sciences, vol. 28, 1940). It was applied to optimal resource allocation problems during World War II.
George Dantzig (1914–2005) developed the simplex algorithm, which represented a significant advance in solving linear programs efficiently. (The simplex algorithm will be covered in chapter 11.) Richard Bellman (1920–1984) developed the notion of dynamic programming, which is a commonly used method for optimally solving complex problems by breaking them down into simpler problems (R. Bellman, "On the Theory of Dynamic Programming," Proceedings of the National Academy of Sciences of the United States of America, vol. 38, no. 8, pp. 716–719, 1952). Dynamic programming has been used extensively for optimal control. This textbook outlines many of the key algorithms developed for digital computers that have been used for various engineering design optimization problems.

Decades of advances in large scale computation have resulted in innovative physical engineering designs as well as the design of artificially intelligent systems. The intelligence of these systems has been demonstrated in games such as chess, Jeopardy!, and Go. IBM's Deep Blue defeated the world chess champion Garry Kasparov in 1997 by evaluating millions of positions to optimize its moves. In 2011, IBM's Watson played Jeopardy! against former winners Brad Rutter and Ken Jennings. Watson won the first place prize of $1 million by optimizing its response with respect to probabilistic inferences about 200 million pages of structured and unstructured data. Since the competition, the system has evolved to assist in healthcare decisions and weather forecasting. In 2017, Google's AlphaGo defeated Ke Jie, the number one ranked Go player in the world. The system used neural networks with millions of parameters that were optimized from self-play and data from human games. The optimization of deep neural networks is fueling a major revolution in artificial intelligence that will likely continue (I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016).

1.2 Optimization Process

A typical engineering design optimization process is shown in figure 1.2. (Further discussion of the design process in engineering is provided in J. Arora, Introduction to Optimum Design, 4th ed., Academic Press, 2016.) The role of the designer is to provide a problem specification that details the parameters, constants, objectives, and constraints that are to be achieved. The designer is responsible for crafting the problem and quantifying the merits of potential designs. The designer also typically supplies a baseline design or initial design point to the optimization algorithm.

(Figure 1.2. The design optimization process: from the design specifications, an initial design is evaluated; if its performance is good, it becomes the final design, and otherwise the design is changed and re-evaluated. We seek to automate the optimization procedure highlighted in blue.)

This book is about automating the process of refining the design to improve performance. An optimization algorithm is used to incrementally improve the design until it can no longer be improved or until the budgeted time or cost has been reached. The designer is responsible for analyzing the result of the optimization process to ensure its suitability for the final application.
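To make the loop in figure 1.2 concrete, here is a minimal sketch in Julia, the language used throughout this book. The names `evaluate`, `improve`, and `design_loop` are illustrative stand-ins, not from the text; a real design problem would supply its own performance model and update rule.

```julia
# Toy stand-ins for the designer-supplied pieces (illustrative only).
evaluate(x) = (x - 3.0)^2            # performance metric: lower is better
improve(x)  = x - 0.2 * (x - 3.0)    # one incremental design change

# The automated portion of the design optimization process in figure 1.2.
function design_loop(x0; budget=100, good_enough=1e-6)
    x = x0                                      # initial design point
    for _ in 1:budget                           # budgeted time or cost
        evaluate(x) ≤ good_enough && return x   # good? → final design
        x = improve(x)                          # change the design
    end
    return x                                    # budget reached
end

design_loop(0.0)  # converges to ≈ 3.0, the best design for this toy metric
```

The designer's remaining responsibilities, specifying the problem and vetting the result, sit outside this loop, which is exactly the division of labor described above.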
Misspecifications in the problem, poor baseline designs, and improperly implemented or unsuitable optimization algorithms can all lead to suboptimal or dangerous designs.

There are several advantages of an optimization approach to engineering design. First of all, the optimization process provides a systematic, logical design procedure. If properly followed, optimization algorithms can help reduce the chance of human error in design. Sometimes intuition in engineering design can be misleading; it can be much better to optimize with respect to data. Optimization can speed the process of design, especially when a procedure can be written once and then be reapplied to other problems. Traditional engineering techniques are often visualized and reasoned about by humans in two or three dimensions. Modern optimization techniques, however, can be applied to problems with millions of variables and constraints.

There are also challenges associated with using optimization for design. We are generally limited in our computational resources and time, and so our algorithms have to be selective in how they explore the design space. Fundamentally, the optimization algorithms are limited by the designer's ability to specify the problem. In some cases, the optimization algorithm may exploit modeling errors or provide a solution that does not adequately solve the intended problem. When an algorithm results in an apparently optimal design that is counterintuitive, it can be difficult to interpret. Another limitation is that many optimization algorithms are not guaranteed to produce optimal designs.

1.3 Basic Optimization Problem

The basic optimization problem is:

    minimize_x  f(x)  subject to  x ∈ X                               (1.1)

Here, x is a design point. A design point can be represented as a vector of values corresponding to different design variables. An n-dimensional design point is written:

    [x₁, x₂, …, xₙ]                                                   (1.2)

(As in Julia, square brackets with comma-separated entries are used to represent column vectors. Design points are column vectors.) The ith design variable is denoted xᵢ. The elements in this vector can be adjusted to minimize the objective function f. Any value of x from among all points in the feasible set X that minimizes the objective function is called a solution or minimizer. A particular solution is written x*. Figure 1.3 shows an example of a one-dimensional optimization problem.

(Figure 1.3. A one-dimensional optimization problem. Note that the minimum is merely the best in the feasible set; lower points may exist outside the feasible region.)

This formulation is general, meaning that any optimization problem can be rewritten according to equation (1.1). In particular, a problem

    maximize_x  f(x)  subject to  x ∈ X                               (1.3)

can be replaced by

    minimize_x  −f(x)  subject to  x ∈ X                              (1.4)

The new form is the same problem in that it has the same set of solutions.
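As a small illustration of the transformation from equation (1.3) to equation (1.4), the sketch below maximizes a toy objective by minimizing its negation over a sampled feasible set. The objective `f`, the grid `X`, and the crude grid search are all made up for illustration; later chapters cover real algorithms.

```julia
f(x) = 4 - (x - 1)^2              # toy objective we want to maximize
X = range(-2.0, 4.0, length=601)  # feasible set, sampled on a grid

g(x) = -f(x)                      # maximize f  ⟺  minimize -f
xstar = argmin(g, X)              # requires Julia 1.7+ for argmin(f, itr)
(xstar, f(xstar))                 # (1.0, 4.0): the maximizer and maximum of f
```

A minimizer of g over X is exactly a maximizer of f over X, which is why the rest of the book can discuss minimization without loss of generality.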
Modeling engineering problems within this mathematical formulation can be challenging. The way in which we formulate an optimization problem can make the solution process either easy or hard (see the discussion in S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004). We will focus on the algorithmic aspects of optimization that arise after the problem has been properly formulated. (Many texts provide examples of how to translate real-world optimization problems into optimization problems. See, for example: R. K. Arora, Optimization: Algorithms and Applications, Chapman and Hall/CRC, 2015; A. D. Belegundu and T. R. Chandrupatla, Optimization Concepts and Applications in Engineering, 2nd ed., Cambridge University Press, 2011; A. Keane and P. Nair, Computational Approaches for Aerospace Design, Wiley, 2005; P. Y. Papalambros and D. J. Wilde, Principles of Optimal Design, Cambridge University Press, 2017.)

Since this book discusses a wide variety of different optimization algorithms, one may wonder which algorithm is best. As elaborated by the no free lunch theorems of Wolpert and Macready, there is no reason to prefer one algorithm over another unless we make assumptions about the probability distribution over the space of possible objective functions. If one algorithm performs better than another algorithm on one class of problems, then it will perform worse on another class of problems. (The assumptions and results of the no free lunch theorems are provided by D. H. Wolpert and W. G. Macready, "No Free Lunch Theorems for Optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.) For many optimization algorithms to work effectively, there needs to be some regularity in the objective function, such as Lipschitz continuity or convexity, both topics that we will cover later. As we discuss different algorithms, we will outline their assumptions, the motivation for their mechanism, and their advantages and disadvantages.

1.4 Constraints

Many problems have constraints. Each constraint limits the set of possible solutions, and together the constraints define the feasible set X. Feasible design points do not violate any constraints. For example, consider the following optimization problem:

    minimize_{x₁, x₂}  f(x₁, x₂)
    subject to         x₁ ≥ 0
                       x₂ ≥ 0
                       x₁ + x₂ ≤ 1                                    (1.5)

The feasible set is plotted in figure 1.4.

(Figure 1.4. The feasible set X associated with equation (1.5): the triangle with vertices (0, 0), (1, 0), and (0, 1).)

Constraints are typically written with ≤, ≥, or =. If constraints involve < or > (i.e., strict inequalities), then the feasible set does not include the constraint boundary. A potential issue with not including the boundary is illustrated by this problem:

    minimize_x  x  subject to  x > 1                                  (1.6)

The feasible set is shown in figure 1.5. The point x = 1 produces values smaller than any x greater than 1, but x = 1 is not feasible. We can pick any x arbitrarily close to, but greater than, 1, but no matter what we pick, we can always find an infinite number of values even closer to 1. We must conclude that the problem has no solution. To avoid such issues, it is often best to include the constraint boundary in the feasible set.

(Figure 1.5. The problem in equation (1.6) has no solution because the constraint boundary is not feasible.)
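A constraint set like the one in equation (1.5) is easy to express as a predicate. The sketch below is a minimal illustration; the helper name `feasible` is ours, not the text's.

```julia
# Membership test for the feasible set X of equation (1.5).
feasible(x1, x2) = x1 ≥ 0 && x2 ≥ 0 && x1 + x2 ≤ 1

feasible(0.2, 0.3)   # true: inside the triangle of figure 1.4
feasible(0.8, 0.4)   # false: violates x1 + x2 ≤ 1
feasible(-0.1, 0.5)  # false: violates x1 ≥ 0
```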
1.5 Critical Points

Figure 1.6 shows a univariate function f(x) with several labeled critical points, where the derivative is zero, that are of interest when discussing optimization problems. (A univariate function is a function of a single scalar. The term univariate describes objects involving one variable.) When minimizing f, we wish to find a global minimizer, a value of x for which f(x) is minimized. A function may have at most one global minimum, but it may have multiple global minimizers.

Unfortunately, it is generally difficult to prove that a given candidate point is at a global minimum. Often, the best we can do is check whether it is at a local minimum. A point x* is at a local minimum (or is a local minimizer) if there exists a δ > 0 such that f(x*) ≤ f(x) for all x with |x − x*| < δ. In the multivariate context, this definition generalizes to there being a δ > 0 such that f(x*) ≤ f(x) whenever ‖x − x*‖ < δ.

(Figure 1.6. Examples of critical points of interest to optimization algorithms (where the derivative is zero) on a univariate function: an inflection point, a weak local minimum, a strong local minimum, and the global minimum.)

Figure 1.6 shows two types of local minima: strong local minima and weak local minima. A strong local minimizer, also known as a strict local minimizer, is a point that uniquely minimizes f within a neighborhood. In other words, x* is a strict local minimizer if there exists a δ > 0 such that f(x*) < f(x) whenever x* ≠ x and |x − x*| < δ. In the multivariate context, this generalizes to there being a δ > 0 such that f(x*) < f(x) whenever x* ≠ x and ‖x − x*‖ < δ. A weak local minimizer is a local minimizer that is not a strong local minimizer.

The derivative is zero at all local and global minima of continuous, unbounded objective functions. While having a zero derivative is a necessary condition for a local minimum, it is not a sufficient condition. (Points with nonzero derivatives are never minima.)

Figure 1.6 also has an inflection point where the derivative is zero but the point does not locally minimize f. An inflection point is where the sign of the second derivative of f changes, which corresponds to a local minimum or maximum of f′.
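The distinctions in figure 1.6 can be probed numerically. The sketch below classifies a critical point of a univariate function using central finite differences. It is a heuristic illustration of the definitions above, with helper names and tolerances of our own choosing; the rigorous conditions follow in section 1.6.

```julia
# Central finite-difference approximations of f′ and f″.
d1(f, x; h=1e-5) = (f(x + h) - f(x - h)) / (2h)
d2(f, x; h=1e-4) = (f(x + h) - 2 * f(x) + f(x - h)) / (h^2)

# Heuristic classification of a point by local derivative information.
function classify(f, x; tol=1e-3)
    abs(d1(f, x)) > tol && return "not a critical point"
    s = d2(f, x)
    s >  tol && return "strong local minimum"
    s < -tol && return "local maximum"
    return "flat to second order: weak minimum, inflection, or higher-order point"
end

classify(x -> (x - 1)^2, 1.0)  # "strong local minimum"
classify(x -> x^3, 0.0)        # flat to second order (an inflection point)
classify(x -> -x^2, 0.0)       # "local maximum"
```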
1.6 Conditions for Local Minima

Many numerical optimization methods seek local minima. Local minima are locally optimal, but we do not generally know whether a local minimum is a global minimum. The conditions we discuss in this section assume that the objective function is differentiable. Derivatives, gradients, and Hessians are reviewed in the next chapter. We also assume in this section that the problem is unconstrained. Conditions for optimality in constrained problems are introduced in chapter 10.

1.6.1 Univariate

A design point is guaranteed to be at a strong local minimum if the local derivative is zero and the second derivative is positive:

1. f′(x*) = 0
2. f″(x*) > 0

A zero derivative ensures that shifting the point by small values does not significantly affect the function value. A positive second derivative ensures that the zero first derivative occurs at the bottom of a bowl. (If f′(x) = 0 and f″(x) < 0, then x is a local maximum.)

A point can also be at a local minimum if it has a zero derivative and the second derivative is merely nonnegative:

1. f′(x*) = 0, the first-order necessary condition (FONC). (A point that satisfies the first-order necessary condition is sometimes called a stationary point.)
2. f″(x*) ≥ 0, the second-order necessary condition (SONC).

These conditions are referred to as necessary because all local minima obey these two rules. Unfortunately, not all points with a zero derivative and a zero second derivative are local minima, as demonstrated in figure 1.7.

(Figure 1.7. Examples of the necessary but insufficient conditions for strong local minima: one point satisfies the SONC but not the FONC, and two satisfy both the FONC and SONC without being strong local minima.)

The first necessary condition can be derived using the Taylor expansion about our candidate point x* (the Taylor expansion is derived in appendix C):

    f(x* + h) = f(x*) + h f′(x*) + O(h²)                              (1.7)
    f(x* − h) = f(x*) − h f′(x*) + O(h²)                              (1.8)
    f(x* + h) ≥ f(x*)  ⟹  h f′(x*) ≥ 0                                (1.9)
    f(x* − h) ≥ f(x*)  ⟹  h f′(x*) ≤ 0                                (1.10)
    ⟹  f′(x*) = 0                                                     (1.11)

where the asymptotic notation O(h²) is covered in appendix C.

The second-order necessary condition can also be obtained from the Taylor expansion:

    f(x* + h) = f(x*) + h f′(x*) + (h²/2) f″(x*) + O(h³)              (1.12)

where h f′(x*) = 0 by the first-order necessary condition. We know that the first-order necessary condition must apply:

    f(x* + h) ≥ f(x*)  ⟹  (h²/2) f″(x*) ≥ 0                           (1.13)

since h > 0. It follows that f″(x*) ≥ 0 must hold for x* to be at a local minimum.

1.6.2 Multivariate

The following conditions are necessary for x to be at a local minimum of f:

1. ∇f(x) = 0, the first-order necessary condition (FONC)
2. ∇²f(x) is positive semidefinite (for a review of this definition, see appendix C.6), the second-order necessary condition (SONC)

The FONC and SONC are generalizations of the univariate case. The FONC tells us that the function is not changing at x. Figure 1.8 shows examples of multivariate functions where the FONC is satisfied. The SONC tells us that x is in a bowl.

The FONC and SONC can be obtained from a simple analysis. In order for x* to be at a local minimum, it must be smaller than those values around it:

    f(x*) ≤ f(x* + hy)  ⟺  f(x* + hy) − f(x*) ≥ 0                     (1.14)

If we write the second-order approximation for f(x*), we get:

    f(x* + hy) = f(x*) + h ∇f(x*)⊤y + (h²/2) y⊤∇²f(x*)y + O(h³)       (1.15)

We know that at a minimum, the first derivative must be zero, and we neglect the higher order terms. Rearranging, we get:

    (h²/2) y⊤∇²f(x*)y = f(x* + hy) − f(x*) ≥ 0                        (1.16)

This is the definition of a positive semidefinite matrix, and we recover the SONC.
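These conditions can also be checked numerically. The sketch below approximates the gradient and Hessian with central finite differences and tests the FONC and SONC; the function name, helper names, and tolerances are ours and are only illustrative. Applied to the Rosenbrock function of example 1.1, which follows, it reproduces the analytic result.

```julia
using LinearAlgebra  # provides norm, eigvals, Symmetric

# Numerically check the FONC and SONC at a point x, assuming analytic
# derivatives are unavailable; central finite differences stand in.
function fonc_sonc(f, x; h=1e-4, tol=1e-4)
    n = length(x)
    e(i) = [j == i ? 1.0 : 0.0 for j in 1:n]  # ith standard basis vector
    g = [(f(x + h*e(i)) - f(x - h*e(i))) / (2h) for i in 1:n]  # ≈ ∇f(x)
    H = zeros(n, n)                                            # ≈ ∇²f(x)
    for i in 1:n, j in 1:n
        H[i, j] = (f(x + h*e(i) + h*e(j)) - f(x + h*e(i) - h*e(j)) -
                   f(x - h*e(i) + h*e(j)) + f(x - h*e(i) - h*e(j))) / (4h^2)
    end
    fonc = norm(g) ≤ tol                       # ∇f(x) = 0
    sonc = all(eigvals(Symmetric(H)) .≥ -tol)  # positive semidefinite
    return (fonc=fonc, sonc=sonc)
end

rosenbrock(x) = (1 - x[1])^2 + 5 * (x[2] - x[1]^2)^2
fonc_sonc(rosenbrock, [1.0, 1.0])  # (fonc = true, sonc = true)
```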
(Figure 1.8. The three local regions where the gradient is zero: a local maximum, where the gradient at the center is zero but the Hessian is negative definite; a saddle, where the gradient at the center is zero but the point is not a local minimum; and a bowl, where the gradient at the center is zero and the Hessian is positive definite, so it is a local minimum.)

While necessary for optimality, the FONC and SONC are not sufficient for optimality. For unconstrained optimization of a twice-differentiable function, a point is guaranteed to be at a strong local minimum if the FONC is satisfied and ∇²f(x) is positive definite. These conditions are collectively known as the second-order sufficient condition (SOSC).

Example 1.1 illustrates how these conditions can be applied to the Rosenbrock banana function.

Example 1.1 (checking the first- and second-order necessary conditions of a point on the Rosenbrock function; in the book's figure, the minimizer is indicated by a dot). Consider the Rosenbrock banana function,

    f(x) = (1 − x₁)² + 5(x₂ − x₁²)²

Does the point (1, 1) satisfy the FONC and SONC? The gradient is:

    ∇f(x) = [∂f/∂x₁, ∂f/∂x₂] = [2(10x₁³ − 10x₁x₂ + x₁ − 1), 10(x₂ − x₁²)]

and the Hessian is:

    ∇²f(x) = [ −20(x₂ − x₁²) + 40x₁² + 2    −20x₁
               −20x₁                          10  ]

We compute ∇f([1, 1]) = 0, so the FONC is satisfied. The Hessian at [1, 1] is:

    [  42   −20
      −20    10 ]

which is positive definite, so the SONC is satisfied.

1.7 Contour Plots

This book will include problems with a variety of numbers of dimensions, and will need to display information over one, two, or three dimensions. Functions of the form f(x₁, x₂) = y can be rendered in three-dimensional space, but not all orientations provide a complete view over the domain. A contour plot is a visual representation of a three-dimensional surface obtained by plotting regions with constant y values, known as contours, on a two-dimensional plot with axes indexed by x₁ and x₂. Example 1.2 illustrates how a contour plot can be interpreted.

Example 1.2 (an example three-dimensional visualization and the associated contour plot). The function f(x₁, x₂) = x₁² − x₂² can be visualized in a three-dimensional space based on its two inputs and one output. It can also be visualized using a contour plot, which shows lines of constant y value. (The book shows a three-dimensional surface plot and the associated contour plot of this saddle-shaped function.)
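A contour plot like the one in example 1.2 can be generated with a plotting package. The following sketch assumes the Plots.jl package is installed; the ranges and level count are illustrative choices, not from the text.

```julia
using Plots  # assumes Plots.jl is installed

f(x1, x2) = x1^2 - x2^2           # the function from example 1.2
xs = range(-2, 2, length=101)

contour(xs, xs, f, levels=20, xlabel="x1", ylabel="x2")  # contour plot
surface(xs, xs, f)                # three-dimensional rendering
```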
1.8 Overview

This section provides a brief overview of the chapters of this book. The conceptual dependencies between the chapters are outlined in figure 1.9.

(Figure 1.9. Dependencies for the chapters in Algorithms for Optimization. Lighter arrows indicate weak dependencies.)

Chapter 2 begins by discussing derivatives and their generalization to multiple dimensions. Derivatives are used in many algorithms to inform the choice of direction of the search for an optimum. Often derivatives are not known analytically, and so we discuss how to estimate them numerically and using automatic differentiation techniques.

Chapter 3 discusses bracketing, which involves identifying an interval in which a local minimum lies for a univariate function. Different bracketing algorithms use different schemes for successively shrinking the interval based on function evaluations. One of the approaches we discuss uses knowledge of the Lipschitz constant of the function to guide the bracketing process. These bracketing algorithms are often used as subroutines within the optimization algorithms discussed later in the text.

Chapter 4 introduces local descent as a general approach to optimizing multivariate functions. Local descent involves iteratively choosing a descent direction, taking a step in that direction, and repeating that process until convergence or some termination condition is met. There are different schemes for choosing the step length. We will also discuss methods that adaptively restrict the step size to a region where there is confidence in the local model.

Chapter 5 builds upon the previous chapter, explaining how to use first-order information obtained through the gradient estimate as a local model to inform the descent direction. Simply stepping in the direction of steepest descent is often not the best strategy for finding a minimum. This chapter discusses a wide variety of different methods for using the past sequence of gradient estimates to better inform the search.

Chapter 6 shows how to use local models based on second-order approximations to inform local descent. These models are based on estimates of the Hessian of the objective function. The advantage of second-order approximations is that they can inform both the direction and step size.

Chapter 7 presents a collection of direct methods for finding optima that avoid using gradient information for informing the direction of search. We begin by discussing methods that iteratively perform line search along a set of directions. We then discuss pattern search methods that do not perform line search but rather perform evaluations some step size away from the current point along a set of directions. The step size is incrementally adapted as the search proceeds. Another method uses a simplex that adaptively expands and contracts as it traverses the design space in the apparent direction of improvement. Finally, we discuss a method motivated by Lipschitz continuity to increase resolution in areas deemed likely to contain the global minimum.

Chapter 8 introduces stochastic methods, where randomness is incorporated into the optimization process. We show how stochasticity can improve some of the algorithms discussed in earlier chapters, such as steepest descent and pattern search. Some of the methods involve incrementally traversing the search space, but others involve learning a probability distribution over the design space, assigning greater weight to regions that are more likely to contain an optimum.

Chapter 9 discusses population methods, where a collection of points is used to explore the design space.
Having a large number of points distributed through the space can help reduce the risk of becoming stuck in a local minimum. Population methods generally rely upon stochasticity to encourage diversity in the population, and they can be combined with local descent methods.

Chapter 10 introduces the notion of constraints in optimization problems. We begin by discussing the mathematical conditions for optimality with constraints. We then introduce methods for incorporating constraints into the optimization algorithms discussed earlier through the use of penalty functions. We also discuss methods for ensuring that, if we start with a feasible point, the search will remain feasible.

Chapter 11 makes the assumption that both the objective function and constraints are linear. Although linearity may appear to be a strong assumption, many engineering problems can be framed as linear constrained optimization problems. Several methods have been developed for exploiting this linear structure. This chapter focuses on the simplex algorithm, which is guaranteed to result in a global minimum.

Chapter 12 shows how to address the problem of multiobjective optimization, where we have multiple objectives that we are trying to optimize simultaneously. Engineering often involves a tradeoff between multiple objectives, and it is often unclear how to prioritize different objectives. We discuss how to transform multiobjective problems into scalar-valued objective functions so that we can use the algorithms discussed in earlier chapters. We also discuss algorithms for finding the set of design points that represent the best tradeoff between objectives.

Chapter 13 discusses how to create sampling plans consisting of points that cover the design space. Random sampling of the design space often does not provide adequate coverage. We discuss methods for ensuring uniform coverage along each design dimension and methods for measuring and optimizing the coverage of the space. In addition, we discuss quasi-random sequences that can also be used to generate sampling plans.

Chapter 14 explains how to build surrogate models of the objective function. Surrogate models are often used for problems where evaluating the objective function is very expensive. An optimization algorithm can then use evaluations of the surrogate model instead of the actual objective function to improve the design. The evaluations can come from historical data, perhaps obtained through the use of a sampling plan introduced in the previous chapter. We discuss different types of surrogate models, how to fit them to data, and how to identify a suitable surrogate model.

Chapter 15 introduces probabilistic surrogate models that allow us to quantify our confidence in the predictions of the models. This chapter focuses on a particular type of surrogate model called a Gaussian process. We show how to use Gaussian processes for prediction, how to incorporate gradient measurements and noise, and how to estimate some of the parameters governing the Gaussian process from data.
Chapter 16 shows how to use the probabilistic models from the previous chapter to guide surrogate optimization. The chapter outlines several techniques for choosing which design point to evaluate next. We also discuss how surrogate models can be used to optimize an objective measure in a safe manner.

Chapter 17 explains how to perform optimization under uncertainty, relaxing the assumption made in previous chapters that the objective function is a deterministic function of the design variables. We discuss different approaches for representing uncertainty, including set-based and probabilistic approaches, and explain how to transform the problem to provide robustness to uncertainty.

Chapter 18 outlines approaches to uncertainty propagation, where known input distributions are used to estimate statistical quantities associated with the output distribution. Understanding the output distribution of an objective function is important to optimization under uncertainty. We discuss a variety of approaches, some based on mathematical concepts such as Monte Carlo, the Taylor series approximation, orthogonal polynomials, and Gaussian processes. They differ in the assumptions they make and the quality of their estimates.

Chapter 19 shows how to approach problems where the design variables are constrained to be discrete. A common approach is to relax the assumption that the variables are discrete, but this can result in infeasible designs. Another approach involves incrementally adding linear constraints until the optimal point is discrete. We also discuss branch and bound along with dynamic programming approaches, both of which guarantee optimality. The chapter also mentions a population-based method that often scales to large design spaces but does not provide guarantees.

Chapter 20 discusses how to search design spaces consisting of expressions defined by a grammar. For many problems, the number of variables is unknown, such as in the optimization of graphical structures or computer programs. We outline several algorithms that account for the grammatical structure of the design space to make the search more efficient.

Chapter 21 explains how to approach multidisciplinary design optimization. Many engineering problems involve complicated interactions between several disciplines, and optimizing disciplines individually may not lead to an optimal solution. This chapter discusses a variety of techniques for taking advantage of the structure of multidisciplinary problems to reduce the effort required for finding good designs.

The appendices contain supplementary material. Appendix A begins with a short introduction to the Julia programming language, focusing on the concepts used to specify the algorithms listed in this book. Appendix B specifies a variety of test functions used for evaluating the performance of different algorithms. Appendix C covers mathematical concepts used in the derivation and analysis of the optimization methods discussed in this text.

1.9 Summary

Optimization in engineering is the process of finding the best system design subject to a set of constraints. Optimization is concerned with finding global minima of a function. Minima occur where the gradient is zero, but a zero gradient does not imply optimality.

1.10 Exercises
Exercise 1.1. Give an example of a function with a local minimum that is not a global minimum.

Exercise 1.2. What is the minimum of the function f(x) = x³ − x?

Exercise 1.3. Does the first-order condition f′(x) = 0 hold when x is the optimal solution of a constrained problem?

Exercise 1.4. How many minima does f(x, y) = x² + y, subject to x > y ≥ 1, have?

Exercise 1.5. How many inflection points does f(x) = x³ − 10 have?

© 2019 Massachusetts Institute of Technology, shared under a Creative Commons CC-BY-NC-ND license.
