Chapter 1: Introduction PDF
Document Details
Uploaded by Deleted User
UNIGE
Prof. Roland Bouffanais
Tags
Summary
Chapter 1: Introduction to optimization. It reviews a course syllabus from the textbook Livre PUP/Springer. Course instructor: Prof. Roland Bouffanais (UNIGE).
Full Transcript
Chapter 1: Introduction Course instructor: Prof. Roland Bouffanais (UNIGE) 1 semester 6 credits Review of course syllabus Textbook: livre PUP/Springer 1 / 42 To get started: let us consider the following...
Chapter 1: Introduction Course instructor: Prof. Roland Bouffanais (UNIGE) 1 semester 6 credits Review of course syllabus Textbook: livre PUP/Springer 1 / 42 To get started: let us consider the following example Let us consider n people in a 2D space, each at positions xi , i = 1,... , n. Problem: Find the point y where these people should meet to minimize the sum of the distances traveled by each of them. What is this point? Geometric median, by definition (see 1D example) n argmin xi − y y ∈R2 i=1 No analytical solution, but an iterative algorithm: Weiszfeld’s algorithm: xi 1 yk+1 = / n |xi − yk | n |xi − yk | i=1 i=1 Numerical solution by hill climbing (or steepest descent) https://en.wikipedia.org/wiki/Geometric_median 2 / 42 Illustration Distance between (x,y) and a set of points z=distance(x,y) height 100 350 350 40 0 300 400 400 80 250 350 350 total distan 60 300 300 y ce 250 40 200 100 250 80 60 20 0 40 200 y 20 40 20 60 200 x 0 80 30 30 0 0 0 100 0 20 40 60 80 100 x 3 / 42 Here is a difficult optimization problem 4 / 42 1.1 Search Space Unconstrained optimization: Classicaly, solve an optimization problem specified as: find x ∈ S so that f (x) ∈ R is maximal/minimal. xopt = argmin/argmax f (x) x∈S fopt = min/max f (x) x∈S Constrained optimization: Usually, in optimization problems, x may be subject to given constraints (e.g. x < b). The constraints can be expressed by restricting S to acceptable solutions. 5 / 42 Definitions & notations S is the search space S can be discrete, infinite or continuous. Usually, here S is very large The elements x of S are called: possible solutions or configurations The function f is called: objective function, cost function, loss function, fitness, or energy: f : S → R Optimal solution xopt : f (xopt ) ≥ f (y ), ∀y ∈ S Optimal value: fopt = f (xopt ) Unicity of xopt and fopt ? Global or local optimum 6 / 42 Search space In general, multidimensionnal problems: x = (x1 , x2 ,..., xn ). n is the problem size, or the number of degrees of freedom The cardinality of S grows exponentially with the problem size. xi ∈ S i S = S1 × S 2... × Sn |S| = |Si |n The difficulty is that S is too large for an exhaustive and systematic exploration Metaheuristics: offer a way to explore S when no exact polynomial algorithms exists Metaheuristics contrasts with “Convex optimization” for which a lot of algorithms exists. 7 / 42 1.2 Examples S = Rn : Continuous optimization problem. Derivatives (numerical ones), Lagrange multipliers, Local/global optima. These need to be examined individually and consider border effects. Lagrangian Multiplyer 1 maximize f (x1 , x2 ) with the constraint that g (x1 , x2 ) = 0 y gradf = λgradg λ is a Lagrange multiplier -1 -1 1 x 8 / 42 More examples Linear programming: method to achieve the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements and objective are represented by linear relationships. Maximize z = f (x) = i ci xi with the following constraints n i aij xi ≤ bj. x ∈ S = R. Solutions are at the border of a convex polygon (SIMPLEX algorithm). Things get tricky if x ∈ S = {0, 1}n. Knapsack: objects of value pi with size wi. Maximize n pi xi with wi xi ≤ c. xi ∈ {0, 1}, c > 0 size of the sack. Traveling Salesmen Problem (TSP): Combinatorial optimization problem (i.e., quadratic assigment problem 1QAP) 9 / 42 Still more examples: NK problems MaxOne problem NK-problems. S Kauffman, gene regulatory network. N genes, linked to K others: graph with N nodes and degree K. x ∈ S = {0, 1}N. N f (x) = fi (xi , neighborsi ) i K args with K − 1 neighbors of xi. MaxOne: K = 1. Case K = 3: h(111) or h(110) is max. NK problems are theoretically interesting since they offer synthetic search spaces, especially when N and K are growing 10 / 42 More about NK problems Let us consider N people with two commercial strategies denoted by xi = 0 or xi = 1 (binary type) The success of each strategy critically depends on the strategies of other people (inter-dependency – complex systems), and their relationship (cooperation vs. competition) Each individual is influenced by K others. The function indicates the resulting profit/gain associated with the strategy adopted by agent i given the inter-dependency with the rest of the collective The overall collective profit/gain for the system of N agents is simply the sum f = fi. 11 / 42 1.3 Revisiting computational complexity To find an optimum solution, one needs an efficient algorithm (obviously!) The computational effort required by an algorithm is given by the required amount of memory M and the computational time T M and T are the space and time complexity of the algorithm respectively Usually, the complexity of the worst case is considered, or that of the average case. One usually consideres the asymptotic complexity, that is the relation M(n) and T (n) as a function of the problem size n 12 / 42 Measuring complexity: the big “O” notation Complexity is expressed with an expression like “O()”. If one says that T (n) = O(f (n)), it means that there exists a constant k and a value n0 such that if n ≥ n0 then T (n) ≤ kf (n). For instance: T (n) = 0.5n2 + 2n = O(n2 ) T (n) = n + log(n) < n + n = O(n) 13 / 42 Algorithmic Complexity For some problems (sorting, matrix multiplication, etc), an algorithm is known that solves them in a polynomial time T (n) = O(nm ) with m usually small. These problems belongs to the P-class. For other problems no polynomial algorithms is known (or none exists!). There are only exponential algorithms. For instance T (n) = O(2n ) These problems are computationally challenging because T (n) becomes excessively large when n increases even moderately. 14 / 42 Intractability of exponential problems n 10 50 100 1000 log2 n 3 6 7 10 n3 1000 125’000 106 109 2n 1024 1.3 × 1015 1.27 × 1030 1.05 × 10301 n! 3629 × 103 3.041 × 1064 10158 4 × 102567 15 / 42 Complexity Classes In complexity theory, there are problems said to be in classes P, NP, NP-complete or NP-hard The P-class contains problems which can be solved by a polynomial algorithm. NP means non-deterministic polynomial as they can be defined through a non-deterministic Turing machine NP problems are those problems for which a given solution can be verified in a polynomial time. Thus, P-problems are obviously in NP. 16 / 42 Complexity classes and “hard-optimization” NP-hard problems are those whose solution solves any NP-problem, up to a polynomial transformation Problems that are both in the classes NP-hard and NP form the class NP-complete. NP-complete problems comprise of Graph coloring, satisfaction problems, finding hamiltonian cycles in a graph,... NP-complete and NP-hard are solved with exponential algorithms. 17 / 42 Complexity Classes P NP NP-complete NP-Hard 18 / 42 Example: Hamiltonian cycle Find a close path that goes through all nodes once and only once a a 1 1 1 1 b c b c 1 1 1 1 1 1 d d Note that the length of the path is 4 and is equal to the number of nodes. 19 / 42 TSP finds out if an Hamiltonian cycle exists TSP (Traveling Salesman Problem): find the shortest path that goes through all nodes once and only once, assuming a fully connected graph. Add missing edges with weight 2 a a 2 2 1 1 b c b c 1 1 1 1 1 1 d d No Hamiltonian cycle Shortest tour is of length 5 and not 4 20 / 42 TSP is a NP-hard problem TSP is not in NP as one cannot check that a tour is of minimal length in a polynomial time Finding a Hamiltonian cycle is not in P as no algorithm is known to find a Hamiltonian cycle in a polynomial time. Finding a Hamiltonian cycle is known to be NP-complete (difficult to demonstrate) TSP solves the Hamiltonian cycle problem TSP is therefore NP-hard 21 / 42 Complexity Classes: a Summary Hamiltonian cycle P NP NP-complete NP-Hard sorting Factoring TSP large intergers 22 / 42 Hard-optimization Many optimization problems that need metaheuristics are NP-hard. One calls them Hard-optimization problems A deterministic solution would require an exponential algorithm Metaheuristics are then used But, what is then the complexity of a metaheuristic? 23 / 42 1.4 Metaheuristics Metaheuristics: term coined by Glover in 1986 For many real-life problems (OR), no polynomial algorithm exists. (But for other problems yes!) If n is large, the search space S becomes too vast to use an exponential algorithm Metaheuristics are meant to explore S in a somewhat “intelligent" fashion to find optimals. (hopefully in a more effective way than through a pure random sampling) Trade-off between quality of the solution and computational cost (i.e. CPU time): able to return an acceptable solution in a reasonable amount of time 24 / 42 Heuristics vs. Metaheuristics Heuristics: methods of exploration that exploits some specific aspects of the problem at hand and only applies to it. For example, when solving a LP problem by the simplex algorithm, a heuristic is often used for choosing so-called entering and leaving variables. Metaheuristics: general exploration methods, often stochastic, that applies in the same way to many different problems. Examples are tabu search, simulated annealing, ant colony, and evolutionary algorithms. Topic of this course!! Algorithms for approximation: seek to find solutions to specific problems with a chosen quality and with specific limits in terms of CPU time 25 / 42 Characterization of Metaheuristics No hypothesis on the mathetical properties of f. The only requirement is that f can be computed for all x ∈ S Based on a number of parameters that guide the exploration, and which affect the quality of the search. Their optimal values are often found empirically for a given problem Require a starting point (initial configuration), either selected at random or based on other strategies Require a halting condition: e.g., CPU limit, boundary on the quality of the found solution, etc. 26 / 42 Characterization of Metaheuristics (more) In general, they have a stochastic component and can be inspired by natural processes Oftentimes, they have heavy CPU requirements (not cheap!), yet they are easy to implement and can readily be parallalized Classically, they combine two key actions: intensification and diversification, more commonly exploitation and exploration: either we search for more promising solutions in S, else we further analyze a found region of interest Goldend rule: never use metaheuristics if another method can be applied! 27 / 42 1.5 Operating Principles There is a critical need to find a suitable representation of the problem within the given search space S: i.e., how to code the possible solutions A metaheuristic is a trajectory within S. To define this trajectory the motion along it, the following ingredients are required: (1) Neighborhood V (x) for all x ∈ S: these are the y reachable from x. 28 / 42 Operating Principles (more) (2) Search/exploration operator U that selects among the neighbors within the current point along the exploration trajectory U U x0 → x1 ∈ V (x0 ) → x2 ∈ V (x1 )... This process goes on until the halting criterion is activated Depending on the metaheuritic, the output is either: (i) the point x corresponding to the optimal fitness f (x) found along the trajectory, or (2) the last point x along the trajectory 29 / 42 How to define the neighborhood? Find ways to explicitly express V (x) for all x ∈ S: often unpractical if not impossible In general, the neighborhood is specified by means of elementary transforms Ti (or movements) such that Ti (x) ∈ V (x) V (x) = {x ∈ S|x = Ti (x), i = 1,... } 30 / 42 Example #1: Movements in S = Z2 x ∈ S ⇒ x = (i, j) For instance, we can consider for cardinal movements denoted as NORTH, EAST, SOUTH and WEST such that N(i, j) = (i, j + 1) E (i, j) = (i + 1, j) (1) S(i, j) = (i, j + 1) W (i, j) = (i − 1, j) 31 / 42 Example #2: Permutations Let us consider the search space S of permutations of n objects With n = 3, we have A = {(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)} with |A| = n!. One can define a neighborhood of a given permutation by the set of permutations that can be generated through a given operation For instance, transposing (i, j), which consists in swapping positions i and j. 32 / 42 Example #2: Permutations (more) For n = 5, the selected movements are: mi ∈ { (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)} For example, the permutation (a3 , a2 , a4 , a5 , a1 ) admits (a2 , a3 , a4 , a5 , a1 ) as neighbor when applying the movement (1, 2) that transposes the first two elements All permutations are a finite product of transpotisions 33 / 42 Notes on neighborhood Some metaheuristics do not explore the complete neighborhood before moving to the next point Typically, they use a stochastic element to randomly establish the next point, “hoping" that it would yield an optimal value for f in S For metaheuristics conducting an exhaustive neighborhood search, we can define a hierarchical structure of the neighborhood: If nothing useful is found at the 1st level, we move to the 2nd level, and so on so forth; If nothing better is found, we stop the neighborhood search Global vs. local search 34 / 42 Elementary exploration strategies Random walks Neighborhood-dependent random walks Hill climbing Probabiliste hill climbing: probability to select a neighbor p(x) = f (x)/ f (x )... Case of backtracking and branch-and-bound algorithm: these are not metaheuristics (systematic search space exploration) but they share a number of characteristics with metaheuristics: (i) neighborhood selection (children), no specific property required for the fitness function 35 / 42 Iterated Local Search Let us consider a metaheuristic localSearch that gets stuck on local optima. One can construct a more powerful metaheuristic: y’ x’ y fitness search space Reduce S to S that only contains the local optima of S Induced topology on S : two points are neighbors if their basin of attraction remains as such for S In practice, we seek to find the neighbors of x ∈ S by nudging x in a random fashion and by applying again the local search strategy over S to find the neighboring local optimum in S 36 / 42 Energy/fitness Landscape Graphical representation of the fitness given the neighborhood-induced topology A NK fitness landscape 8 fitness 3 0 1023 x (NK-Landscape, N = 10, K = 3, fi contruit aléatoirement) 37 / 42 Fitness smoothing Average fitness 1 f¯ = f (x) (2) |S| x∈S A fitness fλ can be smoothed with a coefficient λ: fλ (x) = f¯ + λ(f (x) − f¯) (3) Smoothed NK Fitness Landscape 6 smoothing=0.4 smoothing=0.8 fitness 2 0 255 x 38 / 42 Fitness landscape and exploration Easy vs. challenging landscape: selection of coding strategy, selection of neighborhood Hard, yet easy case: maximize f (x) = x1 · x2... xn for xi ∈ {0, 1}. The landscape is flat such that nothing can really guide the search Example of MaxOne with two different coding strategies 39 / 42 Example: MaxOne MaxOne with two distinct coding strategies: we interpret (x1...xn ) as an integer. Neighborhood comprises x − 1 et x + 1. If U chooses a neighbor without ever decreasing the fitness, we end up being stuck. However with the hypercube, it works! 4 110 111 010 011 fitness 100 101 000 001 0 0 7 x 40 / 42 Population methods S S x1 x1 x0 y1 x0 z1 y0 x2 z0 y2 z2 x2 41 / 42 Population methods Many metaheuristics consider a population of tentative solutions. Then, they let this population evolve to explore the S: e.g., genetic/evolutionary algorithms The optimal solution is then the best solution found across all generations/evolutions We work in an extended search space: S N where N is the number of individuals within the population The neighborhood of a population is another population: elementary transforms are more complex since they can use the entire population, and not just one single solution It can be seen as some sort of co-evolution of many solutions 42 / 42