Smart Systems Lec 6 PDF
Document Details
Uploaded by PermissibleKeytar8288
Alexandria University
Tags
Summary
This document provides lecture notes on genetic programming, outlining its principles, algorithms, and applications. It covers concepts like generating populations of random programs, evaluating their quality, and breeding fitter programs within a given context. The material includes examples and diagrams to illustrate the concepts.
Full Transcript
# Smart Systems and Computational Intelligence ## Lecture 6: Smart Systems and Computational Intelligence Course ### What is Genetic Programming? - Genetic Programming (GP) is an Evolutionary Computation (EC) technique that automatically solves problems without requiring the user to know or specify...
# Smart Systems and Computational Intelligence ## Lecture 6: Smart Systems and Computational Intelligence Course ### What is Genetic Programming? - Genetic Programming (GP) is an Evolutionary Computation (EC) technique that automatically solves problems without requiring the user to know or specify the form or structure of the solution in advance. - At the most abstract level, GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. - It saves time by freeing the human from having to design complex algorithms. - Not only designing the algorithms but creating ones that give optimal solutions ### How is Genetic Programming? - The basic control flow for genetic programming, where survival of the fittest is used to find solutions, is shown below. | | | | |:----------------:|:----------------:|:----------------:| | Generate Population of Random Programs | Run Programs and Evaluate Their Quality | Solution | | | Breed Fitter Programs | (* (SIN (- y x)) (IF (> x 15.43) (+ 2.3787 x) (* (SQRT y) (/ x 7.54)))) | ### Genetic Programming - Since genetic programming manipulates programs by applying genetic operators, a programming language should permit a computer program to be manipulated as data and the newly created data to be executed as a program. For these reasons, LISP was chosen as the main language for genetic programming. - In programming languages such as LISP, the mathematical notation is not written in standard notation, but in prefix notation. Some examples of this: - `+21 : 2+1` - `*+212 : 2*(2+1)` - `*+-2149: 9*((2-1)+4)` - Notice the difference between the left-hand side to the right? Apart from the order being different, there is no parenthesis! The prefix method makes it much easier for programmers and compilers because order precedence is not an issue. - You can build expression trees out of these strings that can be easily evaluated, for example, here are the trees for the above three expressions. - Expression Tree 1: - `+` - `1` - `2` - Expression Tree 2: - `*` - `2` - `+` - `1` - `2` - Expression Tree 3: - `*` - `9` - `+` - `-` - `2` - `1` - `4` - You can see how expression evaluation is thus a lot easier. - What this have to do with GAs? If for example you have numerical data and 'answers', but no expression to conjoin the data with the answers. - A genetic algorithm can be used to 'evolve' an expression tree to create a very close fit to the data. - By 'splicing' and 'grafting' the trees and evaluating the resulting expression with the data and testing it to the answers, the fitness function can return how close the expression is. ### Algorithm - Randomly create an initial population of programs from the available primitives. - Repeat: - Execute each program and ascertain its fitness. - Select one or two program(s) from the population with a probability based on fitness to participate in genetic operations. - Create new individual program(s) by applying genetic operations with specified probabilities. - Until an acceptable solution is found or some other stopping condition is met. - Return the best-so-far individual. ### Tree Structure - Programs are represented as tree. - A tree with the following nodes and depth is shown. - `A` - Depth: 0 - `B` and `C` - Depth: 1 - `D`, `E`, `F` and `G` - Depth: 2 - A tree with the following levels is shown. - Level 0: `4` - Level 1: `2` and `6` - Level 2: `-1-`, `3`, `-5-`, `-7-` - A tree with the following nodes is shown. - `+` - `1` - `2` - `IF` - `>` - `TIME` - `10` - `3` - `4` - `TIME` - `10` - The prefix notation of the above tree is `(+ 1 2 (IF (> TIME 10) 3 4))`. ### Basics #### Terminal - The variables and constants in the program (x, y and 3) which forms leaves of the tree. #### Funtion - The arithmetic operations (+, * and max) are internal nodes called functions. #### Primitive set - Functions and terminals together from primitive set. | Function Set | Terminal Set | |:-------------:|:-------------:| | Kind of Primitive | Kind of Primitive | | Example(s) | Example(s) | | Arithmetic: +, *, / | Variables: x, y | | Mathematical: sin, cos, exp | Constant Values: 3, 0.45 | | Boolean: AND, OR, NOT | 0-arity functions: rand, go_left| | Conditional: IF-THEN-ELSE| | | Looping: FOR, REPEAT| | ### Initialization of population - **Full Method:** Selecting internal nodes from function set until maximum depth is reached. At the end, select node from terminal set. All leaves are at the same level. - **Grow Method:** Select nodes from the primitive set until maximum depth is reached. - **Ramped half-and-half:** Half the initial population is constructed using full and half is constructed using grow. Depth limits of trees not fixed. ### The Grow Method - Trees generated are of a variable size. - Labels for nodes at levels less than the maximum depth are chosen from both the function and terminal sets. - Labels for nodes at the maximum depth are chosen from the terminal set. - Three example trees are shown, with the following nodes in them: - Tree 1: - `1` - `*` - `x` - `y` - Tree 2: - `+` - `x` - `y` - Tree 3: - `*` - `+` - `x` - `y` - `1` ### The Full Method - Every tree has the same shape and size. - Labels for nodes at a depth less than the maximum depth are chosen from the function set only. - Labels for nodes at the maximum depth are chosen from the terminal set. - Two example trees are shown with the following nodes in them. - Tree 1: - `*` - `+` - `x` - `y` - `+` - `x` - `y` - Tree 2: - `+` - `*` - `x` - `y` - `*` - `x` - `y` ### Ramped Half-and-Half - Produces trees of various shapes and sizes. - Combines both the full and grow methods. - For each depth value half, the trees will be generated using the full method and the other half using the grow method. - Produces a diverse population. ### Initialization of population (Full method) <start_of_image> Diagrams are shown with a full tree of depth 2 created using the full method. (t=time) - t=1: `+` - t=2: `*` - `+` - t=3: `*` - `+` - `x` - t=4: - `+` - `x` - `y` - t=5: `*` - `+` - `x` - `y` - t=6: `*` - `+` - `x` - `y` - t=7: `*` - `+` - `x` - `y` - `0` - A series of snapshots of the construction of a full tree of depth 2 are shown. - The children of the * and / nodes must be leaves or otherwise the tree would be too deep. Thus, at both steps t = 3, t = 4, t = 6 and t = 7 a terminal must be chosen (x, y, 1 and 0, respectively). ### Initialization of population (Grow method) - Creation of a five-node tree using the grow initialization method with a maximum depth of 2 is shown. - (t = time). - t=1: `+` - t=2: `+` - `x` - t=3: `+` - `x` - t=4: `+` - `x` - `-` - `2` - t=5: `+` - `x` - `-` - `2` - `y` - A terminal is chosen at t = 2, causing the left branch of the root to be closed at that point even though the maximum depth had not been reached. ### Selection Strategies - **Tournament Selection:** Best individual from a set of randomly selected individuals is chosen to be parent. - **Advantage:** It rescales fitness among the population. A best program doesn't populate the next generation with its children reducing diversity. - **Disadvantage:** A small difference in fitness gets amplified in next generation. Even though it is just marginally better than its competitior. - **Fitness-Proportionate Selection** also known as roulette wheel selection: Select individuals based on their fitness values. ### Breeding Next Generation - Crossover, mutation and reproduction are main operators applied on parents to generate offsprings. - **Subtree Crossover:** Create the offspring by replacing the subtree rooted at the crossover point in a copy of the first parent with a copy of the subtree rooted at the crossover point in the second parent. - **Reproduction:** Simply copy an elite individual to next generation. Ensures passing of good genes. ### Subtree Crossover - Diagrams show two parent trees. - The crossover point and the subtree used for the crossover. - Diagrams show the created offspring trees. - The Parent trees are: - Tree 1: - `+` - `+` - `x` - `y` - `3` - `(x+y)+3` - Tree 2: - `+` - `y` - `*` - `1` - `2` - `(y+1) * (x/2)` - The Offspring trees are: - Tree 1: - `+` - `*` - `x` - `2` - `3` - `(x/2) + 3` - Tree 2: - `GARBAGE` ### Mutation - **Subtree Mutation:** Replace the mutation point by randomly generated tree. - **Point Mutation:** Randomly select a node and replace it with another primitive of the same arity. - Diagrams show a parent tree. - Diagrams show the mutation point, and the randomly generated subtree, used for the mutation. - Diagrams show the created offspring tree. - The Parent tree is: - `+` - `x` - `y` - `3` - The Offspring tree is: - `+` - `x` - `y` - `*` - `y` - `x` - `2` ### Fitness Function - Gives a measure of how good a computer program is at solving the given problem. - Proper encoding of problem through fitness functions is important for GP's success. - Measuring fitness of a given program means executing the program. - Evaluation of program tree is done in depth first search order. ### Fitness Function - The first two preparatory steps define the primitive set for GP, and therefore indirectly define the search space GP will explore, - This includes all the programs that can be constructed by composing the primitives in all possible ways, - However, at this stage, we still do not know which elements or regions of this search space are good i.e., which regions of the search space include programs that solve, or approximately solve, the problem - The most difficult and most important concept of genetic programming is the fitness function. - The fitness function determines how well a program is able to solve the problem and varies greatly from one type of program to the next. - For example, if one were to create a genetic program to set the time of a clock, the fitness function would simply be the amount of time that the clock is wrong. ### Fitness Function - Fitness can be measured in many ways, for example, in terms of: - the amount of error between its output and the desired output, - the amount of time (fuel, money, etc.) required to bring a system to a desired target state, - the accuracy of the program in recognizing patterns or classifying objects, - the payoff that a game-playing program produces, - the compliance of a structure with user-specified design criteria ### Five Preparatory steps to set up GP | | | | | | |:-------------:|:-------------:|:-------------:|:-------------:|:-------------:| | Terminal Set | Function Set | Fitness Measure | Parameters | Termination Criterion and Result Designation | | | | | | | - **GP: A Computer Program** - Determining the set T of terminals. - Determining the set F of functions. - Determining the fitness measures. - Determining the GP parameters. - Determining the Termination criteria and result designation ### Areas where GP will do well - The interrelationships among the relevant variables are unknown or poorly understood. - Finding the size and shape of the ultimate solution is a major part of the problem. - Search domain is huge and solutions are sparsely distributed. - There are good simulators to test the performance of tentative solutions to a problem, but poor methods to directly obtain good solutions. - Genetic programming can be used in stock market prediction, advanced mathematics, and military applications. ### Genetic Programming - The limitations of genetic programming lie in the huge search space the GAs have to search for - an infinite number of equations. Therefore, before running a GA to search for an equation, the user tells the program which operators and numerical ranges to search under. - A disadvantage exists for GP trees which are unable to grow effectively. GP trees that grow sparse and unbalanced will cause more code growth, less genotypic diversity, and search a smaller space of possible programs. These populations will be less effective in the evolutionary process. ### Any Questions??? ### Thanks