GA Slides v1 PDF
Document Details
Uploaded by WinningZeugma5722
Feras Al-Obeidat
Tags
Summary
This document is a presentation on genetic algorithms. It covers the history of evolutionary computation, its biological analogies, advantages, and basic terminology. The presentation explores how genetic algorithms use principles of natural selection to find solutions to complex problems.
Full Transcript
Dr. Feras Al-Obeidat Feras Al-Obeidat. INS646 HISTORY OF EVOLUTIONARY COMPUTATION § Evolutionary computation was first explored as an optimization tool in the 1950s when computer scientists were playing with the idea of applying Darwinian ideas of biological evolution t...
Dr. Feras Al-Obeidat Feras Al-Obeidat. INS646 HISTORY OF EVOLUTIONARY COMPUTATION § Evolutionary computation was first explored as an optimization tool in the 1950s when computer scientists were playing with the idea of applying Darwinian ideas of biological evolution to a population of candidate solutions. § They theorized that it may be possible to § apply evolutionary operators such as crossover – which is an analog to biological reproduction § mutation – which is the process in which new genetic information is added to the genome. § These operators when coupled with selection operation that provide genetic algorithms the ability to “evolve” new solutions when left over a period of time. § In the 1960s “evolution strategies” – an optimization technique applying the ideas of natural selection and evolution - was first proposed by Rechenberg Feras Al-Obeidat. INS646 BIOLOGICALLY ANALOGIES § Holland, J.H. (1975) who first invented and developed the concept of genetic algorithms during the 1960s and 1970s. § He presented his ideas in 1975 in his groundbreaking book, “Adaption in Natural and Artificial Systems”. § Holland’s book demonstrated how Darwinian evolution could be modeled using computers for use in optimization strategies. § His book explained how biological chromosomes can be modeled as strings of 1s and 0s, and how populations of these chromosomes can be “evolved” by implementing techniques that are found in natural selection such as mutation, selection and crossover. Feras Al-Obeidat. INS646 THE ADVANTAGE OF EVOLUTIONARY COMPUTATION § Require an algorithm to search through a huge number of possible solutions in an attempt to find a feasible solution § Depending on the amount of solutions that need to be searched through, classical computational methods may not be able find a feasible solution in the timeframe available – even using a super computer. It’s in these circumstances where evolutionary computation can offer a helping hand. § The complexity of the problem makes it unpractical for a human programmer to solve with code. § optimization and learning algorithms can provide us with a method to use the computer’s processing power to find a solution to the problem itself. § An example of this might be when building a fraud detection system that can recognize fraudulent transactions based on transaction information. Feras Al-Obeidat. INS646 § Evolutionary algorithms are also useful when humans don’t know how to solve a problem. § A classic example of this was when NASA was looking for an antenna design that met all their requirements for a 2006 space mission. NASA wrote a genetic algorithm which evolved an antenna design to meet all of their specific design constraints such as, signal quality, size, weight and cost. § Building an algorithm to make predictions on the stock market. An algorithm that makes accurate predictions about the stock market one week might not make accurate predictions the following week. Evolutionary computation can help accommodate for these changes by providing a method in which adaptations can be made to the prediction algorithm as necessary. Feras Al-Obeidat. INS646 GENETIC ALGORITHMS § If the problem is sufficiently hard to write code to solve § When a human isn’t sure how to solve the problem § When it’s not feasible to search through each possible solution § When a “good-enough” solution is acceptable Feras Al-Obeidat. INS646 § Biological evolution, through the process of natural selection, was first proposed by Charles Darwin (1859) in his book, “The Origin of Species”. § It was his concept of biological evolution which inspired early computer scientists to adapt and use biological evolution as a model for their optimization techniques, found in evolutionary computation algorithms. § All organisms contain DNA which encodes all of the different traits/characteristics that make up that organism. Feras Al-Obeidat. INS646 § Changing the DNA of an organism will change its traits such as eye and hair color § DNA is made up of individual genes, and it is these genes that are responsible for encoding the specific traits of an organism. § An organism’s genes are grouped together in chromosomes and a complete set of chromosomes make up an organism’s genome § for example humans have 46 chromosomes § In genetic algorithms we refer to the chromosome as the candidate solution. This is because genetic algorithms typically use a single chromosome to encode the candidate solution. Feras Al-Obeidat. INS646 § When two organisms mate, DNA from both organisms are brought together and combined in such a way that the resulting organism – usually referred to as the offspring – acquires 50% of its DNA from its first parent, and the other 50% from the second. § a gene from the organisms DNA will mutate providing it with DNA found in neither of its parents. § Mutations provide the population with genetic diversity by adding genes to the population that weren’t available beforehand. § All possible genetic information in the population is referred as the population’s “gene pool”. Feras Al-Obeidat. INS646 § If the resulting organism is fit enough to survive in its environment it’s likely to mate itself, allowing its DNA to continue on into future populations. § If, the resulting organism isn’t fit enough to survive and eventually mate its genetic material won’t propagate into future populations. § Evolution is referred to as survival of the fittest – only the fittest individuals survive and pass on their DNA. It’s this selective pressure that slowly guides evolution to find increasingly fitter and better adapted individuals. Feras Al-Obeidat. INS646 BASIC TERMINOLOGY § Population - a collection of candidate solutions which can have genetic operators such as mutation and crossover applied to them. § Candidate Solution – A possible solution to a given problem. § Gene – The building blocks making up the chromosome. Classically a gene consists of 0 or a 1. § Chromosome – A chromosome is a string of genes. A chromosome defines a specific candidate solution. A typical chromosome with a binary encoding might contain something like, “01101011”. § Mutation – The process in which genes in a candidate solution are randomly altered to create new traits. § Crossover – The process in which chromosomes are combined/mate to create a new candidate solution. This is sometimes referred to as recombination. § Selection – This is the technique of picking candidate solutions to breed the next generation of solutions. § Fitness – A score which measures the extent to which a candidate solution is adapted to suit a given problem. Feras Al-Obeidat. INS646 Parent and Offspring in the Fitness plot Genetic algorithms use a population approach when searching the search space. GA will assume two well ranking solutions can be combined to form fitter offspring. Feras Al-Obeidat. INS646 A Fitness plot showing the Mutation The mutation operator found in genetic algorithms allows us to search the close neighbors of the specific candidate solution. When mutation is applied to a gene its value is randomly changed. Feras Al-Obeidat. INS646 § In the example of both crossover and mutation it is possible to end up with a solution less fit than what we originally set § In such case to be removed from the gene pool during the selection process. Feras Al-Obeidat. INS646 LOCAL OPTIMUMS An obstacle that should be considered when implementing an optimization algorithm is how well the algorithm can escape from locally optimal positions in the search space. A local optimum can be deceiving This problem becomes quickly noticeable when implementing a simple hill climbing algorithm to solve problems of any sufficient complexity. Feras Al-Obeidat. INS646 HILL CLIMBER § Often terminate its search in locally optimal regions of the search space. § Starts at a random point in the search space, then attempts to find a better solution by evaluating its neighbor solutions § When the hill climber can no longer find a better solution it will assume it is at the top of the hill and stop the search. Feras Al-Obeidat. INS646 Sample areas at initialization in GA Genetic algorithms are surprisingly effective at avoiding local optimums and retrieving solutions that are close to optimal. One of the ways it achieves this is by having a population that allows it to sample a large area of the search space locating the best areas to continue the search Feras Al-Obeidat. INS646 Fitness diagram after some Generations have Mutated After a few generations have past, the population will begin to conform towards where the best solutions could be found in the previous generations. This is because less fit solutions will be removed during the selection process making way for new, fitter, solutions to be made during crossover and mutation. The mutation operator also plays a role in evading local optimums. This process will often lead to the discovery of fitter solutions in more optimal areas in the search space. Feras Al-Obeidat. INS646 § Mutation allows a solution to jump from its current position to another position on the search space. This process will often lead to the discovery of fitter solutions in more optimal areas in the search space. § A basic genetic algorithm have a few parameters that need to be considered during the implementation. § The main three are the: § rate of mutation, § the population size and § the third is the crossover rate. Feras Al-Obeidat. INS646 PARAMETERS Although all genetic algorithms are based on the same concepts, their specific implementations can vary quite a 0 0 0 0 0 0 0 bit. 0 1 0 1 0 1 0 One of the ways specific implementations can vary is by 0 0 0 1 1 0 1 their parameters. 0 1 1 0 0 1 0 A basic genetic algorithm will have at least a few parameters that need to be considered during the 1 1 0 1 0 1 1 implementation. 1 1 1 1 1 1 1 Feras Al-Obeidat. INS646 § The population size is simply the number of individuals in the genetic algorithm’s population in any one generation. § The larger the population’s size, the more of the search space the algorithm can sample. This will help lead it in the direction of more accurate, and globally optimal, solutions. § A small population size will often result in the algorithm finding less desirable solutions in locally optimal areas of the search space, however they require less computational resources per generation. § A balance needs to be found for optimum performance of the genetic algorithm. the population size required will change depending on the nature of the problem being solved Feras Al-Obeidat. INS646 GENETIC REPRESENTATIONS Gene 0 0 0 0 0 0 0 Chromosome 0 1 0 1 0 1 0 Population Holland’s (1975) genetic algorithm 0 0 0 1 1 0 1 was based on a binary genetic 0 1 1 0 0 1 0 representation. He proposed using chromosomes 1 1 0 1 0 1 1 that were comprised of strings containing 0s and 1 1 1 1 1 1 1 1s. Feras Al-Obeidat. INS646 § Crossover has an effect on the overall performance of the genetic algorithm. § A high rate allows for many new, potentially superior, solutions to be found during the crossover phase § A lower rate will help keep the genetic information from fitter individuals intact for the next generation. § The crossover rate should usually be set to a reasonably high rate promoting the search for new solutions while allowing a small percentage of the population to be kept unaffected for the next generation. Feras Al-Obeidat. INS646 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 New offspring Feras Al-Obeidat. INS646 § The mutation rate is the probability in which a specific gene in a solution’s chromosome will be mutated § There is technically no correct value for the mutation rate of a genetic algorithm, but some mutation rates will provide vastly better results than others. § A higher mutation rate allows for more genetic diversity in the population and can also help the algorithm avoid local optimums. Feras Al-Obeidat. INS646 § a mutation rate that’s too high can cause too much genetic variation between each generation causing it to lose good solutions it found in its previous population. § If the mutation rate is too low, the algorithm can take long time to move along the search space to find a satisfactory solution. § The mutation rate should be set to a value that allows for enough diversity to prevent the algorithm plateauing, but not so much that it causes the algorithm to lose valuable genetic information from the previous population. § This balance will depend on the nature of the problem being solved. Feras Al-Obeidat. INS646 Before mutation 1 1 1 0 0 0 0 After mutation 1 0 1 0 1 0 0 Mutation occurs to maintain diversity within the population and prevent premature convergence. Feras Al-Obeidat. INS646 Five phases are considered in a genetic algorithm: 1.Initial population 2.Fitness function 3.Selection 4.Crossover 5.Mutation Feras Al-Obeidat. INS646 § The process begins with a set of individuals which is called a Population. § Each individual is a solution to the problem you want to solve. § An individual is characterized by a set of parameters (variables) known as Genes. § Genes are joined into a string to form a Chromosome (solution). § binary values are used (string of 1s and 0s). We say that we encode the genes in a chromosome. Feras Al-Obeidat. INS646 § The fitness function determines how fit an individual is (the ability of an individual to compete with other individuals). § It gives a fitness score to each individual. § The probability that an individual will be selected for reproduction is based on its fitness score. Feras Al-Obeidat. INS646 § The idea of selection phase is to select the fittest individuals and let them pass their genes to the next generation. § Two pairs of individuals (parents) are selected based on their fitness scores. § Individuals with high fitness have more chance to be selected for reproduction. Feras Al-Obeidat. INS646 § Crossover is the most significant phase in a genetic algorithm. § For each pair of parents to be mated, a crossover point is chosen at random from within the genes. § Offspring are created by exchanging the genes of parents among themselves until the crossover point is reached. § The new offspring are added to the population. Feras Al-Obeidat. INS646 § In certain new offspring formed, some of their genes can be subjected to a mutation with a low random probability. This implies that some of the bits in the bit string can be flipped. § Mutation occurs to maintain diversity within the population and prevent premature convergence Feras Al-Obeidat. INS646 § The algorithm terminates if the population has converged (does not produce offspring which are significantly different from the previous generation). § Some typical termination conditions would be: A maximum number of generations is reached Its allocated time limit has been exceeded A solution has been found that meets the required criteria The algorithm has reached a plateau Feras Al-Obeidat. INS646 General Genetic Algorithm Process Feras Al-Obeidat. INS646 A GENERAL GENETIC ALGORITHM PROCESS 1. Genetic algorithms begin by initializing a population of candidate solutions. This is typically done randomly to provide an even coverage of the entire search space. 2. Next, the population is evaluated by assigning a fitness value to each individual in the population. In this stage we would often want to take note of the current fittest solution, and the average fitness of the population. 3. After evaluation, the algorithm decides whether it should terminate the search depending on the termination conditions set. Usually this will be because the algorithm has reached a fixed number of generations or an adequate solution has been found. 4. If the termination condition is not met, the population goes through a selection stage in which individuals from the population are selected based on their fitness score – the higher the fitness, the better chance an individual has of being selected. Feras Al-Obeidat. INS646 5. The next stage is to apply crossover and mutation to the selected n individuals. This stage is where new individuals are created for the next generation. 6. At this point the new population goes back to the evaluation step and the process starts again. We call each cycle of this loop a generation. 7. When the termination condition is finally met, the algorithm will break out of the loop and typically return its finial search results back to the user. Feras Al-Obeidat. INS646 General Genetic Algorithm Process 1. Initializing a random population of candidate solutions. provide an even coverage of the entire search space. 2. Assign a fitness value to each individual in the population. 3. After evaluation, the algorithm decides whether it should terminate the search depending on the termination conditions 4. If the termination condition is not met, the population goes through a selection stage in which individuals from the population are selected based on their fitness score – the higher the fitness, the better chance an individual has of being selected. Feras Al-Obeidat. INS646 General Genetic Algorithm Process 5. Apply crossover and mutation to the selected n individuals. This stage is where new individuals are created for the next generation 6. At this point the new population goes back to the evaluation step and the process starts again. We call each cycle of this loop a generation. 7. When the termination condition is finally met, the algorithm will break out of the loop and typically return its finial search results back to the user. Feras Al-Obeidat. INS646 START Generate the initial population Compute fitness REPEAT Selection Crossover Mutation Compute fitness UNTIL population has converged STOP Feras Al-Obeidat. INS646 INTRODUCTION TO THE WEKA FRAMEWORK One of the handy tools in evaluating various data science algorithms is Weka (Waikato Environment for Knowledge Analysis). This is a suite of machine learning software written in the Java programming language. Weka is very popular since it can be extended to leverage additional algorithms and data mining techniques. In this section, we will be introduced to the generic concepts of Weka and specifically look at using it for the implementation of genetic algorithms. Weka provides a great and intuitive visual user interface for data mining, analysis, and predictive modeling. Some of the features that make Weka a popular choice for the community are the following: Feras Al-Obeidat. INS646 WEKA FRAMEWORK Weka is available as a free tool to use under the GNU General Public license Weka is written in the Java programming language and compiles to byte code, which is easily portable across platforms Weka contains a rich library of machine learning algorithms and it can further be extended within the framework by creating hooks using the simple-to-use APIs The simple-to-use GUI makes it easy to train and compare various classifiers, clusters, and regression outputs Weka supports ARFF (Attribute-Relation File Format), CSV (Comma Separated Values), and data formats for the datasets. Feras Al-Obeidat. INS646 CONCEPTUAL VIEW OF THE WEKA FRAMEWORK: Feras Al-Obeidat. INS646 ARFF FILES % 1. Title: Iris Plants Database % % 2. Sources: % (a) Creator: R.A. Fisher % (b) Donor: Michael Marshall (MARSHALL%[email protected]) % (c) Date: July, 1988 % @RELATION iris @ATTRIBUTE sepallength NUMERIC @ATTRIBUTE sepalwidth NUMERIC @ATTRIBUTE petallength NUMERIC @ATTRIBUTE petalwidth NUMERIC @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica} Feras Al-Obeidat. INS646 @DATA 5.1,3.5,1.4,0.2,Iris-setosa 4.9,3.0,1.4,0.2,Iris-setosa 4.7,3.2,1.3,0.2,Iris-setosa 4.6,3.1,1.5,0.2,Iris-setosa 5.0,3.6,1.4,0.2,Iris-setosa 5.4,3.9,1.7,0.4,Iris-setosa 4.6,3.4,1.4,0.3,Iris-setosa Feras Al-Obeidat. INS646 WEKA, THERE ARE FIVE POSSIBLE APPLICATIONS TO CHOOSE FROM: Explorer: This application provides an environment for exploring datasets with Weka. Experimenter: An environment for performing experiments and conducting statistical tests between learning schemas. Knowledge Flow: This environment supports the same features as explorer, but with a drag- and-drop interface. It supports incremental learning. Workbench: This is an all-in-one application that combines all the others within the perspectives that the user can select. Simple CLI: Provides a simple command-line interface that allows direct execution of Weka commands for operating systems that do not provide their own command line interface. Feras Al-Obeidat. INS646 Feras Al-Obeidat. INS646 Feras Al-Obeidat. INS646 DIBETES.ARFF Feras Al-Obeidat. INS646 Feras Al-Obeidat. INS646 VISUALIZATION Feras Al-Obeidat. INS646 Feras Al-Obeidat. INS646 Feras Al-Obeidat. INS646 Dr.Feras Al-Obeidat Feras Al-Obeidat. INS646 § Genetic Algorithms in Java Basics by Lee Jacobson and Burak Kanber § Artificial Intelligence for Big Data - Anand Deshpande and Manish Kumar § https://www.oreilly.com/library/view/artificial-intelligence- for/9781788472173/1529077a-062d-4b01-a788-2e4aea3eecc0.xhtml Feras Al-Obeidat. INS646