Introduction to Genetic Algorithms

Lecture 10  Design and Analysis of Algorithms 2002
Prabhas Chongstitvatana  12 September 2002
Motivation
What is a Genetic Algorithm?
Representation
Evaluation
Selection
Recombination
Mutation
Schema theory
Summary
References

Motivation

How do we write a program to solve a hard problem?  For a large class of problems, there are known algorithms to solve them.  Many of these algorithms have complexity analyses, so we can choose an appropriate algorithm for the problem.  However, many interesting problems have no known algorithm, and sometimes an obvious algorithm is not effective (it takes too long for the problem sizes we are interested in).  In this case, if the problem is well studied, knowledge about its structure can be exploited to solve it.  This class of problems is suitable for heuristics.  Problems in artificial intelligence (the science of how to make a machine think) belong to this class.  Designing heuristics is not easy (as anybody who works in AI will tell you), and a good heuristic is often not obvious; frequently only experts can come up with good heuristics for a problem.  If you are not an expert in a particular problem that has no known algorithm to solve it, what will you do?

A class of algorithms named Evolutionary Algorithms (EA) is suitable for such a situation.  EAs have been used to solve many hard real-world problems since the 1960s, and a large body of work has accumulated over the past decades.  This class of algorithms employs randomization.  The best-known method in this class is the Genetic Algorithm (GA), invented by John Holland of the University of Michigan in the early 1970s.  The idea behind GA is very elegant, and the whole process can be explained quite easily to any non-expert.  However, it is not easy to explain why it works.

EA belongs to the weak search methods.  The algorithm (in its general form) does not depend on the problem, which means it will work for any problem.  (You should be skeptical: it cannot work well for all of them!  That is why it is called a weak search method.)  This is in contrast to strong search methods, which are problem dependent.  In general, when a strong search method exists for a problem, a weak search method will not be as efficient (but the weak search method should be easier to set up).  A result called the No Free Lunch (NFL) theorem states that no algorithm can work best for all problems (it seems obvious, but it is worth stating).

What is a Genetic Algorithm?

GA is a search method inspired by natural evolution.  Charles Darwin argued in "On the Origin of Species" (1859) that individuals better adapted to their environment are more likely to survive and leave offspring, while the less fit tend to die out.  GA is based on a population of solutions.  Initially these solutions are random.  The solutions are evaluated for their "fitness" (their merit, how close they are to the answer).  The "good" solutions are selected to be the parents of the next generation.  These selected parents are improved using so-called genetic operators, and the results become the next generation's population.  This cycle is repeated until the answer is found or some stopping criterion is met (the best solution is good enough or the time limit is exceeded).
solutions <--
  |          |
  v          |
evaluation   |
  |          |
  v          |
improvement -
The genetic operators are: selection, recombination and mutation.  There are many variations of these operators.  The evaluation depends on the problem.  Sometimes the evaluation is accurate; sometimes it can only give an approximation, or even just guidance on how one solution compares to others.  This is the only part of the algorithm that depends on the problem.
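The cycle above can be sketched end to end.  The following is a minimal, self-contained Python sketch (not from the lecture; all names and parameter values are mine) that runs the GA loop on the toy problem "OneMax" (maximise the number of 1 bits); the operators anticipate the sections below, with a per-bit variant of mutation.

```python
import random

random.seed(0)                     # fixed seed so the run is reproducible
L, POP, GENS = 20, 30, 60          # string length, population size, generations
PC, PM = 0.9, 1.0 / L              # crossover and per-bit mutation rates

def evaluate(s):                   # toy fitness: count of 1 bits (OneMax)
    return s.count('1')

def select(pop, fit):              # fitness-proportionate selection
    return random.choices(pop, weights=fit, k=1)[0]

def crossover(a, b):               # one-point crossover
    p = random.randrange(1, L)
    return a[:p] + b[p:], b[:p] + a[p:]

def mutate(s):                     # flip each bit with probability PM
    return ''.join('10'[int(c)] if random.random() < PM else c for c in s)

population = [''.join(random.choice('01') for _ in range(L)) for _ in range(POP)]
best = max(population, key=evaluate)
for gen in range(GENS):
    fit = [evaluate(s) for s in population]
    nxt = []
    while len(nxt) < POP:
        a, b = select(population, fit), select(population, fit)
        if random.random() < PC:
            a, b = crossover(a, b)
        nxt += [mutate(a), mutate(b)]
    population = nxt[:POP]
    best = max(population + [best], key=evaluate)

print(evaluate(best), '/', L)      # best fitness found over the run
```

The stopping criterion here is simply a generation limit; in practice one would also stop as soon as a good-enough solution appears.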

To see how GA solves a problem, the starting point is how to encode solutions in the population.  I will explain a particular method called the Simple Genetic Algorithm (SGA) (as described in Goldberg 1989), which is a reference method in this field.

Representation

A solution is represented by a binary string {0,1}*.  The length of the string depends on the problem.  For example, for a sorting problem, let each element be in the range 0..255 and the number of elements be 100; then a solution can be represented by a binary string of 8 bits x 100 = 800 bits.  For the 8-queens problem, which encodes a solution as a vector of column numbers (such as 1,3,7,6,2,5,4,8), a solution can be encoded in 3 bits x 8 = 24 bits.  For a subset selection problem, a solution is already a binary string (1 = selected, 0 = not selected).  The length of a solution affects the time to find the answer.  It is conjectured that for SGA the running time grows exponentially in the length of the solution.  < *** must check this fact and give reference>
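As a sketch (the helper names are mine, not part of the lecture), the 8-queens encoding above can be written by packing each column number into 3 bits:

```python
def encode(columns, bits=3):
    # Pack each column number 1..8 into `bits` bits (stored as 0..7).
    return ''.join(format(c - 1, '0{}b'.format(bits)) for c in columns)

def decode(bitstring, bits=3):
    # Inverse of encode: read `bits` bits at a time and shift back to 1..8.
    return [int(bitstring[i:i + bits], 2) + 1
            for i in range(0, len(bitstring), bits)]

s = encode([1, 3, 7, 6, 2, 5, 4, 8])
print(len(s))            # 24 bits, as in the text
```

The same pattern (fixed-width binary fields concatenated into one string) gives the 800-bit encoding for the sorting example.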

Evaluation

The evaluation of the goodness of a solution depends on the problem.  For sorting in ascending order, the number of adjacent pairs that are in ascending order can be used.  The higher the number the better; for n elements the highest score is n-1.  The number of attacking pairs of queens can be used for the 8-queens problem.  In this case the lower the number the better; the best is 0.
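Both evaluation functions above are a few lines each.  A minimal sketch (function names are mine), assuming the encodings from the Representation section:

```python
def sort_fitness(xs):
    # Count adjacent pairs already in ascending order; n-1 means sorted.
    return sum(1 for a, b in zip(xs, xs[1:]) if a <= b)

def queens_attacks(cols):
    # Count attacking pairs of queens: same column or same diagonal.
    # cols[i] is the column of the queen in row i; 0 attacks = a solution.
    n = len(cols)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if cols[i] == cols[j] or abs(cols[i] - cols[j]) == j - i)

print(sort_fitness([1, 2, 3, 4]))              # 3: fully sorted, n-1 pairs
print(queens_attacks([1, 5, 8, 6, 3, 7, 2, 4]))  # 0: a known 8-queens solution
```

Note the two functions point in opposite directions (higher is better versus lower is better); a GA implementation must pick one convention, e.g. by negating the attack count.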

Selection

The selection in SGA is called fitness-proportionate selection.  Each solution is selected with a probability that depends on its goodness.  There are many ways to scale this probability distribution.
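Fitness-proportionate selection is often pictured as a roulette wheel whose slice sizes are proportional to fitness.  A minimal sketch (not from the lecture; names and figures are mine):

```python
import random

def roulette_select(population, fitness):
    # Pick an individual with probability fitness[i] / sum(fitness).
    total = sum(fitness)
    pick = random.uniform(0, total)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if running >= pick:
            return individual
    return population[-1]       # guard against floating-point round-off

random.seed(2)
pop, fit = ['0001', '0111', '1111'], [1, 3, 4]
draws = [roulette_select(pop, fit) for _ in range(8000)]
# '1111' occupies half the wheel (4/8), so it is drawn about half the time.
print(draws.count('1111') / len(draws))
```

Scaling the fitness values before building the wheel (e.g. subtracting a baseline) is one way to tune the selection pressure mentioned above.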

Recombination

The recombination operator combines two solutions to form two new solutions (mixing the genes).  In SGA, the recombination is one-point crossover.  It works by selecting a random point in the string, cutting both parents at that point and exchanging the substrings.  For example:
123|4567   ==>     123|defg
abc|defg           abc|4567
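As a sketch (the variable names are mine), the cut-and-swap above is a few lines of Python:

```python
import random

def one_point_crossover(a, b):
    # Cut both parents at the same random point and exchange the tails.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

# The example above corresponds to a cut after position 3:
a, b = '1234567', 'abcdefg'
c, d = a[:3] + b[3:], b[:3] + a[3:]
print(c, d)   # 123defg abc4567
```

Note that the children together contain exactly the same genes as the parents; crossover only rearranges them between the two strings.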
It is conjectured that there are substrings of solutions that are good (they occur in the answer); they are called building blocks.  The recombination operator "assembles" them together.  This conjecture is called the Building Block Hypothesis (BBH).  <give reference>

Mutation

The mutation operator in SGA is called single-point mutation.  It works by flipping one random bit in a solution.  Mutation introduces new genes into the gene pool.  This helps to avoid getting stuck at a "local optimum", that is, a point where the search cannot proceed any further because no local improvement is possible.  Mutation effects a kind of "jump" out of it.  In nature, mutation is almost always fatal.  However, nature has protection against mutation in the form of redundancy: not all genes are expressed (take effect).  When a mutation occurs in a non-expressed gene it is not fatal.
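A minimal sketch of single-point mutation on a binary string (the function name is mine):

```python
import random

def single_point_mutate(s):
    # Flip one randomly chosen bit of the binary solution string.
    i = random.randrange(len(s))
    flipped = '1' if s[i] == '0' else '0'
    return s[:i] + flipped + s[i + 1:]

random.seed(0)
print(single_point_mutate('00000000'))   # exactly one bit becomes 1
```

In a GA run this operator is applied with some small probability Pm, so most offspring pass through unchanged.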

A model of behavior of SGA: Schema theory

To understand GA as a dynamic process, it can be modelled by considering the change in the number of genes through time.  Schema theory offers such an explanation.  A schema is a pattern over {0,1,*}* where '*' means either 0 or 1.  A schema captures the concept of a set of strings; for example, the schema '1*****' is the set of all strings of length 6 that begin with 1, and the schema '**100' is the set of all strings of length 5 that end with 100.  A bit that is 0 or 1 is called "fixed".  The number of fixed bits is the order of the schema, oh(.); for example oh('1**0') = 2.  The defining length of a schema is the distance between the first fixed bit and the last fixed bit, del(.); for example del('*1001') = 3 and del('0***') = 0.  The number of bit strings in the population matching a schema H is denoted m(H).  Schema theory captures the change of m(H) under the genetic operators.  First, consider only the selection:

m(H,t+1) >= m(H,t) f(H)/f*    (1)

where f(H) is the fitness of H and f* is the average fitness of the whole population.  If the fitness of a schema is higher than the average, its count will increase in the next generation, and vice versa.  To incorporate the recombination operator into the equation, consider the destructive effect of one-point crossover.  If the crossover point falls "inside" the schema, that schema will be destroyed.  The probability of this happening is:

del(H)/(l-1)

where l is the length of the string (so there are l-1 possible crossover points)

For example, if all bits are fixed (del(H) = l-1), the probability of destruction is 1 (always); if only one bit is fixed (del(H) = 0), the probability is 0 (never).  A longer (higher del) schema has more chance of being destroyed.  Of course, it is possible that the recombination operator creates a new string matching the schema, but that is a second-order effect and we ignore it here.  A more exact analysis is possible, but outside the scope of this course.  Incorporating this into eq. 1, with Pc the probability of applying the recombination operator:

m(H,t+1) >= m(H,t) f(H)/f* [1 - Pc del(H)/(l-1)]   (2)

Now consider the effect of mutation.  For single-bit mutation, the probability of a schema being destroyed depends on its order oh(H): the more fixed bits, the higher the chance that the schema is destroyed by mutation.  Pm is the probability of mutating a bit.

m(H,t+1) >= m(H,t) f(H)/f* [1 - Pc del(H)/(l-1) - Pm oh(H)]   (3)

This final eq. 3 shows that a short (small del), fitter-than-average schema will survive and increase its count exponentially.  (It is exponential because the increase is like compound interest: the count grows as (1 + p)^t, where p is the proportion of increase per generation, 0..1, and t is the number of generations.)  Such a schema is a building block.  There is another analysis, from decision theory (the K-arm bandit argument), showing that exponentially increasing trials is an optimal policy for the trade-off between exploration and exploitation.  In the context of GA, the effort is initially spent on exploration, to find out which schemata are good; then exploitation follows, using those schemata over and over to gain fitness.
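To make the growth condition concrete, here is a small numeric check of the bound (with the crossover and mutation disruption terms weighted by Pc and Pm) for a short, fit schema.  All the figures are hypothetical, chosen only for illustration:

```python
# Hypothetical schema H = '11****' in strings of length l = 6:
# order oh(H) = 2, defining length del(H) = 1.
l, oh_H, del_H = 6, 2, 1
m_t = 50            # copies of H in the current generation
f_ratio = 1.5       # f(H)/f*, i.e. H is 50% fitter than average
Pc, Pm = 0.9, 0.01  # crossover and mutation probabilities

# Survival factor: 1 minus the expected disruption by crossover and mutation.
survive = 1 - Pc * del_H / (l - 1) - Pm * oh_H
bound = m_t * f_ratio * survive
print(bound)        # > m_t, so the schema is expected to grow
```

Repeating the calculation with a long schema (say del(H) = 5) drives the survival factor down and the bound below m_t, which is exactly the short-versus-long distinction the building block argument rests on.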

Please note that eq. 3 is an inequality.  It is not exact because there are other second-order effects that are not considered.  An exact schema theory for SGA now exists.  <ref. Vose text book>

Summary

In summary, GA is one answer for solving a hard problem for which no algorithm is known.  It is also appropriate when no expert is available to tune the complex parameters required by heuristic algorithms for the problem.  GA is also easy to implement; its general form is so simple that it can be understood by people outside computer science.  However, GA usually consumes large computational resources and is not guaranteed to give an answer.  GA is suitable for two kinds of task:
  1. innovation  -- for design problems, to find the best combination of components given constraints.
  2. optimization -- many optimization techniques already exist, but there are always new problems for which no other technique is available.
The innovation aspect is quite intriguing and will be the main driving force for using GA, and EA in general, in the future.

Future revision:  To be added
hyperplane explanation of search behaviour of GA, some graphical interpretation, implicit parallelism, K-arm bandit theory.

References

(to be added)

14 September 2002
P. Chongstitvatana