A Genetic Algorithm (GA) is one of a class of methods that search for
solutions by iteratively improving the set of solutions at hand. The
initial set of solutions can be randomly generated. GA can be applied to
a large class of problems.
GA is motivated by natural genetics. It "evolves" a set of solutions by
"selecting" good solutions and "recombining" them to create better
solutions. Selection is guided by a "fitness" measure. Mutation can be
introduced after the recombination.
To illustrate a concrete algorithm, here is a version of GA called Simple
Genetic Algorithm:
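A minimal sketch of one common textbook form of a simple GA is given
here in Python, and it should be read as an illustration rather than a
definitive listing. It assumes bit-string solutions, fitness-proportionate
selection, single-point crossover and bit-flip mutation, with a
non-negative fitness that is maximized. The function name simple_ga and
all of its parameters and default values are assumptions made for this
sketch.

```python
import random

def simple_ga(fitness, n_bits, pop_size=20, p_cross=0.8, p_mut=0.01,
              generations=50, rng=None):
    """Maximize `fitness` over bit lists; fitness is assumed to be >= 0."""
    rng = rng or random.Random()
    # Random initial population of bit lists.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        # Fitness-proportionate selection; uniform if every score is 0.
        weights = scores if sum(scores) > 0 else None
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            a, b = a[:], b[:]  # copy so parents are not modified
            if rng.random() < p_cross:
                # Single-point crossover: swap the tails after the cut.
                cut = rng.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                # Bit-flip mutation, applied independently to each bit.
                for i in range(n_bits):
                    if rng.random() < p_mut:
                        child[i] ^= 1
                new_pop.append(child)
        pop = new_pop[:pop_size]
        # Remember the best solution seen so far.
        best = max(pop + [best], key=fitness)
    return best
```

For example, simple_ga(sum, 16) searches for a 16-bit string with as many
1s as possible, since sum counts the 1s in a list of bits.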
GA searches by jumping between hyperplanes of the search space, each
determined by a "schema". A schema represents a group of solutions. It is
a string over {0,1,*}, where * is a don't-care symbol that stands for both
0 and 1. For example, the set of solutions of length 4 that begin with 1
is written as 1***.
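The schema notation can be made concrete with a short Python sketch. The
helper names matches and instances are assumptions made for this
illustration; solutions and schemata are written as strings.

```python
from itertools import product

def matches(schema, bits):
    """Return True if the bit string is an instance of the schema."""
    return len(schema) == len(bits) and all(
        s == "*" or s == b for s, b in zip(schema, bits))

def instances(schema):
    """List every bit string represented by the schema."""
    # A '*' position may take either value; a fixed position keeps its own.
    slots = [("0", "1") if s == "*" else (s,) for s in schema]
    return ["".join(p) for p in product(*slots)]
```

Here matches("1***", "1010") is true and matches("1***", "0010") is false.
The schema 1*** has three don't-care positions, so instances("1***")
returns 8 strings.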
To run a GA, the "fitness" function must be defined. This is
problem-dependent. The function is a measure that distinguishes the
better solutions in the population. The following example demonstrates
how to use GA.
Let us use a 32-bit binary string to represent a solution. Use the Simple
GA with population size = 1000, recombination rate 80%, mutation rate 1%,
and at most 200 generations before giving up. f(x) can be used as the
fitness function. This is a minimization problem (find x such that
f(x) = 0): the lower the fitness, the better.
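The setup above might be written down as follows. This is only a sketch:
f depends on the problem, so the f below is a hypothetical stand-in that
is non-negative and reaches zero at a single point. The names GAParams,
decode and fitness are also assumptions, and the bit string is read as an
unsigned integer.

```python
from dataclasses import dataclass

@dataclass
class GAParams:
    n_bits: int = 32            # length of the binary string
    pop_size: int = 1000        # population size
    p_cross: float = 0.8        # recombination rate
    p_mut: float = 0.01         # mutation rate
    max_generations: int = 200  # give up after this many generations

def decode(bits):
    """Interpret a string of '0'/'1' characters as an unsigned integer."""
    return int(bits, 2)

def f(x):
    # Hypothetical stand-in for the problem's f: zero only at x = 1000.
    return abs(x - 1000)

def fitness(bits):
    """Lower is better; 0 means the target x has been found."""
    return f(decode(bits))
```

With this stand-in, fitness(format(1000, "032b")) is 0, and any other
32-bit string has a fitness greater than 0.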
The simplest kind of selection is fitness-proportionate selection: the
chance of a solution being selected depends on its fitness. There are
many other methods, such as ranking, where the whole population is sorted
by fitness value and the selection probability is assigned according to
rank. Tournament selection is another method: two candidates are sampled
from the population and the fitter one is selected. The size of the
tournament can be larger than 2; a larger tournament exerts more
"selection pressure".
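The three selection methods can be sketched in Python as below. The
function names are assumptions, and higher fitness is taken to be better.
The ranking version uses linear weights (the worst solution gets weight 1
and the best gets weight N), which is only one of several possible
choices.

```python
import random

def roulette_select(pop, fitness, rng):
    """Fitness-proportionate selection.

    Assumes non-negative fitness values with at least one above zero.
    """
    weights = [fitness(ind) for ind in pop]
    return rng.choices(pop, weights=weights, k=1)[0]

def rank_select(pop, fitness, rng):
    """Ranking selection: probability depends on rank, not on raw fitness."""
    ranked = sorted(pop, key=fitness)            # worst first, best last
    weights = list(range(1, len(ranked) + 1))    # linear weights 1..N
    return rng.choices(ranked, weights=weights, k=1)[0]

def tournament_select(pop, fitness, rng, size=2):
    """Sample `size` distinct candidates and return the fittest of them."""
    contestants = rng.sample(pop, size)
    return max(contestants, key=fitness)
```

In tournament_select, raising size makes it more likely that the best
solutions win, which is the higher selection pressure mentioned above.
When size equals the population size, the best solution always wins.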
Diversity can also be taken into account in the selection method. By
combining a diversity measure with fitness, the selection will pick
solutions that are both highly fit and different from the others in the
set.
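One possible way to combine the two measures is sketched below. Every
detail here is an assumption made for illustration: diversity is measured
as the mean Hamming distance to the solutions already selected, and it is
added to the fitness (higher is better) with a weight alpha.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def diversity(ind, others):
    """Mean Hamming distance from ind to the solutions in others."""
    return sum(hamming(ind, other) for other in others) / len(others)

def pick_diverse(pop, fitness, selected, alpha=0.5):
    """Pick the solution maximizing fitness + alpha * diversity."""
    def score(ind):
        # No diversity bonus while nothing has been selected yet.
        bonus = diversity(ind, selected) if selected else 0.0
        return fitness(ind) + alpha * bonus
    return max(pop, key=score)
```

A small alpha favours fitness and a large alpha favours solutions that
are far from those already chosen.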
Besides single-point crossover, multiple-point crossover is also
possible. Multiple parents can also be used. Any recombination method
must produce only "valid" offspring. For example, in the Travelling
Salesman Problem (TSP) the "tour" is represented as a string of numbers
that are unique (no duplicates in the same string). This representation
reflects the nature of the problem, which is "combinatorial" (that is,
the solution is a permutation of these numbers). Therefore a normal
single-point crossover will produce "invalid" strings. See the following
example:
tour 1: 12345678
tour 2: 45673218
If they are crossed at the middle (single-point crossover) the offspring
will be
tour 1': 1234|3218
tour 2': 4567|5678
and they are both invalid: each repeats some cities and omits others.
This kind of problem needs special crossover operators, such as OCX and
PMX, that preserve the validity of the tour (search the literature for
more details).
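The example above can be checked with a short Python sketch, together
with one way of keeping the offspring valid. The function order_crossover
is a simplified order-based crossover written for this illustration. It
is not the exact OCX or PMX procedure from the literature, which differ in
how the remaining positions are filled. All function names are
assumptions.

```python
def single_point(p1, p2, cut):
    """Ordinary single-point crossover: swap the tails after the cut."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def is_valid_tour(tour, cities):
    """A tour is valid if it visits every city exactly once."""
    return sorted(tour) == sorted(cities)

def order_crossover(p1, p2, i, j):
    """Keep p1[i:j] in place and fill the rest in the order found in p2."""
    middle = p1[i:j]
    # Cities not in the kept segment, in the order they appear in p2.
    rest = [c for c in p2 if c not in middle]
    return rest[:i] + middle + rest[i:]
```

With tour 1 and tour 2 above, single_point("12345678", "45673218", 4)
gives 12343218 and 45675678, and is_valid_tour rejects both. In contrast,
order_crossover(list("12345678"), list("45673218"), 2, 6) keeps 3456 from
tour 1 and fills the remaining positions with 7, 2, 1, 8 in the order
they appear in tour 2, giving the valid tour 72345618.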
last update 28 Nov 2012