GP implementation

This lecture will illustrate how to implement GP. It will show the general structure of a GP program. The specific detail of the problem to be solved will be avoid. The code will be explained in a top-down fashion. First, let's look at some assumption. The terminal set is dependent on the problem. Let the function set be {if-and, if-or, if-not} with arity 4,4,3. These functions have their usual semantic. The individual is indexed by pgm[i]. Let the population size be POP. The selection is elitism, selecting the NBEST to be parents. Now let's look at the main loop : run the experiment for G generations.

PRIVATE runexperiment( int G)
{
gen = 0;
while( gen++ < G ) {
    for( i=0; i<POP; i++
        pfit[i] = evalfitness( pgm[i] );
    select(NBEST);
    if(gen == G) return;
    docrossover();
    for(i=PCROSS; i<POP; i++)
        pgm[i+200] = mutate( copyprog(pgm[i]));
}
}

The first for loop evaluate all individual. The select() stores the NBEST individual at the index 0..NBEST-1. The docrossover() will randomly select parents from the first NBEST individuals with replacement. It checks that both parents are different. Both offspring are kept.

PRIVATE docrossover()
{
for( k=NBEST; k<PCROSS; k+=2 ) {
    r = myrnd(NBEST);
    do { s = myrnd(NBEST); } while( s == r );
    ap1 = copyprog(pgm[r]);
    ap2 = copyprog(pgm[s]);
    crossover(ap1,ap2);
    pgm[k] = ap1;
    pgm[k+1] = ap2;
}
}

The mutate() replaces a random subtree in an individual with a small newly generated random tree. Findnode() returns the pointer to a particular node indexed by the node numbering. A cell[] is the data structure of a node. The expression 'myrnd ( proglen (ap-1) + 1' generates the node number which is any node except the root node. The line ' cell[a] = ap2 ' replaces a node in the original tree with the new tree.

int mutate( int ap)
{
ap2 = genprog( 2 );
i = myrnd(proglen(ap) -1) +1;
a = ap;
if( findnode(ap, &i, &a) ) cell[a] = ap2;
return ap;
}

The next piece of program evaluates a tree, i.e. 'executing the tree'. It is actually an interpreter of the 'language' that composed of a given symbol set. The first group of switch() checks and runs a terminal. The meaning of a terminal is dependent on the problem. The meaning of functions are defined in the second switch(), cmd() extracts the function symbol from a node, arg1..4() extracts the corresponding argument of a function. The interpreter is recursive which is natural for this kind of task. Please takes note that in all cases 'eval' must return value. This is how the function set is closed.

int evalprog( int ap)
{
if( isterm(ap) ) {
    switch( ap ) {
    case SP : . . . return 1;
    case SM : . . . return 1;
    . . .
    }
    return 0;
}
switch( cmd(ap) ) {
case IFAND :
    s1 = evalprog(arg1(ap));
    s2 = evalprog(arg2(ap));
    if( s1 && s2 ) return evalprog(arg3(ap));
    else return evalprog(arg4(ap));
case IFOR :
    s1 = evalprog(arg1(ap));
    s2 = evalprog(arg2(ap));
    if( s1 || s2 ) return evalprog(arg3(ap));
    else return evalprog(arg4(ap));
case IFNOT :
    if( ! evalprog(arg1(ap)) ) return evalprog(arg2(ap));
    else return evalprog(arg3(ap));
}
return 0;
}

Next, let's look into details that involve data structure of an individual. First, how to generate a random individual. An individual is a tree, each node is a 'cell' which has 5 fields, the first field stores a functions, the other four fields store either a terminal or a pointer to a subtree. A symbol is numbering 0..ALLSYM-1. A terminal can be distinguished from a pointer by the convention that a pointer is
always bigger than ALLSYM. getacell() allocates a cell from the storage pool. A random tree is generated by filling in each field and expanded subtree. The genprog() is controlled by 'depth' which limits the size of the generated tree. When the depth is zero the tree will not be expanded further. is4arg() is a predicate returns yes if the function in that node has 4 arguments ( if-and, if-or ).

PRIVATE int genprog( int depth)
{
if ( depth == 0 ) return myrnd(TERM); /* terminals only */
c = myrnd(ALLSYM);
if ( isterm(c) ) return c;
ap = getacell();
cmd(ap) = c;
arg1(ap) = genprog(depth-1);
arg2(ap) = genprog(depth-1);
arg3(ap) = genprog(depth-1);
if( is4arg(cmd(ap)) ) arg4(ap) = genprog(depth-1);
return ap;
}

Lastly, the following code that supports read and write a tree in a linearize form to a text file. It shows how to traverse a tree recursively. The text form of a tree is the mapping from 0.. ALLSYM-1 to 'a' . . . Because the order of traversal is known and the arity of each function is known, the inverse ransform from text to tree is possible.

tofile( int ap, FILE *fp) /* output a text form of a program */
{
if ( ap == NIL ) return;
if (isterm(ap)) {
fputc(ap+'a',fp);
return;
}
fputc(cmd(ap)+'a',fp);
tofile(arg1(ap),fp);
tofile(arg2(ap),fp);
tofile(arg3(ap),fp);
if( is4arg(cmd(ap)) ) tofile(arg4(ap),fp);
}
int fromfile( FILE *fp)
{
c = fgetc(fp);
if ( c == EOF || c < 'a' || c > 'n' ) return NIL;
c -= 'a';
if ( isterm(c) ) return c;
ap = getacell();
cmd(ap) = c;
arg1(ap) = fromfile(fp);
arg2(ap) = fromfile(fp);
arg3(ap) = fromfile(fp);
if( is4arg(cmd(ap)) ) arg4(ap) = fromfile(fp);
return ap;
}

Here is the example of the output (formatted to fit page) :
nfbngngmmjgbflllldijnbiniedjilcnlfgajljfbgddijnggkanmg
ljagkddaknihdghgef

Some question, is this traversal a 'depth-first' or 'breadth-first'?