
There are many models used to parallelise a GA over a network of machines.
Global - parallelised evaluation and possibly genetic operators
In this class of parallel GA (PGA) the population remains global, and the genetic operators are performed across the entire population in parallel. A master processor maintains the GA population and sends portions out to slaves for reproduction, mutation or fitness assessment. This generally requires a highly connected network to be effective because of the high level of communication required for the transfer of chromosomes. It also requires that the master processor be able to manage the entire population. For large population GAs this can require an impractical amount of memory (assuming that the entire population will be stored in memory). The important feature of this class is that their behaviour is the same as a sequential GA.
Island - evolve separate populations (demes) on each processor
In this model, the population is divided into multiple demes that evolve isolated from each other most of the time, but exchange individuals occasionally with their neighbours (migration). This type of model has minimal inter-processor communications compared to parallelised genetic operators. There are less memory problems compared to the first model because the memory requirements are distributed amongst the processors. Migration in an island model GA is controlled by several parameters:
the topology that defines the connections between the demes and the destination of the migrants a migration rate that controls how many individuals migrate a migration interval that affects how often the migrations occur
The main problem with this model is the introduction of more parameters. These parameters greatly affect the performance of the GA, and are not well understood.
It is quite simple to produce a parametric GA that maps onto both the global and island models. This is achieved by splitting the population up into a set of files called batches. Each batch can then be sent off to a node, loaded by the GA, and evaluated for x generations. Once evaluated the results are returned to the root node, and a population mixing program invoked to recombine the chromosomes and produce new batches. The behaviour of the GA will depend on the value of x and the logic used to recombine the chromosomes. For example if we use values of x which are greater than 1, and a random recombination algorithm then the GA is operating as an island model GA with a fully connected topology and migration interval of x. For this project a value of 1 was used for x, with a uniform random recombination algorithm. This is equivalent to the global model and has the advantage that the behaviour has not been changed from that of a non-parallel GA. This is important because it means we can apply the large body of research on non-parallel GAs to our parallel GA. We do not have to worry about the memory problem that can occur in global GAs because our population is stored on disk. What we do have to be aware of is that for the parametric GA to perform well, we must have an objective function that is sufficiently complex to yield a high computation to communication ratio. In the case of this project this is not a problem.
The original GA was written in Fortran 77. The code was easily adapted to Clustor since all setting are read from a "ga.conf" file, and the GA population is written to file after each generation and can be reloaded on startup. A program was written to create the initial random population, and also to recombine the population after each generation. A Clustor plan file was created by hand with a single random parameter called RandSeed. This is later substituted into the ga.conf file, and is used to seed the random number generator of the GA. A shell script was written to glue everything together:
#!/bin/sh
declare no_exit_on_failed_exec
startgen=`head gennum`
endgen=10000
rm -fr gen.$startgen 2 /dev/null
d() {
$*
if [ $? -ne 0 ]; then
echo The command "$*" failed, retrying...
sleep 10
d $*
fi;
}
runplan() {
# Must regenerate each time to create new RandomSeed values.
d "atcgenerator -g f111-34.plan"
d "atcdispatcher -b f111-34.run"
}
i=$startgen;
while [ $i -le $endgen ]; do
d "echo Starting generation $i"
d "date"
d "mkdir gen.$i"
d "rm -f gen.$i/*"
runplan
d "mv output.* gen.$i"
d "mv stats.* gen.$i"
tail -q --lines=-1 gen.$i/stats.* | sort +2 summary.txt
mail -s "34 gen:$i" leighf@fangorn < summary.txt
mv summary.txt gen.$i
d "echo Finished generation $i"
d "date"
d "echo Calling createpop mix"
d "./createpop mix 64 256 832"
d "echo Createpop mix finished"
d "date"
gzip -r gen.$i &
oi=$i
i=`expr $oi + 1`
while [ "$i" = "" ]; do
echo Failed to increment i, retrying...
sleep 1
i=`expr $oi + 1`
done;
echo $i gennum
done;
As can be seen in the above script, the Clustor run file is regenerated for each generation. This is necessary to create different values for the random RandSeed parameter. You will also note that the script performs some excessive error checking. This was necessary to stop the script from bailing out due to a lack of resources on the root node during times of high user load.
A Clustor API now exists which lets you control the activities of the dispatcher. This could be used instead of restarting the dispatcher each generation as in the script above.
A flow diagram of the parametric parallel GA follows:

|
|