Bayesian Optimisation Algorithm (BOA) is recent addition to the Estimation of Distribution family of algorithms. BOA uses Bayesian networks to model promising solutions and subsequently guide the further search. The initial population can be generated randomly or using some heuristic. At each generation, a set of promising solutions is selected from the population. The criteria for the selection is equivalent to a Genetic Algorithm fitness function. Following the selection of promising strings (solutions), a Bayesian network that fits the selected set is constructed. Any metric can be used to as a measure of quality of the network and any search algorithm can be used to search over the networks in order to optimise the value of the used metric. The creators of BOA have used the Bayesian Dirichlet (BD) metric to evaluate the network. To search over the networks they typically used a simple greedy algorithm that at each step allows a single arc addition to the current network, and can be extended to allow arc removals and reversals. To cut down on computational complexity, the number of allowed incoming edges (arcs) to each node (variable) is usually given a maximum value, typically 1 or 2. In addition, prior information about the problem, for example information about the network in the previous generation, can be used in order to improve the estimation and improve convergence. A number of new strings are generated according to the joint distribution encoded by the Bayesian network constructed. Finally, the new strings are placed back into the population, replacing some of the old strings. The algorithm typically terminates after a given number of generations.
BOA so far has not been used for solving job scheduling problems, but it can easily be applied to them. The strings in the population can represent whole schedules, with each number in the string representing a job ID assigned to that position. However, there are a number of potential problems relating to network construction and evaluation. The BD metric assumes that the data set is complete, and with an optimisation problem such as job scheduling this will not be the case (otherwise the optimal solution would already be in the population). In addition, the BD metric is cubic in the number of values a variable can take. This becomes very expensive for job scheduling problems, as each variable can take the value of any job ID, so network evaluation becomes cubic in the number of jobs.