The mixed strategy selects an action with some probability based on the EW of the action. This ensures that, while BPP will usually bet strongly when holding a strong hand and fold on weak hands, it occasionally chooses a sub-optimal action, making it difficult for an opponent to construct an accurate model of the strength of BPP's hand. This introduces randomness to its play without sacrificing too much in the way of performance.
As described earlier, the basic decision facing any poker player is whether or not to fold one's hand. Betting curves, such as that in Figure 8, are used to randomly select whether BPP folds or not, in a way that depends on the difference between the EW of folding and that of passing. The horizontal axis shows the difference between the EW of folding and passing (scaled by the bet size); the vertical axis is the frequency at which one should fold. Note that when the difference is zero (EW(pass) = 0), BPP will fold randomly half of the time.
Once the frequency for folding has been established, the action of "not folding" must be split so that BPP chooses either to pass/call or to bet/raise. This is done in a similar fashion to determining the folding probability, using the difference between the EW of betting and that of passing, as shown in Figure 9. The vertical axis gives the frequency at which one should call or pass when not folding. Together, these curves give a probability for each action, and from this distribution BPP randomly selects an action to perform.
The playing curves are generated by the following exponential functions (where fp and cp are parameters adjustable for each round of play):
Ideal parameters will select the optimal balance between deterministic and randomised play by stretching or squeezing the curves along the horizontal axis. For example, if the curves were stretched heavily along the horizontal axis, totally random action selection would result since, irrespective of the EW of the actions, each curve would select either of its two alternatives with probability 0.5. On the other hand, if the curves were squeezed towards the centre of the horizontal axis, a deterministic strategy would be employed, with the action with the greatest EW always being selected.
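The way the two curves combine into a distribution over actions can be illustrated with a minimal sketch. The logistic form used here is an assumption chosen only because it is exponential, gives 0.5 at a difference of zero, and flattens or sharpens as its parameter changes; it is not claimed to be the function BPP uses. The names `_falling_curve`, `action_distribution` and `choose_action` are hypothetical, as is the decision to scale the second difference by the bet size and to treat fp and cp as steepness values.

```python
import math
import random


def _falling_curve(x, steepness):
    """Logistic curve falling from 1 to 0 as x grows; equals 0.5 at x = 0.

    ASSUMPTION: a stand-in shape, not the paper's own function.
    """
    z = max(-700.0, min(700.0, steepness * x))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + math.exp(z))


def action_distribution(ew_fold, ew_pass, ew_bet, bet_size, fp, cp):
    """Return probabilities for fold, pass/call and bet/raise."""
    # Fold less often as passing becomes more attractive than folding.
    p_fold = _falling_curve((ew_pass - ew_fold) / bet_size, fp)
    # Given no fold, pass/call less often as betting becomes more attractive.
    # ASSUMPTION: this difference is scaled by the bet size as well.
    p_pass = _falling_curve((ew_bet - ew_pass) / bet_size, cp)
    return {
        "fold": p_fold,
        "pass": (1.0 - p_fold) * p_pass,
        "bet": (1.0 - p_fold) * (1.0 - p_pass),
    }


def choose_action(dist, rng=random):
    """Sample one action from the distribution."""
    actions = list(dist)
    return rng.choices(actions, weights=[dist[a] for a in actions])[0]
```

Under this sketch, values of fp and cp near zero correspond to heavily stretched curves (each choice is made with probability close to 0.5), while very large values correspond to squeezed curves and approach always selecting the action with the greatest EW.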
In order to find good parameters, we employed a stochastic greedy search of the parameter space, playing against a version of BPP which used betting curves (see Table 5). Since the space being searched is eight-dimensional (two types of curves for each of the four rounds of play) with a highly noisy score function (the amount of winnings accumulated), it is not evident whether the search for optimality was successful. Nevertheless, the curves produced by the stochastic search appear to give reasonable results and provide good camouflage for playing behaviour through their introduction of random play.
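A stochastic greedy search of this general kind might be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the perturbation scheme (Gaussian noise on one randomly chosen parameter), the averaging over repeated trials to dampen noise, the positivity floor on parameters, and the names `stochastic_greedy_search` and `score` are all hypothetical. In practice `score` would return the winnings accumulated over a series of games; here it is left to the caller.

```python
import random


def stochastic_greedy_search(score, start, steps=200, step_size=0.5,
                             trials=20, rng=None):
    """Hill-climb over a parameter vector using a noisy score function.

    score(params, rng) returns one noisy evaluation (higher is better).
    ASSUMPTION: averaging `trials` evaluations is enough to compare candidates.
    """
    rng = rng or random.Random()

    def estimate(params):
        # Average several noisy evaluations of the same parameter vector.
        return sum(score(params, rng) for _ in range(trials)) / trials

    best = list(start)
    best_score = estimate(best)
    for _ in range(steps):
        candidate = list(best)
        i = rng.randrange(len(candidate))  # perturb one parameter at random
        # ASSUMPTION: parameters are kept strictly positive.
        candidate[i] = max(1e-3, candidate[i] + rng.gauss(0.0, step_size))
        candidate_score = estimate(candidate)
        if candidate_score > best_score:  # greedy: keep only improvements
            best, best_score = candidate, candidate_score
    return best, best_score
```

With eight parameters (fp and cp for each of the four rounds), `start` would be a list of length eight. Because each estimate is itself noisy, an accepted candidate is not guaranteed to be a true improvement, which is one reason why the success of such a search is hard to judge.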
The results in Figure 7 show that the implementation of a mixed strategy based upon the decision network produced an extremely significant improvement in play, winning units per game ( ). Even so, a version of BPP employing a deterministic strategy appears to perform more convincingly. This by no means indicates that a deterministic strategy is the better choice for profitable poker play; on the contrary, it identifies a weakness in BPP's opponent modeling capabilities. Against an experienced human player, we would expect the mixed strategy to perform better than a deterministic one but, owing to the difficulty of obtaining results from games played against human players, a significant number of such results has not yet been collected.