
Mixed Strategy

The deterministic strategy performed well against earlier versions of BPP, which have relatively weak opponent modeling capabilities, and against inexperienced humans. Against a stronger opponent, such as an experienced poker player, it is likely that this strategy would be identified and exploited quite easily. Because its play is deterministic, an intelligent opponent could construct an accurate model of BPP's hand from its actions, and then modify their own strategy to maximise profit at the expense of the more predictable player. This led to the belief that a non-deterministic model would perform better in ``real-world'' play by introducing uncertainty into the minds of its opponents.

The mixed strategy selects an action with some probability based on the EW of the action. This ensures that, while most of the time BPP bets strongly when holding a strong hand and folds weak hands, it occasionally chooses a sub-optimal action, making it difficult for an opponent to construct an accurate model of the strength of BPP's hand. This introduces randomness to its play without sacrificing too much in the way of performance.

As described earlier, the basic decision facing any poker player is whether or not to fold one's hand. Betting curves, such as that in Figure 8, are used to randomly select whether BPP folds, in a way that depends on the difference between the EW of passing and that of folding. The horizontal axis shows this difference (scaled by the bet size); the vertical axis is the frequency at which one should fold. Note that when the difference is zero (EW(pass) = EW(fold)), BPP will fold at random half of the time.

Figure 8: Betting curve for folding. The horizontal axis is the difference between the expected utility of passing/calling and folding

Once the frequency for folding has been established, the action of ``not folding'' must be split so that BPP chooses to either pass/call or bet/raise. This is done in a similar fashion to determining the folding probability: the split is calculated from the difference between the EW of betting and that of passing, as shown in Figure 9. The vertical axis gives the frequency at which one should call or pass when not folding. Together, these curves give a probability for each action, and BPP randomly selects an action to perform from this distribution.

Figure 9: Betting curve for passing and betting. The horizontal axis is the difference between the expected utility of betting/raising and passing/calling

The playing curves are generated by the following exponential functions (where fp and cp are parameters adjustable for each round of play):

\begin{displaymath}\mathrm{fold~prob} =
\frac{1}{1 + e^{\left(fp \times \left(\frac{EW(pass) - EW(fold)}{B}\right)\right)}}
\end{displaymath} (8)

\begin{displaymath}\mathrm{pass/call~prob} =
\frac{1}{1 + e^{\left(cp \times \left(\frac {EW(bet) - EW(pass)}{B}\right)\right)}}
\end{displaymath} (9)
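As an illustration only, the following Python sketch shows one way Equations 8 and 9 could be combined into a distribution over the three actions and sampled from. The function names, the dictionary layout and the overflow guard are assumptions made for this example and are not taken from BPP's implementation; the parameter values passed by a caller would likewise be hypothetical.

```python
import math
import random


def _inv_one_plus_exp(x):
    """Return 1 / (1 + e^x), guarding against overflow for very large x."""
    if x > 700.0:  # math.exp would overflow; the limit of the curve is 0
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def fold_prob(ew_pass, ew_fold, bet_size, fp):
    """Equation 8: probability of folding."""
    return _inv_one_plus_exp(fp * (ew_pass - ew_fold) / bet_size)


def pass_prob(ew_bet, ew_pass, bet_size, cp):
    """Equation 9: probability of passing/calling, given that BPP does not fold."""
    return _inv_one_plus_exp(cp * (ew_bet - ew_pass) / bet_size)


def action_distribution(ew_fold, ew_pass, ew_bet, bet_size, fp, cp):
    """Combine both curves into probabilities for fold, pass/call and bet/raise."""
    p_fold = fold_prob(ew_pass, ew_fold, bet_size, fp)
    p_pass = pass_prob(ew_bet, ew_pass, bet_size, cp)
    return {
        "fold": p_fold,
        "pass": (1.0 - p_fold) * p_pass,
        "bet": (1.0 - p_fold) * (1.0 - p_pass),
    }


def select_action(dist, rng=random):
    """Randomly draw one action according to the distribution."""
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

When all three EWs are equal, this sketch folds with probability 0.5 and splits the remaining 0.5 evenly between passing and betting, which matches the midpoint of both curves.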

Ideal parameters will select the optimal balance between deterministic and randomised play by stretching or squeezing the curves along the horizontal axis. For example, if the curves were stretched heavily along the horizontal axis, totally random action selection would result, since irrespective of the EW for the actions, the curves would select either with probability 0.5. On the other hand, if the curves were squeezed towards the centre of the horizontal axis, a deterministic strategy would be employed with the action with the greatest EW always being selected.
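The two extremes described above can be checked numerically. The short Python sketch below evaluates Equation 8 for a few values of fp; the specific values (0, 1 and 1000) are chosen purely for illustration and are not parameters used by BPP. A small fp corresponds to a heavily stretched curve and a large fp to a heavily squeezed one.

```python
import math


def fold_prob(scaled_diff, fp):
    """Equation 8, with scaled_diff = (EW(pass) - EW(fold)) / B."""
    x = fp * scaled_diff
    if x > 700.0:  # avoid overflow in math.exp; the curve tends to 0 here
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def curve_samples(fp, diffs=(-1.0, 0.0, 1.0)):
    """Fold probabilities at a few points along the horizontal axis."""
    return [round(fold_prob(d, fp), 3) for d in diffs]


if __name__ == "__main__":
    for fp in (0.0, 1.0, 1000.0):  # illustrative values only
        print(fp, curve_samples(fp))
```

With fp = 0 every sample is 0.5 (fully random play); with a very large fp the samples approach 1 and 0 on either side of the origin, so the action with the greater EW is effectively always chosen, and only an exact tie is still decided at random.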

In order to find good parameters, we employed a stochastic greedy search of the parameter space when running against a version of BPP which used betting curves (see Table 5). Since the space being searched is eight-dimensional (two types of curve for each of the four rounds of play) and the score function (amount of winnings accumulated) is highly noisy, it is not evident whether the search for optimality was successful. Nevertheless, the curves produced by the stochastic search appear to give reasonable results and provide good camouflage for playing behaviour through their introduction of random play.
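The text does not give the details of the search procedure, so the following Python sketch shows only one generic form a stochastic greedy search could take: perturb a single randomly chosen parameter and keep the change if the score improves. The function name, the Gaussian perturbation, the step size, the non-negativity clamp and the `score` callback (which would stand in for winnings accumulated over a batch of games) are all assumptions made for illustration, not BPP's actual method.

```python
import random


def stochastic_greedy_search(score, start, steps=200, step_size=0.5, rng=None):
    """Greedy hill-climbing with random single-parameter perturbations.

    score: hypothetical callable mapping a parameter list to a number
           (higher is better), e.g. average winnings over many games.
    start: initial parameter list, e.g. eight values (fp and cp per round).
    """
    rng = rng or random.Random()
    best = list(start)
    best_score = score(best)
    for _ in range(steps):
        candidate = list(best)
        i = rng.randrange(len(candidate))  # pick one dimension at random
        # Assumed perturbation: Gaussian noise, clamped to stay non-negative.
        candidate[i] = max(0.0, candidate[i] + rng.gauss(0.0, step_size))
        candidate_score = score(candidate)
        if candidate_score > best_score:  # greedy: accept only improvements
            best, best_score = candidate, candidate_score
    return best, best_score
```

With a noisy score function, a single lucky evaluation can be mistaken for a real improvement, which is one reason it is hard to tell whether such a search has reached an optimum; averaging over more games per evaluation would reduce this effect at the cost of longer runs.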

The results in Figure 7 show that the implementation of a mixed strategy based upon the decision network produced a highly significant improvement in play, winning $0.1865 \pm 0.025$ units per game ( $t = 7.4624; p \le 0.025$). Even so, a version of BPP employing a deterministic strategy appears to perform more convincingly. This by no means indicates that a deterministic strategy is the better choice for profitable poker play. On the contrary, it identifies a weakness in BPP's opponent modeling capabilities. Against an experienced human player, we would expect the mixed strategy to perform better than a deterministic one, but because of the difficulty of obtaining results from games played against human players, a sufficient number of results has not yet been collected to confirm this.

Jason R Carlton