
Action Selection

A weakness of the previous version of BPP is its betting strategy, which relied on approximations based on pot odds and betting curves. It was decided to remove some of the expert knowledge built into that strategy and instead model the expected winnings for each action explicitly within the Bayesian belief network. In the original version of BPP, the Bayesian network was used only to calculate the belief that BPP would win at a showdown. Here its role in selecting an action is taken a step further: a decision node and a utility node are added to obtain a decision network (Figure 6). Decision networks can be used to find the decision that maximises expected utility. In this case, the decision node represents the betting actions available to BPP (BET, PASS, FOLD), while the utility to be maximised is the amount of winnings BPP can accumulate.


  
Figure 6: Bayesian Decision Network

The utility node, Winnings, measures the amount BPP expects to win for each combination of states of its parent nodes (BPP Win and BPP Action). For example, if BPP folds, its expected future winnings are always zero, irrespective of whether it would have won at a showdown, as there is no possibility of further loss or gain in the current game. If instead BPP bets and goes on to win at a showdown, it makes a profit equal to the size of the final pot F' minus its own future contribution to that pot, b'. If BPP bets and loses, it loses its future contribution to the final pot, giving a utility of -b'. The same reasoning applies when BPP passes, only with a different expected final contribution b and final pot F.

This information is represented as a utility table within the Winnings node and can be seen in Table 3.


 
Table 3: Utility table

BPP Action   BPP Win   Utility
Bet          Win       F' - b'
Bet          Lose      -b'
Pass         Win       F - b
Pass         Lose      -b
Fold         Win       0
Fold         Lose      0
 

The amount of winnings that can be made by BPP is dependent upon a number of factors including the number of betting rounds remaining R, the size of the betting unit B, and the current pot c.

BPP's future contribution to the pot depends on whether its next action is a bet or a pass (folding involves no further contribution), and each case is assumed to have a different expected value: b' when BPP bets and b when it passes. The expected future contribution is calculated in two parts. First, we consider the contribution made in the current betting round. This is then added to the contribution BPP is expected to make in later rounds, up until play reaches a showdown, to arrive at the expected final contribution.

The expected contribution in the current round depends on whether BPP is required to call an opponent's bet and if BPP is betting or passing. The cost of calling a bet, $\delta$, depends on the last action performed by the opponent. If the opponent bet or raised, $\delta$ will be equal to the size of one betting unit B (calling the opponent's bet). Otherwise, if BPP has the option of passing or betting (the opponent did not previously bet in the current round), $\delta$ will be equal to 0. If BPP were to perform a bet or raise, a contribution to the pot of one betting unit must then also be made (the actual bet or raise) within the current round.

In future rounds, the projected contribution is calculated by assuming one bet is made in each remaining round up until the showdown, scaled by a multiplying factor. A different factor is used when modelling the future contribution after betting ($\alpha'$) than after passing ($\alpha$); both are optimised stochastically using a greedy search in an attempt to maximise winnings (see Table 4). This lets us represent the assumption that contributions made in future rounds after betting will exceed those made after passing, since a betting trend tends to carry through the entire game.

Hence, the formula for calculating BPP's expected future contribution to the final pot when betting is given by:


\begin{displaymath}\mathrm{b'} = \delta + B + (R \times B) \times \alpha'
\end{displaymath} (2)

If BPP passes or calls, the projected future contribution is calculated similarly. The only differences are that no bet is placed in the current round and that a different multiplying factor, $\alpha$, is used. When passing, BPP's future contribution to the pot is given by:


\begin{displaymath}\mathrm{b} = \delta + (R \times B) \times \alpha
\end{displaymath} (3)
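Equations 2 and 3 can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical, and the values of the multiplying factors used in the usage note are made up for the example rather than taken from Table 4.

```python
def calling_cost(opponent_bet: bool, betting_unit: float) -> float:
    """Cost delta of calling: one betting unit B if the opponent bet or
    raised in the current round, otherwise 0."""
    return betting_unit if opponent_bet else 0.0


def bpp_contribution(action: str, opponent_bet: bool, betting_unit: float,
                     rounds_remaining: int, alpha_bet: float,
                     alpha_pass: float) -> float:
    """Expected future contribution of BPP to the final pot.

    Returns b' (Equation 2) for "bet" and b (Equation 3) for "pass".
    alpha_bet and alpha_pass stand for alpha' and alpha respectively.
    """
    delta = calling_cost(opponent_bet, betting_unit)
    if action == "bet":
        # Call (delta), place the bet or raise (B), then scaled future bets.
        return delta + betting_unit + rounds_remaining * betting_unit * alpha_bet
    if action == "pass":
        # Call (delta) only, then scaled future bets.
        return delta + rounds_remaining * betting_unit * alpha_pass
    raise ValueError(f"unknown action: {action!r}")
```

For instance, with B = 10, R = 2, an opponent who has just bet, and illustrative factors of 1.5 (betting) and 0.5 (passing), `bpp_contribution("bet", True, 10, 2, 1.5, 0.5)` evaluates Equation 2 as 10 + 10 + 30, while the same call with `"pass"` evaluates Equation 3 as 10 + 10.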

To calculate the expected final pot we need not only BPP's expected future contributions but also those of its opponent. The opponent's contribution to the final pot is calculated in the same way as BPP's; the only difference is the size of the contribution made in the current round. If BPP bets or raises, the opponent's $\delta$ is equal to one betting unit B, as the opponent must call BPP's bet before proceeding:


\begin{displaymath}\mathrm{o'} = \delta + (R \times B) \times \alpha'
\end{displaymath} (4)

If instead BPP passes or calls, the opponent's calling contribution $\delta$ in the current round is zero, as no bet has been placed that needs to be called:


\begin{displaymath}\mathrm{o} = (R \times B) \times \alpha
\end{displaymath} (5)

The expected final pot when BPP bets, F', is then calculated by adding the projected future contributions of both players to the current pot c. The final pot when BPP passes, F, is calculated in the same way using the corresponding expected contributions for each player.


F' = c + b' + o' (6)


F = c + b + o (7)
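As a rough sketch, Equations 4 to 7 can be combined into a single helper that returns both expected final pots. The function name and argument names are hypothetical, and BPP's own contributions b' and b are passed in as already-computed values.

```python
def final_pots(pot: float, b_bet: float, b_pass: float, betting_unit: float,
               rounds_remaining: int, alpha_bet: float, alpha_pass: float):
    """Return (F', F), the expected final pots when BPP bets or passes.

    pot is the current pot c; b_bet and b_pass are BPP's expected future
    contributions b' and b; alpha_bet and alpha_pass stand for alpha'
    and alpha.
    """
    # Equation 4: the opponent must call BPP's bet, so delta = B.
    o_bet = betting_unit + rounds_remaining * betting_unit * alpha_bet
    # Equation 5: nothing to call, so delta = 0.
    o_pass = rounds_remaining * betting_unit * alpha_pass
    # Equations 6 and 7: current pot plus both players' contributions.
    return pot + b_bet + o_bet, pot + b_pass + o_pass
```

With the illustrative values c = 40, b' = 50, b = 20, B = 10, R = 2 and made-up factors 1.5 and 0.5, the opponent is expected to add 40 when BPP bets and 10 when BPP passes, giving F' = 130 and F = 70.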

The decision network then uses the belief in winning at a showdown and the utilities for each ``BPP Win'' / ``BPP Action'' pair to calculate the expected winnings (EW) for each possible betting action. Folding always has zero EW, regardless of the probability of winning, as BPP cannot make any future loss or profit.
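The expected-winnings calculation amounts to weighting each row of the utility table (Table 3) by the belief in winning or losing. A minimal Python sketch is given below; it assumes the belief in winning is available as a single probability `p_win`, and all names are hypothetical rather than taken from BPP itself.

```python
def expected_winnings(p_win: float, final_pot_bet: float, b_bet: float,
                      final_pot_pass: float, b_pass: float) -> dict:
    """Expected winnings (EW) for each action from the Table 3 utilities.

    p_win is the belief that BPP wins at a showdown; final_pot_bet and
    final_pot_pass are F' and F; b_bet and b_pass are b' and b.
    """
    # Utility table keyed by (action, BPP wins?).
    utility = {
        ("bet", True): final_pot_bet - b_bet,
        ("bet", False): -b_bet,
        ("pass", True): final_pot_pass - b_pass,
        ("pass", False): -b_pass,
        ("fold", True): 0.0,
        ("fold", False): 0.0,
    }
    return {
        action: p_win * utility[(action, True)]
        + (1.0 - p_win) * utility[(action, False)]
        for action in ("bet", "pass", "fold")
    }
```

Note that the EW of betting simplifies to p_win × F' − b', and that of passing to p_win × F − b, so a larger expected pot only favours betting when the belief in winning is high enough to offset the larger contribution.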

Two different models were constructed using the decision network for action selection. One model deterministically chose the action with the greatest EW, while the other randomised its selection of betting actions based on the distributions of EWs for all possible actions.
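The two selection schemes can be contrasted with a short Python sketch. The deterministic rule follows directly from the description above. The randomised rule shown here is only one possible scheme and is an assumption made for illustration: it samples actions with weights proportional to each EW minus the smallest EW, which is not necessarily how BPP derives its distribution.

```python
import random


def choose_deterministic(ew: dict) -> str:
    """Pick the action with the greatest expected winnings."""
    return max(ew, key=ew.get)


def choose_randomised(ew: dict, rng=random) -> str:
    """Sample an action with probability increasing in its EW.

    ASSUMPTION: weights are EW shifted so the worst action has weight 0.
    This weighting is illustrative only, not taken from the source.
    """
    actions = list(ew)
    lowest = min(ew.values())
    weights = [ew[a] - lowest for a in actions]
    if sum(weights) == 0:
        # All actions have equal EW: fall back to a uniform choice.
        return rng.choice(actions)
    return rng.choices(actions, weights=weights, k=1)[0]
```

Given EWs of 47.5 for betting, 32.5 for passing and 0 for folding, the deterministic rule always bets, whereas this particular randomised rule would bet or pass in proportion 47.5 to 32.5 and never fold.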



 
Jason R Carlton
2000-11-13