Humans can be very good at opponent modeling because they can make inferences from limited data, identifying general trends or errors in an opponent's play. This ability relies on the opponent being predictable, which is often the case against all but the best players. For example, you may twice see an opponent reveal a failed straight draw at the showdown in situations where they had called with very poor pot odds. You may then infer that this opponent over-values straights until evidence to the contrary is presented.
For a computer, it is difficult to identify which parts of an action's context are important, or to make accurate inferences without large quantities of data. It is also difficult to make a computer identify trends outside its mathematically understood value system (probabilistic measures of hand strength and potential), such as a particular opponent's over-optimistic evaluation of straights. Detecting such a trend would require the computer to somehow learn how each opponent values the various features of particular hands. This is a complex problem unless the possible tendencies are anticipated, so that the computer can look for them.
Every time an opponent reveals cards at the showdown, the betting actions in that game can be analysed retroactively and used as data. However, most games provide no such information, because one player folds before the showdown is reached.
During each hand, a conditional probability table is maintained for each opponent for each round of betting. For each possible current hand, the table gives the probability that the opponent would have performed a given betting action. This probability distribution is re-weighted after each opponent action so that it stays consistent with all the betting actions observed during play.
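To make the re-weighting step concrete, here is a minimal sketch assuming the hands are grouped into a few coarse categories and that the re-weighted distribution is the belief over the opponent's possible hands. The category names, the action set, the table values and the `reweight` function are all hypothetical illustrations, not BPP's actual tables or code. Each observed action multiplies the belief by P(action | hand) from the table and renormalises (Bayes' rule).

```python
from typing import Dict

# Hypothetical coarse hand categories and betting actions, for illustration only.
HANDS = ("busted", "pair", "two_pair", "strong")

# Hypothetical conditional probability table for one opponent in one betting round:
# CPT[hand][action] = P(opponent takes `action` | opponent holds `hand`).
# The numbers are placeholders, not values from BPP.
CPT: Dict[str, Dict[str, float]] = {
    "busted":   {"fold": 0.60, "call": 0.30, "raise": 0.10},
    "pair":     {"fold": 0.20, "call": 0.60, "raise": 0.20},
    "two_pair": {"fold": 0.05, "call": 0.55, "raise": 0.40},
    "strong":   {"fold": 0.01, "call": 0.29, "raise": 0.70},
}


def reweight(belief: Dict[str, float], cpt: Dict[str, Dict[str, float]],
             action: str) -> Dict[str, float]:
    """Bayes' rule: P(hand | action) is proportional to P(action | hand) * P(hand)."""
    unnormalised = {h: belief[h] * cpt[h][action] for h in belief}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("action has zero probability under every hand")
    return {h: p / total for h, p in unnormalised.items()}


# Start from a uniform belief, then observe a raise followed by a call.
belief = {h: 1 / len(HANDS) for h in HANDS}
for observed in ("raise", "call"):
    belief = reweight(belief, CPT, observed)
print({h: round(p, 3) for h, p in belief.items()})
```

After the raise, probability mass shifts away from "busted" toward the stronger categories; the subsequent call then moderates the "strong" category, since a strong hand in this placeholder table prefers to raise.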
If we modify the probability matrices as play proceeds but maintain only a single set of tables for all players, we have a form of generic opponent modeling, which models "typical" play. This technique assumes that all opponents play the same way in a given situation, and it can be very inaccurate for certain opponents. It was the opponent modeling method used by the earlier version of BPP. However, even among very strong players there is a wide variety of good styles, and handling each opponent appropriately is a basic requirement for an elite player. The best players are also excellent at adapting to specific conditions, which may change rapidly over the course of a game.
Specific opponent modeling treats every player as distinct and uses information from all previously witnessed hands. A separate set of conditional probability tables is maintained for each opponent and used throughout play. Each new opponent starts with default matrices containing reasonable probabilities learnt through self-play experiments. As play proceeds, these probabilities are modified to reflect the individual opponent's betting habits. BPP uses these conditional probability distributions within its Bayesian belief network: the probability matrices of the OPP Action nodes are initialised with the probabilities specific to the opponent, so the opponent's observed action can serve as an indication of the strength of their current hand.
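The difference between generic and specific modeling comes down to how tables are looked up. The sketch below, a hypothetical `OpponentModels` store (not BPP's code), returns one shared set of per-round tables in generic mode and gives each new opponent a private copy of the defaults in specific mode. The default values are placeholders standing in for the self-play defaults, and the hookup to the OPP Action nodes of the belief network is not shown.

```python
import copy
from typing import Dict

Table = Dict[str, Dict[str, float]]  # table[hand][action] = P(action | hand)

# Hypothetical default tables, one per betting round. BPP learns its defaults
# through self-play; these placeholder values are for illustration only.
DEFAULT_TABLES: Dict[str, Table] = {
    round_name: {
        "weak":   {"fold": 0.5, "call": 0.4, "raise": 0.1},
        "strong": {"fold": 0.1, "call": 0.4, "raise": 0.5},
    }
    for round_name in ("preflop", "flop", "turn", "river")
}


class OpponentModels:
    """One set of per-round tables per opponent (specific modeling),
    or a single shared set for everyone (generic modeling)."""

    def __init__(self, defaults: Dict[str, Table], specific: bool = True) -> None:
        self.defaults = defaults
        self.specific = specific
        self._shared = copy.deepcopy(defaults)
        self._per_opponent: Dict[str, Dict[str, Table]] = {}

    def tables_for(self, opponent: str) -> Dict[str, Table]:
        if not self.specific:
            # Generic modeling: every opponent is assumed to play the same way.
            return self._shared
        if opponent not in self._per_opponent:
            # Specific modeling: a new opponent starts from its own copy of the defaults.
            self._per_opponent[opponent] = copy.deepcopy(self.defaults)
        return self._per_opponent[opponent]
```

Deep-copying the defaults means that adjusting one opponent's tables never leaks into another opponent's tables or into the defaults themselves.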
BPP adjusts these matrices over time using the relative frequencies of each opponent's actions. Since the rules of poker do not reveal hidden cards unless the hand is played to a showdown, these counts are made only for such hands. Because only a small fraction of games (0.17) are decided at a showdown, this is likely to introduce some selection bias into the estimated conditional probabilities, but the nature of this bias has yet to be determined.
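A relative-frequency update restricted to showdown hands might look like the sketch below, for a single opponent and a single betting round. The `ShowdownCounter` class and its `prior_weight` parameter are hypothetical; in particular, blending the default table in as pseudo-counts is an assumption of this sketch, since the text states only that relative frequencies are used.

```python
from typing import Dict, List


class ShowdownCounter:
    """Relative-frequency estimate of P(action | hand) for one opponent in one
    betting round, updated only from hands whose cards were shown at showdown."""

    def __init__(self, default: Dict[str, Dict[str, float]],
                 prior_weight: float = 10.0) -> None:
        # Assumption: the default table counts as `prior_weight` pseudo-observations,
        # so a handful of showdowns does not overwrite it entirely.
        self.counts: Dict[str, Dict[str, float]] = {
            hand: {action: prior_weight * p for action, p in row.items()}
            for hand, row in default.items()
        }

    def record_showdown(self, hand: str, actions: List[str]) -> None:
        """Retroactively attribute each action the opponent took to the revealed hand.
        Hands that end in a fold are never passed in, because no cards are seen."""
        for action in actions:
            self.counts[hand][action] += 1.0

    def probability(self, hand: str, action: str) -> float:
        row = self.counts[hand]
        return row[action] / sum(row.values())


default = {
    "weak":   {"fold": 0.5, "call": 0.4, "raise": 0.1},
    "strong": {"fold": 0.1, "call": 0.4, "raise": 0.5},
}
counter = ShowdownCounter(default)
# The opponent raised twice with a weak hand and the cards were revealed at showdown.
counter.record_showdown("weak", ["raise", "raise"])
```

In this sketch an opponent who reaches a showdown has not folded in that hand, so the fold counts can only ever come from the prior; this illustrates one way that counting showdown hands alone can skew the estimates.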