
Koller and Pfeffer

Recently, Koller and Pfeffer [Koller95, Koller97] have developed Gala, a system for automating the game-theoretic analysis of two-player competitive imperfect-information games. The system takes a description of a game, analyses it, and outputs strategies for the players that are game-theoretically optimal for the game and situation described. Games are represented as trees with the information sets (players' knowledge states) marked. For games of imperfect information, the system finds an optimal randomised strategy. Gala can now solve simplified versions of two-player poker (e.g. a 3-card deck, 1 card dealt to each player, and 3 betting rounds; or an 11-card deck, 5 cards per player, and 3 rounds). However, the authors state that:
``While we can now solve games with tens of thousands of nodes, we are nowhere close to being able to solve huge games such as full-scale poker, and it is unlikely that we will ever be able to do so. A game tree for five-card draw poker, for example, where players are allowed to exchange cards, has over 10^25 different nodes. The situation (for zero-sum games) is now quite similar to that of perfect-information games: We have algorithms that are fairly efficient in the size of the game tree; unfortunately, the game tree is often extremely large.'' [Koller and Pfeffer, 1997]
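The scale of the 10^25 figure is easy to believe from the combinatorics of the deal alone. As a back-of-envelope check (my own arithmetic, not the authors' derivation), the number of ways to deal two five-card hands from a 52-card deck is already in the trillions, before any betting sequences or card exchanges are counted:

```python
import math

# Ways to deal 5 cards to one player, then 5 of the remaining 47 to the other.
deals = math.comb(52, 5) * math.comb(47, 5)
print(f"{deals:.2e}")  # ~3.99e+12 two-hand deals
```

Multiplying in the betting decisions over several rounds and the exchange of up to five cards per player pushes the node count many orders of magnitude higher, toward the figure Koller and Pfeffer quote.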

Koller and Pfeffer's goal is to create an optimal poker player: one that does the best that can be done against a perfect opponent, and that does no worse even if its strategy is revealed to the opponent. However, an optimal strategy does not exploit the mistakes of human players when they become apparent. A maximal player, by contrast, will shift away from the optimal strategy to exploit the observed weaknesses of an opponent. In theory, a maximal player must accept the risk of being sub-optimal in order to exploit a sub-optimal opponent; in practice this risk is usually small and well rewarded. Since it does not seem feasible to compute an optimal player for real multi-player poker, a program that plays real-world poker in the near future will most likely not be a game-theoretic optimal player.
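The optimal/maximal distinction can be made concrete with a game far simpler than poker. In rock-paper-scissors the optimal (equilibrium) strategy is the uniform random mix, which guarantees an expected value of zero against any opponent but also earns only zero against a flawed one; a maximal player best-responds to the opponent's observed bias and does better. A minimal sketch (the biased opponent's probabilities are illustrative, not from the text):

```python
# Rock-paper-scissors payoff to "us": rows = our move, cols = opponent's move.
PAYOFF = {('R', 'R'): 0, ('R', 'P'): -1, ('R', 'S'): 1,
          ('P', 'R'): 1, ('P', 'P'): 0,  ('P', 'S'): -1,
          ('S', 'R'): -1, ('S', 'P'): 1, ('S', 'S'): 0}

def expected_value(our_mix, their_mix):
    """Expected payoff when both players randomise independently."""
    return sum(our_mix[a] * their_mix[b] * PAYOFF[(a, b)]
               for a in 'RPS' for b in 'RPS')

# A hypothetical flawed opponent who over-plays rock.
biased = {'R': 0.5, 'P': 0.25, 'S': 0.25}

optimal = {'R': 1/3, 'P': 1/3, 'S': 1/3}  # game-theoretic optimum: uniform mix
maximal = {'P': 1.0, 'R': 0.0, 'S': 0.0}  # best response to the bias: always paper

print(expected_value(optimal, biased))  # 0.0  -- never loses, but never exploits
print(expected_value(maximal, biased))  # 0.25 -- exploits the bias, yet is itself exploitable
```

The maximal player's 0.25 expectation comes at a cost: an opponent who notices the constant paper can switch to scissors, which is exactly the risk the paragraph above describes.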

Koller and Pfeffer have suggested that the Gala system may in the future be able to take advantage of less-than-perfect players by learning the types of mistakes a player is prone to make and incorporating them into the model. This approach can be used when there is long-term interaction with the same player. The authors point out that the ability of the Gala language to capture regularities in the game may be particularly useful in this context, since the high-level description of a game state can provide features for the learning algorithm. Such a learning algorithm can be seen as a potential opponent-modelling component for a program based on the Gala system.
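One simple form such a learning component could take (a hypothetical sketch; Koller and Pfeffer do not specify a method) is frequency-based opponent modelling: count the opponent's actions over many hands and estimate their action probabilities, with add-one smoothing so that unseen actions keep non-zero probability:

```python
from collections import Counter

class FrequencyOpponentModel:
    """Toy opponent model: estimate the probability of each opponent
    action from observed play, using add-one (Laplace) smoothing."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.counts = Counter()

    def observe(self, action):
        self.counts[action] += 1

    def predict(self):
        """Smoothed probability estimate for each action."""
        total = sum(self.counts.values()) + len(self.actions)
        return {a: (self.counts[a] + 1) / total for a in self.actions}

model = FrequencyOpponentModel(['fold', 'call', 'raise'])
for a in ['call', 'call', 'raise', 'call']:
    model.observe(a)
print(model.predict())  # call is estimated most likely (4/7), fold least (1/7)
```

In a Gala-style system the counts would be conditioned on high-level game-state features rather than kept globally, which is precisely where the authors suggest the Gala language's game descriptions would help.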


Jason R Carlton
2000-11-13