The developers explained how their AI beat no-limit hold'em poker professionals over 120,000 hands

Professional poker player Jason Les talks with Professor Tuomas Sandholm of Carnegie Mellon University during a heads-up match against the bot Libratus. Jason lost almost one million play-money dollars to the program, more than any other professional.

Developers of weak (narrow) AI systems have lately often measured their programs by pitting them against humans in games. Computers have already beaten humans at checkers, chess and Go. These are games of perfect information: at every moment, all players know the full state of the game, that is, the position and every move available to each player.

In contrast to such deterministic situations, in games of incomplete information part of the game state is hidden from the player, for example the opponent's cards. No-limit Texas hold'em is one of these games. Besides the opponent's hidden cards, the arbitrary size of each bet adds a further element of uncertainty. With this in mind, the number of possible game situations is estimated at 10^161.

Texas hold'em is perhaps the most popular game of incomplete information in the world. Billions of dollars are wagered online every day. Bots were already strictly forbidden, and poker room operators now have a new reason to monitor the processes running on players' computers, because Libratus reliably wins stacks in heads-up play even from the best professionals.

Libratus won its match against four poker professionals, held January 11-30, 2017, as part of the “Brains vs. AI” competition.

Chip stacks of Libratus and its four opponents over the 20 days of competition

The AI played 120,000 heads-up hands and finished ahead by 1,766,250 play-money dollars. The players themselves were impressed by the program's play: it skillfully changed its strategy every day, adapting to the players' actions.

Of course, the game was not for real money, so the players were to some extent more relaxed and less careful than they would have been with their own money at stake. They also had to spend many hours at the computer every day, which is physically exhausting. Nevertheless, such a consistent win by the program cannot fail to impress: more than 14 big blinds per hundred hands. According to the developers' estimates, winning such a sum over such a long distance rules out the influence of luck with a probability of 99.7%, so it is a statistically significant victory.
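The win rate quoted above can be checked with simple arithmetic. The sketch below uses the totals given in the article; the big blind of 100 chips is an assumption, since the article does not state the blind level.

```python
# Figures quoted in the article.
TOTAL_WINNINGS = 1_766_250  # play-money dollars won by Libratus
HANDS_PLAYED = 120_000

# Assumption: a big blind of 100 chips (not stated in the article).
BIG_BLIND = 100


def win_rate_bb_per_100(winnings, hands, big_blind):
    """Return the win rate in big blinds per 100 hands."""
    chips_per_hand = winnings / hands
    return chips_per_hand / big_blind * 100


rate = win_rate_bb_per_100(TOTAL_WINNINGS, HANDS_PLAYED, BIG_BLIND)
print(f"{rate:.2f} bb/100")
```

Under that assumption the result is a little under 15 big blinds per hundred hands, which agrees with the "more than 14" figure.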

The program's developers at Carnegie Mellon University have now published a scientific paper explaining the architecture and training principles of the AI that beat the poker professionals.

In short, to simplify the calculations, the program groups the 10^161 possible game situations by similar hands (for example, a king-high flush and a queen-high flush) and by similar bet sizes. Libratus consists of three modules. The first is a detailed, precomputed strategy for the early rounds (for example, the range of hands to raise from each position); for the later rounds this strategy is specified more loosely. The second module builds its strategy during play, depending largely on the cards dealt and the opponent's behavior, taking into account the opponent's ranges and statistics. The third module is a strategy aimed specifically at unpredictable opponents, that is, humans, and it is constantly modified during the match. When a player made a move the program did not expect, the program recorded it and incorporated it into its model, updating the model in light of the new data.
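The grouping idea can be illustrated with a minimal sketch. This is not the Libratus implementation: the bet sizes in `ABSTRACT_BETS`, the hand-strength score in the range 0 to 1, and both function names are hypothetical and chosen only to show how many distinct situations collapse into a few representative ones.

```python
# Hypothetical abstract bet sizes, as fractions of the pot (illustrative only).
ABSTRACT_BETS = [0.5, 1.0, 2.0, 4.0]


def nearest_abstract_bet(bet_fraction, sizes=ABSTRACT_BETS):
    """Map an arbitrary bet size to the closest size in the abstraction."""
    return min(sizes, key=lambda s: abs(s - bet_fraction))


def hand_bucket(strength, n_buckets=10):
    """Group hands of similar strength (a score in [0, 1]) into one bucket."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # The top score 1.0 is clamped into the last bucket.
    return min(int(strength * n_buckets), n_buckets - 1)


# Two similar bets are treated as the same action...
same_action = nearest_abstract_bet(0.9) == nearest_abstract_bet(1.1)
# ...and two similar hands land in the same bucket.
same_bucket = hand_bucket(0.93) == hand_bucket(0.97)
print(same_action, same_bucket)
```

A strategy then only needs to be computed per bucket and per abstract bet size, rather than for every individual hand and every possible bet amount.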

According to the developers, the ability to work with incomplete information gives the AI an advantage beyond games, because such situations are ubiquitous in real life. Nearly all of human life, and almost all social and economic relations, are “games” with incomplete information. Having the right tools is therefore extremely important if AI is to operate successfully in the real world. In practice, such programs could be used, for example, to develop effective strategies for security systems, economic models, political models and other systems with incomplete information.

The techniques used in Libratus are largely independent of the application domain, so they can be reused in programs built for other purposes.

The scientific paper was published on December 17 in the journal Science (doi:10.1126/science.aao1733).

