10 thoughts on “Poker Science: What can we Learn from Poker Artificial Intelligence?”

  1. U of Alberta had a really good run with other games, too. They had Chinook, the 8×8 checkers world champion program, and later made big strides with chess.

    About Polk et al. vs. Claudico, from Wikipedia: "… Each day featured two 750-hand matches over eight hours (plus breaks) against each of the humans, for a total of 20,000 hands per player over the course of 13 days (with one rest day in the middle).[1][2] For each 750-hand set, the same hands were dealt to one human taking on Claudico on the main casino floor and another battling the computer in an isolation room, with the hole cards reversed.[4] This was done to ensure that card luck was not a factor in the outcome. The 80,000-hand sample represented the largest-ever human–computer data set. Claudico was able to adjust to its opponent's strategy as the matches progressed, just as the humans could.[2] The match winner was determined by the overall chip count after 80,000 hands; although individual results were kept for the four pros, they were competing as a single team. If the final chip count had been too close for the difference to be statistically meaningful, the match would be declared a draw.[4]

    … The blinds were 50 and 100 chips for every hand, and both the human's and computer's chip stack were reset to 20,000 at the beginning of each hand. Halfway through the match, the human team was ahead 458,000 chips versus Claudico.[6] The humans went on to increase their lead, winning the match by 732,713 chips. Polk finished up 213,000, Li won 529,000, Kim beat Claudico by 70,000, and Les finished down 80,000. …"

    So in 80K hands the humans were up 36.6 buy-ins (BI). Is that a lot for HU? I don't play HU, so I'm asking. Edit: or, humans up about one BI for every 2,200 hands.
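
    Showing my work, in case my arithmetic is off (the numbers come straight from the Wikipedia excerpt above):

    ```python
    # Buy-in arithmetic from the quoted match figures.
    chips_won = 732_713   # humans' final margin over Claudico
    buy_in = 20_000       # stacks were reset to 20,000 chips every hand
    hands = 80_000        # total hands across all four pros

    buy_ins = chips_won / buy_in
    print(f"{buy_ins:.1f} BI")                      # ~36.6 buy-ins
    print(f"1 BI per {hands / buy_ins:.0f} hands")  # ~2184 hands
    ```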

  2. You say that now, but wait till the poker AI decides your cells would be more efficiently repurposed

  3. Hey all,
    Wanted to jump in and clarify the numbers here and some poorly chosen language on my part. Kudos to everyone who noticed that this is unclear.

    The human win vs Claudico really was an average of roughly $9/hand. (In the poker AI world, the preferred metric is "milli-big-blinds per hand," or "mbb," i.e. the number of big blinds won per thousand hands, which in this case was 91.)

    That's definitely a clear win: the typical threshold for a solid heads-up victory is considered 50 mbb, which would be $5/hand in this case. I should have been much clearer when I called the human victory "slim." $9/hand isn't "slim" on its own; the problem is determining whether that $9 figure is an accurate representation of the skill levels.
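
    To make that concrete, here is the mbb arithmetic from the match totals (chip figures from the Wikipedia excerpt in comment 1; the big blind was 100):

    ```python
    chips_won = 732_713   # humans' total margin over Claudico
    hands = 80_000
    big_blind = 100       # blinds were 50/100

    per_hand = chips_won / hands        # ~9.16/hand, i.e. the "$9/hand"
    mbb = per_hand / big_blind * 1000   # milli-big-blinds per hand
    print(f"{per_hand:.2f}/hand = {mbb:.1f} mbb")  # ~91.6 mbb, vs the 50 mbb threshold
    ```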

    So the victory was "slim" in the sense that, when you crunch the numbers, it is not necessarily statistically significant. In other words, there was so much variance between the players and over the course of the competition that it's not clear whether $9/hand represents the "real" human edge or just where things happened to stand when the competition ended. You might think 80k hands would be more than enough to establish a clear pattern, but when you work out the statistics, there is still so much variance that it's hard to separate the luck from the skill. This is part of the reason that the rematch was extended to 120,000 hands.
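
    Here's a rough sketch of why even 80k hands can leave the result in doubt: the standard error of a win rate shrinks only with the square root of the number of hands. The per-hand standard deviation below is an illustrative assumption on my part, not the actual match figure; heads-up no-limit variance is enormous.

    ```python
    import math

    hands = 80_000
    win_rate_mbb = 91   # the observed human edge
    stdev_bb = 10.0     # ASSUMED std dev in big blinds per hand (illustrative only)

    stderr_mbb = stdev_bb * 1000 / math.sqrt(hands)
    z = win_rate_mbb / stderr_mbb
    print(f"std error ≈ {stderr_mbb:.0f} mbb, z ≈ {z:.1f}")
    # With this assumed variance z ≈ 2.6, but a std dev of ~13 BB/hand
    # would already push z below 2, i.e. no longer clearly significant.
    ```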

    Thanks everyone for watching and for caring about the details like these!

  4. I like the video.

    Technology like a HUD can improve your game, and new technology should be able to improve it further. I like that.

    However, the HUGE discussion is about the websites.

  5. "Because it was probably learning the player's weaknesses". This is not what Libratus does. Libratus was build to calculate close to a perfect balanced game or in other words it tried to find the Nash equilibrium. However, finding the perfect Nash equilibrium in a no-limit poker game is impossible yet for today's computers. Too many calculations are required. Therefore it tries to find a rough Nash equilibrium by use of abstractions (e.g. 301 = 300 chips, or AdKc = AhKc) and by doing some other type of calculations on many samples of hands. So the built strategy still had some holes in it.
    It performed lots of calculations prior to the contest against the pros. Then, while playing the pros, the supercomputer searched each night for the holes in its game through which it had lost the most money, and those holes were filled in with new calculations overnight. Even on a supercomputer it could only fill in around 3 holes each night… So the pros found leaks in the AI's game, but the AI was able to find those same leaks and plug them by rebalancing its strategy in that area of the game tree. For example: on day 3 the pros found that when they 3-bet very big on certain turns, the AI folded either too often or not often enough… In those spots the pros won money, but by the next day that strategy no longer worked. Each day the pros tried something new, but the AI slowly grew better at staying balanced in all those situations. In the last days the pros could not find any leaks anymore, and the AI gained a strong edge on them.

    Hence, it does not model particular strategies in its opponent in order to exploit them. It's just balanced. The AI can play against any opponent it wants; the less balanced the opponent, the better it does against them. No HUD-style information is used. A HUD is perfect for gathering knowledge of exploitable play from opponents, which you can then use to run a counter-strategy against them.
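
    For the curious: the family of algorithms behind this kind of balanced play is counterfactual regret minimization (CFR). Below is a minimal regret-matching demo on rock-paper-scissors, where self-play converges to the balanced (1/3, 1/3, 1/3) equilibrium. This is only the core idea; Libratus ran far more sophisticated CFR variants over a huge abstracted game tree, and none of this code is from the actual system.

    ```python
    import random

    # PAYOFF[a][b]: payoff to the player choosing a against b
    # (0 = rock, 1 = paper, 2 = scissors).
    PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

    def get_strategy(regrets):
        """Regret matching: play actions in proportion to positive regret."""
        pos = [max(r, 0.0) for r in regrets]
        total = sum(pos)
        return [p / total for p in pos] if total > 0 else [1 / 3] * 3

    def train(iterations=100_000):
        regrets = [0.0] * 3
        opp_regrets = [0.0] * 3
        strategy_sum = [0.0] * 3
        for _ in range(iterations):
            s = get_strategy(regrets)
            o = get_strategy(opp_regrets)
            a = random.choices(range(3), weights=s)[0]
            b = random.choices(range(3), weights=o)[0]
            # Regret = payoff of each alternative action minus what we got.
            for alt in range(3):
                regrets[alt] += PAYOFF[alt][b] - PAYOFF[a][b]
                opp_regrets[alt] += PAYOFF[alt][a] - PAYOFF[b][a]
            for i in range(3):
                strategy_sum[i] += s[i]
        total = sum(strategy_sum)
        return [x / total for x in strategy_sum]

    print(train())  # average strategy converges toward [1/3, 1/3, 1/3]
    ```

    The overnight "hole filling" described above was essentially more of this kind of regret-driven recomputation, focused each night on the parts of the game tree the pros had attacked.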
