Transcript of How to Win Texas Hold’em Poker

Page 1:

How to Win Texas Hold’em Poker

Richard Mealing

Machine Learning and Optimisation Group

School of Computer Science

University of Manchester

1 / 44

Page 2:

How to Play Texas Hold’em Poker

1 Deal 2 private cards per player

2 1st (sequential) betting round

3 Deal 3 shared cards (“flop”)

4 2nd betting round

5 Deal 1 shared card (“turn”)

6 3rd betting round

7 Deal 1 shared card (“river”)

8 4th (final) betting round

If all but 1 player folds, the remaining player wins the pot (the total amount bet)

Otherwise, at the end of the game the remaining hands are compared (“showdown”) and the player with the best hand wins the pot

2 / 44

Page 3:

How to Play Texas Hold’em Poker

3 / 44

Page 4:

How to Play Texas Hold’em Poker

Ante = forced bet (everyone pays)

Blinds = forced bets (2 people pay big/small)

If players > 2 then (big blind player, small blind player, dealer)
If players = 2 (“heads-up”) then (big blind player, small blind player/dealer)

No-Limit Texas Hold’em lets you bet all your money in a round

Minimum bet = big blind
Maximum bet = all your money

Limit Texas Hold’em Poker has fixed betting limits

A $4/$8 game means in betting rounds 1 & 2 bets = $4 and in betting rounds 3 & 4 bets = $8
Big blind usually equals the “small” bet e.g. $4 and the small blind is usually 50% of the big blind e.g. $2
Total number of raises per betting round is usually capped at 4 or 5
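A minimal sketch (an assumed helper, not part of the slides) of that $4/$8 Limit structure: the “small” bet in rounds 1 and 2, the “big” bet in rounds 3 and 4, big blind = small bet, small blind = half the big blind.

```python
# Hypothetical helper summarising the fixed-limit structure described above.
def limit_structure(small_bet=4, big_bet=8, raise_cap=4):
    return {
        "small blind": small_bet / 2,                                   # e.g. $2
        "big blind": small_bet,                                         # e.g. $4
        "bet size per round": {1: small_bet, 2: small_bet, 3: big_bet, 4: big_bet},
        "raises per round capped at": raise_cap,
    }

print(limit_structure())
```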

4 / 44

Page 5:

1-Card Poker Trees

1 Game tree - both players’ private cards are known

5 / 44

Page 6:

1-Card Poker Trees

1 Public tree - both players’ private cards are hidden

6 / 44

Page 7:

1-Card Poker Trees

1 P1 information set tree - P2’s private card is hidden

7 / 44

Page 8:

1-Card Poker Trees

1 P2 information set tree - P1’s private card is hidden

8 / 44

Page 9:

1-Card Poker Trees

1 Game tree - both players’ private cards are known

2 Public tree - both players’ private cards are hidden

3 P1 information set tree - P2’s private card is hidden

4 P2 information set tree - P1’s private card is hidden

9 / 44

Page 10:

Heads-Up Limit Texas Hold’em Poker Tree Size

[Figure: betting tree for one round, with fold (F), call/check (C) and raise (R) branches at each decision node]

Cards Dealt

P1 dealt 2 private cards = C(52, 2) = 1326

P2 dealt 2 private cards = C(50, 2) = 1225

1st betting round = 29, 9 continuing

Flop dealt = C(48, 3) = 17296

2nd betting round = 29, 9 continuing

Turn dealt = 45

3rd betting round = 29, 9 continuing

River dealt = 44

4th betting round = 29

10 / 44

Page 11:

Heads-Up Limit Texas Hold’em Poker Tree Size

Player 1 Deal = 1

Player 2 Deal = 1326

1st Betting Round = 1326 * 1225 * 29

2nd Betting Round = 1326 * 1225 * 9 * 17296 * 29

3rd Betting Round = 1326 * 1225 * 9 * 17296 * 9 * 45 * 29

4th Betting Round = 1326 * 1225 * 9 * 17296 * 9 * 45 * 9 * 44 * 29

Total ≈ 1.179 × 10^18 (quintillion)
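The total can be reproduced with a few lines of arithmetic. The sketch below is not the slides’ own code; it multiplies the chance outcomes by the 29 betting nodes per round and the 9 continuing sequences quoted on the previous slide, and ignores the two negligible deal terms.

```python
from math import comb

prefix = comb(52, 2) * comb(50, 2)                 # private deals: 1326 * 1225
total = 0
for new_board_cards in [1, comb(48, 3), 45, 44]:   # pre-flop, flop, turn, river
    prefix *= new_board_cards                      # chance outcomes before this round
    total += prefix * 29                           # betting nodes added in this round
    prefix *= 9                                    # sequences continuing to the next round

print(f"{total:.3e}")                              # ~1.179e+18
```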

11 / 44

Page 12:

Abstraction

Lossless

Suit isomorphism: at the start (pre-flop) two hands are strategically the same if each of their cards’ ranks match and they are both “suited” or “off-suit” e.g. (A♠K♠, A♣K♣) or (T♣J♠, T♦J♥); the 169 equivalence classes reduce the possible pairs of starting hands (both players) from 1,624,350 (= 1326 × 1225) to 28,561 (= 169²), as counted in the sketch after this list

Lossy

Bucketing (binning) groups hands into equivalence classes e.g. based on their probability of winning at showdown against a random hand
Imperfect recall eliminates past information
Betting round reduction
Betting round elimination
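Returning to the lossless suit isomorphism above: the 169 pre-flop classes can be counted directly. A minimal sketch (an assumed representation, not the slides’ code) describes a starting hand by its two ranks plus a suited/off-suit flag, since pre-flop the actual suits do not matter strategically.

```python
RANKS = "23456789TJQKA"

classes = set()
for i, r1 in enumerate(RANKS):
    for r2 in RANKS[i:]:
        if r1 == r2:
            classes.add((r1, r2, "pair"))        # pairs cannot be suited
        else:
            classes.add((r1, r2, "suited"))
            classes.add((r1, r2, "off-suit"))

print(len(classes))    # 169 = 13 pairs + 78 suited + 78 off-suit
print(169 * 169)       # 28561 joint classes for both players' starting hands
```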

12 / 44

Page 13:

Abstraction

Heads-up Limit Texas Hold’em poker has around 10^18 states

Abstraction can reduce the game to e.g. 10^7 states

Nesterov’s excessive gap technique can find approximate Nash equilibria in a game with 10^10 states

Counterfactual regret minimization can find approximate Nash equilibria in a game with 10^12 states

13 / 44

Page 14:

Nash Equilibrium

Game theoretic solution

A set of strategies, 1 per player, such that no player can do better by changing their strategy if the others keep their strategies fixed

Nash proved that every game with a finite number of players, each with a finite number of pure strategies, has at least 1 (possibly mixed) Nash equilibrium

14 / 44

Page 15:

Annual Computer Poker Competition 2012

Heads-up Limit Texas Hold’em
Total Bankroll:
1 Slumbot (Eric Jackson, USA)
2 Little Rock (Rod Byrnes, Australia) and Zbot (Ilkka Rajala, Finland)
Bankroll Instant Run-off:
1 Slumbot (Eric Jackson, USA)
2 Hyperborean (University of Alberta, Canada)
3 Zbot (Ilkka Rajala, Finland)

Heads-up No-Limit Texas Hold’em
Total Bankroll:
1 Little Rock (Rod Byrnes, Australia)
2 Hyperborean (University of Alberta, Canada)
3 Tartanian5 (Carnegie Mellon University, USA)
Bankroll Instant Run-off:
1 Hyperborean (University of Alberta, Canada)
2 Tartanian5 (Carnegie Mellon University, USA)
3 Neo Poker Bot (Alexander Lee, Spain)

3-player Limit Texas Hold’em
Total Bankroll:
1 Hyperborean (University of Alberta, Canada)
2 Little Rock (Rod Byrnes, Australia)
3 Neo Poker Bot (Alexander Lee, Spain) and Sartre (University of Auckland, New Zealand)
Bankroll Instant Run-off:
1 Hyperborean (University of Alberta, Canada)
2 Little Rock (Rod Byrnes, Australia)
3 Neo Poker Bot (Alexander Lee, Spain) and Sartre (University of Auckland, New Zealand)

Source: http://www.computerpokercompetition.org/index.php/competitions/results/90-2012-results

15 / 44

Page 16:

Annual Computer Poker Competition

Total Bankroll = total money won against all agents

Bankroll Instant Run-off
1 Set S = all agents
2 Set N = agents in a game
3 Play every one of the C(|S|, |N|) possible matches between agents in S, storing each agent’s total bankroll
4 Remove the agent(s) with the lowest total bankroll from S
5 Repeat steps 3 and 4 until S only contains |N| agents
6 Play a match between the last |N| agents and rank them according to their total bankroll in this game
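A minimal sketch of the procedure above, assuming a hypothetical play_match(agents) helper that returns each agent’s total bankroll for one match between those agents (this is not the ACPC’s own code).

```python
from itertools import combinations

def instant_runoff(all_agents, agents_per_game, play_match):
    S = set(all_agents)
    while len(S) > agents_per_game:
        bankroll = {agent: 0.0 for agent in S}
        for match in combinations(sorted(S), agents_per_game):   # every C(|S|, |N|) match
            for agent, winnings in play_match(match).items():
                bankroll[agent] += winnings
        worst = min(bankroll.values())
        S -= {agent for agent, total in bankroll.items() if total == worst}  # drop the lowest
    final = play_match(tuple(sorted(S)))                          # rank the last |N| agents
    return sorted(final, key=final.get, reverse=True)
```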

16 / 44

Page 17:

Extensive-Form Game

A finite set of players N = {1, 2, ..., |N|} ∪ {c}

A finite set of action sequences or histories e.g. H = {(), ..., (A♥A♠), ...}

Z ⊆ H terminal histories e.g. Z = {..., (A♥A♠, 2♦7♣, r, F), ...}

A(h) = {a : (h, a) ∈ H} actions available after history h ∈ H\Z

P(h) ∈ N ∪ {c} player who takes an action after history h ∈ H\Z

ui : Z → R utility function for player i

17 / 44

Page 18:

Extensive-Form Game

fc maps every history h where P(h) = c to an independent probability distribution fc(a|h) for all a ∈ A(h)

Ii is an information partition (a set of nonempty subsets of X where each element of X is in exactly 1 subset) for player i

Ij ∈ Ii is player i’s jth information set containing indistinguishable histories e.g. Ij = {..., (A♥A♠, 2♦7♣), ..., (A♥A♠, 6♣3♠), ...}

Player i’s strategy σi is a function that assigns a distribution over A(Ij) for all Ij ∈ Ii where A(Ij) = A(h) for any h ∈ Ij

A strategy profile σ is a strategy for each player σ = {σ1, σ2, ..., σ|N|}
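To make the notation concrete, a strategy profile can be stored as nested dictionaries mapping player → information set → action → probability. This is a hypothetical representation, not something from the slides; the labels and probabilities are read off the 1-card poker example that appears later in the deck (e.g. σ1(I1, C) = 0.6, σ1(I1, R) = 0.4).

```python
# sigma[player][information_set][action] = probability of taking that action
sigma = {
    1: {"I1": {"C": 0.6, "R": 0.4},            # player 1 holding J, first to act
        "I7": {"F": 1.0, "C": 0.0}},
    2: {"I3": {"C": 0.8, "R": 0.2}},           # player 2 holding J after player 1 checks
}

def strategy(player, info_set, action):
    return sigma[player][info_set][action]

total = sum(strategy(1, "I1", a) for a in ("C", "R"))
assert abs(total - 1.0) < 1e-9                 # probabilities at an information set sum to 1
print(strategy(1, "I1", "C"))                  # 0.6
```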

18 / 44

Page 19:

Nash Equilibrium

Nash Equilibrium:
u1(σ) ≥ max_{σ′1 ∈ Σ1} u1(σ′1, σ2)
u2(σ) ≥ max_{σ′2 ∈ Σ2} u2(σ1, σ′2)

ε-Nash Equilibrium:
u1(σ) + ε ≥ max_{σ′1 ∈ Σ1} u1(σ′1, σ2)
u2(σ) + ε ≥ max_{σ′2 ∈ Σ2} u2(σ1, σ′2)

19 / 44

Page 20:

Extensive-Form Game

[Figure: game tree for the 1-card poker example. Chance deals J or K (probability 0.5 each) to each player; player 1 acts at information sets I1, I2, I7, I8 and player 2 at I3, I4, I5, I6; edges are labelled with the actions C, R, F and the probability of taking them; leaves show player 1’s payoff (−2 to 2)]

I1 = {I1, I2, I7, I8} and I2 = {I3, I4, I5, I6} (the players’ information partitions)
A((J, J)) = {C, R} and P((J, J)) = 1

20 / 44


Page 30:

Extensive-Form Game

[Figure: the same 1-card poker game tree as on the previous slide, annotated with the chance probabilities and player 1’s strategy probabilities given below]

fc(J|(J)) = 0.5 and fc(K|(J)) = 0.5
σ1(I1, C) = 0.6 and σ1(I1, R) = 0.4

30 / 44


Page 33:

Counterfactual Regret Minimization

Counterfactual regret minimization minimizes the maximum counterfactual regret (over all actions) at every information set

Minimizing counterfactual regrets minimizes overall regret

In a two-player zero-sum game at time T, if both players’ average overall regret is less than ε, then σ̄^T is a 2ε-Nash equilibrium

33 / 44

Page 34:

Counterfactual Regret Minimization

Counterfactual Value

vi(Ij | σ) = ∑_{n ∈ Ij} π^σ_{−i}(root, n) ui(n)

ui(n) = ∑_{z ∈ Z[n]} π^σ(n, z) ui(z)

vi(Ij | σ) is the counterfactual value to player i of information set Ij given strategy profile σ

π^σ_{−i}(root, n) is the probability of reaching node n from the root ignoring player i’s contributions according to strategy profile σ

π^σ(n, z) is the probability of reaching node z from node n according to strategy profile σ

ui(n) is the payoff to player i at node n if it is a leaf node or its expected payoff if it is a non-leaf node

Z[n] is the set of terminal nodes that can be reached from node n

34 / 44

Page 35:

Counterfactual Regret Minimization

[Figure: the example game tree again. The two nodes in player 1’s information set I8 are reached via the chance branches (0.5 each) and player 2’s raise (probability 0.2 or 0.9); at I8 player 1 plays F with probability 0.0 and C with probability 1.0]

v1(I8 | σ) = ∑_{n ∈ I8} π^σ_{−i}(root, n) u1(n)
           = 0.5 ∗ 0.5 ∗ 0.2 ∗ (0.0 ∗ −1 + 1.0 ∗ 2) + 0.5 ∗ 0.5 ∗ 0.9 ∗ (0.0 ∗ −1 + 1.0 ∗ 0)
           = 0.1
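The same value can be checked with two lines of arithmetic. The reach probabilities below exclude player 1’s own choices (two chance factors of 0.5 and player 2’s raise probability), matching the definition of π^σ_{−i}; the payoffs and probabilities are read off the slide.

```python
# Arithmetic check of v1(I8 | sigma) as computed on this slide.
reach_excluding_p1 = [0.5 * 0.5 * 0.2, 0.5 * 0.5 * 0.9]        # pi^sigma_{-i}(root, n) for the two nodes in I8
expected_payoff    = [0.0 * -1 + 1.0 * 2, 0.0 * -1 + 1.0 * 0]  # u1(n) under sigma at I8 (F 0.0, C 1.0)
print(sum(p * u for p, u in zip(reach_excluding_p1, expected_payoff)))   # 0.1
```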

35 / 44

Page 36:

Counterfactual Regret Minimization

Counterfactual Regret

r(Ij, a) = vi(Ij | σ_{Ij→a}) − vi(Ij | σ)

r(Ij, a) is the counterfactual regret of not playing action a at information set Ij

Positive regret means the player would have preferred to play action a rather than their strategy

Zero regret means the player was indifferent between their strategy and action a

Negative regret means the player preferred their strategy rather than playing action a

36 / 44

Page 37:

Counterfactual Regret Minimization

[Figure: the same tree with player 1’s strategy at I8 replaced by σ_{I8→F}, i.e. F with probability 1.0 and C with probability 0.0]

v1(I8 | σ_{I8→F}) = 0.5 ∗ 0.5 ∗ 0.2 ∗ (1.0 ∗ −1 + 0.0 ∗ 2) + 0.5 ∗ 0.5 ∗ 0.9 ∗ (1.0 ∗ −1 + 0.0 ∗ 0)
                  = −0.275

r1(I8, F) = v1(I8 | σ_{I8→F}) − v1(I8 | σ) = −0.275 − 0.1 = −0.375

37 / 44

Page 38:

Counterfactual Regret Minimization

Cumulative Counterfactual Regret

R^T(Ij, a) = ∑_{t=1}^{T} r^t(Ij, a)

R^T(Ij, a) is the cumulative counterfactual regret of not playing action a at information set Ij for T time steps

Positive cumulative regret means the player would have preferred to play action a rather than their strategy over those T steps

Zero cumulative regret means the player was indifferent between their strategy and action a over those T steps

Negative cumulative regret means the player preferred their strategy rather than playing action a over those T steps

38 / 44

Page 39:

Counterfactual Regret Minimization

Regret Matching

σ^{T+1}(Ij, a) = R^{T,+}(Ij, a) / ∑_{a′ ∈ A(Ij)} R^{T,+}(Ij, a′)   if the denominator is positive

σ^{T+1}(Ij, a) = 1 / |A(Ij)|   otherwise

R^{T,+}(Ij, a) = max(R^T(Ij, a), 0)
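A minimal sketch of this rule, assuming the cumulative regrets for one information set are stored in a list indexed by action: play in proportion to positive cumulative counterfactual regret, or uniformly if no action has positive regret.

```python
def regret_matching(cumulative_regrets):
    positive = [max(r, 0.0) for r in cumulative_regrets]   # R^{T,+}
    norm = sum(positive)
    if norm > 0:
        return [p / norm for p in positive]
    return [1.0 / len(cumulative_regrets)] * len(cumulative_regrets)

# e.g. regrets [2.0, -1.0, 1.0] over {F, C, R} give the strategy [2/3, 0, 1/3]
print(regret_matching([2.0, -1.0, 1.0]))
```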

39 / 44

Page 40:

Counterfactual Regret Minimization

1 Initialise the strategy profile σ e.g. for all i ∈ N, for all Ij ∈ Ii and for all a ∈ A(Ij) set σ(Ij, a) = 1 / |A(Ij)|

2 For each player i ∈ N, for all Ij ∈ Ii and for all a ∈ A(Ij) calculate r(Ij, a) and add it to R(Ij, a)

3 For each player i ∈ N, for all Ij ∈ Ii and for all a ∈ A(Ij) use regret matching to update σ(Ij, a)

4 Repeat from 2
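To make steps 1 to 4 concrete, below is a short, standard chance-sampled CFR trainer for Kuhn poker (three cards 1 < 2 < 3, actions pass/bet). This is a textbook illustration, not the slides’ J/K example and not a Hold’em abstraction, and every name in it is illustrative. Each iteration samples a deal, walks the game tree once, accumulates counterfactual regrets weighted by the opponent’s reach probability, and obtains the next strategy by regret matching; the printed average strategies approach a Nash equilibrium of Kuhn poker.

```python
import random
from collections import defaultdict

ACTIONS = ["p", "b"]   # pass/check/fold and bet/call

class Node:
    def __init__(self):
        self.regret_sum = [0.0, 0.0]     # cumulative counterfactual regret per action
        self.strategy_sum = [0.0, 0.0]   # weighted strategies, used for the average strategy

    def strategy(self, reach_weight):
        positive = [max(r, 0.0) for r in self.regret_sum]
        norm = sum(positive)
        strat = [p / norm if norm > 0 else 1.0 / len(ACTIONS) for p in positive]
        for a in range(len(ACTIONS)):
            self.strategy_sum[a] += reach_weight * strat[a]
        return strat

    def average_strategy(self):
        norm = sum(self.strategy_sum)
        return [s / norm if norm > 0 else 1.0 / len(ACTIONS) for s in self.strategy_sum]

nodes = defaultdict(Node)   # information set label -> Node

def cfr(cards, history, p0, p1):
    """Expected utility of `history` for the player indexed len(history) % 2."""
    player = len(history) % 2
    opponent = 1 - player
    if len(history) >= 2:                      # terminal histories
        if history[-1] == "p":
            if history == "pp":                # both checked: showdown for 1
                return 1 if cards[player] > cards[opponent] else -1
            return 1                           # opponent folded after a bet
        if history[-2:] == "bb":               # bet and call: showdown for 2
            return 2 if cards[player] > cards[opponent] else -2
    info_set = str(cards[player]) + history
    node = nodes[info_set]
    strat = node.strategy(p0 if player == 0 else p1)
    action_utils, node_util = [0.0, 0.0], 0.0
    for a, action in enumerate(ACTIONS):
        if player == 0:
            action_utils[a] = -cfr(cards, history + action, p0 * strat[a], p1)
        else:
            action_utils[a] = -cfr(cards, history + action, p0, p1 * strat[a])
        node_util += strat[a] * action_utils[a]
    for a in range(len(ACTIONS)):              # regret weighted by the opponent's reach probability
        node.regret_sum[a] += (p1 if player == 0 else p0) * (action_utils[a] - node_util)
    return node_util

for _ in range(50000):                         # step 4: repeat
    deal = random.sample([1, 2, 3], 2)         # sample the chance node
    cfr(deal, "", 1.0, 1.0)

for label in sorted(nodes):
    print(label, [round(p, 2) for p in nodes[label].average_strategy()])
```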

40 / 44

Page 41:

Counterfactual Regret Minimization

Cumulative counterfactual regret is bounded by

R^T_i(Ij) ≤ (max_z ui(z) − min_z ui(z)) √|A(Ij)| / √T

Total counterfactual regret is bounded by

R^T_i ≤ |Ii| (max_z ui(z) − min_z ui(z)) √(max_{h:P(h)=i} |A(h)|) / √T

41 / 44

Page 42:

Counterfactual Regret Minimization

(a) Number of game states, number of iterations, computation time, andexploitability of the resulting strategy for different sized abstractions

(b) Convergence rates for three different sized abstractions, x-axis showsiterations divided by the number of information sets in the abstraction

Source: 2008 - “Regret Minimization in Games with Incomplete Information” - Zinkevich et al

42 / 44

Page 43:

Summary

If you want to win (in expectation) at Texas Hold’em poker (against exploitable players) then...

1 Abstract the version of Texas Hold’em poker you are interested in so it has at most 10^12 game states

2 Run the counterfactual regret minimization algorithm on the abstraction for T iterations and obtain the average strategy profile σ̄^T_abs

3 Map the average strategy profile σ̄^T_abs for the abstracted game to one σ̄^T for the real game

4 Play your average strategy profile σ̄^T against your (exploitable) opponents

43 / 44

Page 44:

References

1 Annual Computer Poker Competition Website - http://www.computerpokercompetition.org/

2 2008 - “Regret Minimization in Games with Incomplete Information” - Zinkevich et al - http://martin.zinkevich.org/publications/regretpoker.pdf

3 2007 - “Robust strategies and counter-strategies: Building a champion level computer poker player” - Johanson - http://poker.cs.ualberta.ca/publications/johanson.msc.pdf

4 2013 - “Monte Carlo Sampling and Regret Minimization for Equilibrium Computation and Decision-Making in Large Extensive Form Games” - Lanctot - http://era.library.ualberta.ca/public/view/item/uuid:482ae86c-2045-4c12-b91c-3e7ce09bc9ae

44 / 44