[email protected] 05-11-2006 Center for the Study of Rationality Hebrew University of Jerusalem...

[email protected]

Center for the Study of Rationality
Hebrew University of Jerusalem

1/39

n-Player Stochastic Games with Additive Transitions

Frank Thuijsman

János Flesch & Koos Vrieze

Maastricht University

European Journal of Operational Research 179 (2007) 483–497



2/39

Outline

• Model

• Brief History of Stochastic Games

• Additive Transitions

• Examples



3/39

Finite Stochastic Game

States: 1, ..., s, ..., z

$a_s = (a_s^1, a_s^2, \ldots, a_s^n)$ joint action

$r_s(a_s) = (r_s^1(a_s), r_s^2(a_s), \ldots, r_s^n(a_s))$ rewards

$p_s(a_s) = (p_s(1 \mid a_s), p_s(2 \mid a_s), \ldots, p_s(z \mid a_s))$ transitions

• Infinite horizon

• Complete Information

• Perfect Recall

• Independent and Simultaneous Choices



4/39

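3-Player Stochastic Game

[Figure: state 1 is a 2 × 2 × 2 game in which player 1 chooses T or B, player 2 chooses L or R, and player 3 chooses N or F. Every entry of state 1 has reward (0, 0, 0). States 2, 3 and 4 are absorbing, with rewards (1, 3, 0), (0, 1, 3) and (3, 0, 1).]

Transitions from state 1, as probability vectors over the states (1, 2, 3, 4):

(1, 0, 0, 0)

(2/3, 1/3, 0, 0), (2/3, 0, 1/3, 0), (2/3, 0, 0, 1/3)

(1/3, 1/3, 0, 1/3), (1/3, 0, 1/3, 1/3), (1/3, 1/3, 1/3, 0)

(0, 1/3, 1/3, 1/3)

Transitions from states 2, 3 and 4: (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)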



5/39

Strategies

general strategy $\pi^i$ : N × S × H → $X^i$, (k, s, h) → $X_s^i$

Markov strategy $f^i$ : N × S → $X^i$, (k, s) → $X_s^i$

stationary strategy $x^i$ : S → $X^i$, (s) → $X_s^i$

opponents' strategy $\pi^{-i}$, $f^{-i}$ and $x^{-i}$

$X_s^i$: mixed actions



6/39

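Rewards

β-Discounted rewards (with 0 < β < 1)

$\gamma_{\beta s}^i(\pi) = E_{s\pi}\left((1-\beta) \sum_{k} \beta^{k-1} R_k^i\right)$

Limiting average rewards

$\gamma_s^i(\pi) = E_{s\pi}\left(\lim_{K \to \infty} K^{-1} \sum_{k=1}^{K} R_k^i\right)$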
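To make the two reward criteria concrete, here is a minimal sketch that evaluates them on one finite, deterministic reward stream. The function names, the example stream and the truncation at 61 stages are illustrative assumptions; the slide defines both criteria as expectations over infinite plays.

```python
def discounted_reward(rewards, beta):
    """Normalised discounted value (1 - beta) * sum_k beta^(k-1) * R_k of a finite stream."""
    # enumerate starts at 0, so stage k = 1 gets weight beta^0.
    return (1 - beta) * sum(beta ** k * r for k, r in enumerate(rewards))


def average_reward(rewards):
    """Average reward over the first K = len(rewards) stages."""
    return sum(rewards) / len(rewards)


# Hypothetical stream: reward 0 at stage 1, then reward 3 at every later stage.
stream = [0] + [3] * 60

print(discounted_reward(stream, beta=0.5))  # close to 1.5
print(average_reward(stream))               # 180/61, approaching 3 as K grows
```

The example shows how the two criteria can rank the same play differently: discounting puts half of the total weight on the first stage when the discount factor is 0.5, while the average over many stages is dominated by the long-run reward.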



7/39

MinMax Values

β-Discounted minmax

$v_{\beta s}^i = \inf_{\pi^{-i}} \sup_{\pi^i} \gamma_{\beta s}^i(\pi)$

Limiting average minmax

$v_s^i = \inf_{\pi^{-i}} \sup_{\pi^i} \gamma_s^i(\pi)$

Highest rewards player i can defend



8/39

Equilibrium

$\pi = (\pi^i)_{i \in N}$ is an ε-equilibrium if

$\gamma_s^i(\sigma^i, \pi^{-i}) \le \gamma_s^i(\pi) + \varepsilon$

for all $\sigma^i$, for all i and for all s.



9/39

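Question

[Figure: a 2-player stochastic game with states 1, 2, 3 and 4. The transition vectors shown are (0, 0, 1, 0), (0, 1, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0) and (0, 0.5, 0.5, 0); the rewards shown are 0, 0 and 3, 1; state 4 has reward 2, 1 and transition (0, 0, 0, 1).]

Any ε-equilibrium?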



10/39

Highlights from (Finite) Stochastic Games History

Shapley, 1953

0-sum, “discounted”

Everett, 1957

0-sum, recursive, undiscounted

Gillette, 1957

0-sum, big match problem



11/39


Fink, 1964 & Takahashi, 1964

n-player, discounted

Blackwell & Ferguson, 1968

0-sum, big match solution

Liggett & Lippman, 1969

0-sum, perfect inf., undiscounted



12/39


Kohlberg, 1974

0-sum, absorbing, undiscounted

Mertens & Neyman, 1981

0-sum, undiscounted

Sorin, 1986

2-player, Paris Match, undiscounted



13/39


Vrieze & Thuijsman, 1989

2-player, absorbing, undiscounted

Thuijsman & Raghavan, 1997

n-player, perfect inf., undiscounted

Flesch, Thuijsman, Vrieze, 1997

3-player, absorbing example, undiscounted



14/39


Solan, 1999

3-player, absorbing, undiscounted

Vieille, 2000

2-player, undiscounted

Solan & Vieille, 2001

n-player, quitting, undiscounted



15/39

Additive Transitions

$p_s(a_s) = \sum_{i=1}^{n} \lambda_s^i \, p_s^i(a_s^i)$

$p_s^i(a_s^i)$: transition probabilities controlled by player i in state s

$\lambda_s^i$: transition power of player i in state s

$0 \le \lambda_s^i \le 1$ and $\sum_i \lambda_s^i = 1$ for each s



16/39

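Example for 2-Player Additive Transitions

$p_s(a_s) = \sum_{i=1}^{n} \lambda_s^i \, p_s^i(a_s^i)$

State 1 of a game with three states:

$\lambda_1^1 = 0.3$, $\lambda_1^2 = 0.7$

$p_1^1(1) = (1, 0, 0)$, $p_1^1(2) = (0, 0, 1)$

$p_1^2(1) = (1, 0, 0)$, $p_1^2(2) = (0, 1, 0)$

Resulting transitions (rows: action of player 1, columns: action of player 2):

              action 1          action 2
action 1      (1, 0, 0)         (0.3, 0.7, 0)
action 2      (0.7, 0, 0.3)     (0, 0.7, 0.3)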
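The four joint transition vectors of this example can be recomputed from the two transition powers and the player-controlled transitions. The following is a minimal sketch; the function name `additive_transition` and the dictionary layout are illustrative choices, not notation from the slides, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction as F

# Transition powers of players 1 and 2 in state 1 (0.3 and 0.7 on the slide).
WEIGHTS = [F(3, 10), F(7, 10)]

# Player-controlled transitions in state 1: action -> distribution over states (1, 2, 3).
PLAYER_TRANSITIONS = [
    {1: (F(1), F(0), F(0)), 2: (F(0), F(0), F(1))},  # player 1
    {1: (F(1), F(0), F(0)), 2: (F(0), F(1), F(0))},  # player 2
]


def additive_transition(weights, player_transitions, joint_action):
    """Weighted sum of the players' own transition vectors for one joint action."""
    n_states = len(next(iter(player_transitions[0].values())))
    total = [F(0)] * n_states
    for weight, table, action in zip(weights, player_transitions, joint_action):
        for t, prob in enumerate(table[action]):
            total[t] += weight * prob
    return tuple(total)


for a1 in (1, 2):
    for a2 in (1, 2):
        vector = additive_transition(WEIGHTS, PLAYER_TRANSITIONS, (a1, a2))
        print((a1, a2), [str(x) for x in vector])
```

Each printed vector sums to 1 because the transition powers sum to 1 and every player-controlled vector is itself a probability distribution.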



17/39

Results

1. 0-equilibria for n-player AT games (threats!)

2. 0-opt. stationary strat. for 0-sum AT games

3. Stat. ε-equilibria for 2-player abs. AT games

4. Result 3 cannot be strengthened,

neither to 3-player abs. AT games,

nor to 2-player non-abs. AT games,

nor to give stat. 0-equilibria



18/39

The Essential Observation

Additive Transitions

induce

a Complete Ordering of the Actions

If $a_s^i$ is “better” than $b_s^i$ against some strategy,

then $a_s^i$ is “better” than $b_s^i$ against any strategy.



19/39

“Better”

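Consider strategies $a_s^i$ and $b_s^i$ for player i.

If for some strategy $a_s^{-i}$ we have

$\sum_{t \in S} p_s(t \mid a_s^i, a_s^{-i}) \, v_t^i \ge \sum_{t \in S} p_s(t \mid b_s^i, a_s^{-i}) \, v_t^i$

then for all strategies $b_s^{-i}$ we have

$\sum_{t \in S} p_s(t \mid a_s^i, b_s^{-i}) \, v_t^i \ge \sum_{t \in S} p_s(t \mid b_s^i, b_s^{-i}) \, v_t^i$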
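A short derivation suggests why the comparison cannot depend on the opponents. It is a sketch that uses only the additive form of the transitions, with $\lambda_s^j$ the transition power of player j and $p_s^j$ the transitions that player j controls; writing the opponents' part as a sum over $j \ne i$ is a notational assumption made here.

```latex
\begin{align*}
\sum_{t \in S} p_s(t \mid a_s^i, a_s^{-i})\, v_t^i
  &= \lambda_s^i \sum_{t \in S} p_s^i(t \mid a_s^i)\, v_t^i
   + \sum_{j \ne i} \lambda_s^j \sum_{t \in S} p_s^j(t \mid a_s^j)\, v_t^i, \\
\sum_{t \in S} p_s(t \mid a_s^i, a_s^{-i})\, v_t^i
 - \sum_{t \in S} p_s(t \mid b_s^i, a_s^{-i})\, v_t^i
  &= \lambda_s^i \sum_{t \in S}
     \bigl( p_s^i(t \mid a_s^i) - p_s^i(t \mid b_s^i) \bigr)\, v_t^i .
\end{align*}
```

The opponents' term is the same on both sides of the comparison and cancels, so the sign of the difference is determined by player i's own transitions alone.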



22/39

“Best”

The “best” actions for player i in state s

are those that maximize the expression

$\sum_{t \in S} p_s^i(t \mid a_s^i) \, v_t^i$
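As an illustration of this selection rule, the sketch below scores each action of one player by the expected value under that player's own transitions and keeps the maximisers. The function name `best_actions`, the two actions and the value vector are hypothetical and do not come from the slides.

```python
from fractions import Fraction as F


def best_actions(own_transitions, values):
    """Return the actions maximising sum_t p(t | a) * v_t, together with all scores."""
    scores = {
        action: sum(p * v for p, v in zip(probs, values))
        for action, probs in own_transitions.items()
    }
    top = max(scores.values())
    return sorted(a for a, score in scores.items() if score == top), scores


# Hypothetical data: two actions over three states, and an assumed value vector.
own_transitions = {1: (1, 0, 0), 2: (0, 0, 1)}
values = (F(0), F(1), F(3))

best, scores = best_actions(own_transitions, values)
print(best)  # [2]
```

Ties are kept on purpose: every action attaining the maximum counts as "best" in this sketch.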



23/39

The Restricted Game

Let G be the original AT game and

let G* be the restricted AT game,

where each player is restricted

to his “best” actions.



24/39

The Restricted Game

Now $v^{*i} \ge v^i$ for each player i.

In G*: $\sum_{t \in S} p_s(t \mid a_s^*) \, v_t^{*i} = v_s^{*i}$ for all $i$, $s$, $a_s^*$

In G: $\sum_{t \in S} p_s(t \mid b_s^i, a_s^{*-i}) \, v_t^i < v_s^i$ for all $i$, $s$, $a_s^{*-i}$, $b_s^i$



25/39

The Restricted Game

If $x^{*i}$ yields at least $v^{*i}$ in G*,

then $x^{*i}$ yields at least $v^{*i}$ in G as well.



26/39

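Ex. 1: 2-Player Absorbing AT Game

[Figure: a game with states 1, 2 and 3. State 1 is a 2 × 2 game in which every entry has reward 0, 0 and each player has transition power 0.5. States 2 and 3 are absorbing, with transitions (0, 1, 0) and (0, 0, 1); the absorbing rewards shown are 1, 3 and 3, 1.]

Transitions controlled by the individual players in state 1: (1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)

Joint transitions in state 1: (1, 0, 0), (0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5)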



27/39

[Figure: the game of Ex. 1.]

Action pairs in state 1: (T, L), (B, L), (B, R), (T, R)

Corresponding rewards: (0, 0), (-1, 3), (-2, 2), (-3, 1)

NO stationary 0-equilibrium




28/39

[Figure: the game of Ex. 1, annotated with the values 1, 1 and 0.]

stationary ε-equilibrium with ε > 0

equilibrium rewards ≈ ((1




29/39

[Figure: the game of Ex. 1.]

non-stationary ε-equilibrium

Player 1: B, T, T, T, ….

Player 2: R, R, R, R, ….

equilibrium rewards ((223, 11, 3




30/39

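Ex. 2: 2-Player Non-Absorbing AT Game

[Figure: a game with states 1, 2, 3 and 4. The transition vectors shown are (0, 0, 1, 0), (0, 1, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0) and (0, 0.5, 0.5, 0); the rewards shown are 0, 0 and 3, 1; state 4 has reward 2, 1 and transition (0, 0, 0, 1). The mixed actions are labelled p, 1 − p and q, 1 − q.]

q > 0 ⇒ p < ε

q = 0 ⇒ p > 1 − ε ⇒ q > 0

NO stationary ε-equilibrium with ε > 0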



31/39


Player 1: T, B, B, B, ….

Player 2: R, R, R, R, ….

[Figure: the game of Ex. 2.]


equilibrium rewards ((2, 1), (



32/39

alternative ε-equilibrium

Player 1: T, T, T, T, ….

Player 2: L, R, R, R, ….

[Figure: the game of Ex. 2.]


equilibrium rewards ((2, 1), (



33/39


[Figure: the 3-player stochastic game shown earlier. State 1 is a 2 × 2 × 2 game (player 1: T or B, player 2: L or R, player 3: N or F) with reward (0, 0, 0) in every entry; states 2, 3 and 4 are absorbing with rewards (1, 3, 0), (0, 1, 3) and (3, 0, 1).]

How to share 4 among three people

if only few solutions are allowed?



34/39

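[Figure: the 3-player game, with the entries of state 1 (actions T or B, L or R, N or F) labelled by rewards.]

Entries with transitions (2/3, 1/3, 0, 0), (2/3, 0, 1/3, 0), (2/3, 0, 0, 1/3): rewards (1, 3, 0), (0, 1, 3), (3, 0, 1)

Entries with transitions (1/3, 1/3, 1/3, 0), (1/3, 1/3, 0, 1/3), (1/3, 0, 1/3, 1/3): rewards (1/2, 2, 3/2), (2, 3/2, 1/2), (3/2, 1/2, 2)

Entry with transition (0, 1/3, 1/3, 1/3): reward (4/3, 4/3, 4/3)

Entry with transition (1, 0, 0, 0): reward (0, 0, 0)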




35/39

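[Figure: the reward table of the 3-player game, with each entry marked by the probability of staying in state 1: 1 for (0, 0, 0); 2/3 for (1, 3, 0), (0, 1, 3) and (3, 0, 1); 1/3 for (1/2, 2, 3/2), (2, 3/2, 1/2) and (3/2, 1/2, 2); the entry (4/3, 4/3, 4/3) carries no such mark.]

NO stationary ε-equilibrium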



36/39

[Figure: the reward table of the 3-player game, as on the previous slide.]

Player 1 on B: 1, ¾, 0, 0, 0, 0, 1, ¾, 0, 0, 0, 0, ….

Player 2 on R: 0, 0, 1, ¾, 0, 0, 0, 0, 1, ¾, 0, 0, ….

Player 3 on F: 0, 0, 0, 0, 1, ¾, 0, 0, 0, 0, 1, ¾, ….





37/39

[Figure: the reward table of the 3-player game, as on the previous slide.]

equilibrium rewards (1, 2, 1)


Player 1 on B: 1, ¾, 0, 0, 0, 0, 1, ¾, 0, 0, 0, 0, ….

Player 2 on R: 0, 0, 1, ¾, 0, 0, 0, 0, 1, ¾, 0, 0, ….

Player 3 on F: 0, 0, 0, 0, 1, ¾, 0, 0, 0, 0, 1, ¾, ….
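The stated rewards (1, 2, 1) can be checked with a short exact computation. The sketch below rests on assumptions that the slides do not spell out: each player has transition power 1/3, and the quitting actions B, R and F of players 1, 2 and 3 lead to the absorbing states with rewards (1, 3, 0), (0, 1, 3) and (3, 0, 1) respectively. All names are illustrative.

```python
from fractions import Fraction as F

WEIGHT = F(1, 3)                 # assumed transition power of each player
QUIT_PROBS = [F(1), F(3, 4)]     # a player's two consecutive quitting probabilities
# Assumed absorbing rewards reached when player 1 (B), player 2 (R), player 3 (F) quits.
ABSORBING_REWARDS = [(1, 3, 0), (0, 1, 3), (3, 0, 1)]


def block_survival(weight, quit_probs):
    """Probability that play is still in state 1 after one player's two-stage block."""
    stay = F(1)
    for q in quit_probs:
        stay *= 1 - weight * q
    return stay


def cyclic_rewards():
    """Expected absorbing rewards under the six-stage cyclic strategy, from stage 1."""
    stay = block_survival(WEIGHT, QUIT_PROBS)
    cycle_stay = stay ** len(ABSORBING_REWARDS)  # survival of a full six-stage cycle
    totals = [F(0)] * 3
    reach = F(1)                                 # chance of reaching this block in a cycle
    for reward in ABSORBING_REWARDS:
        # Sum the geometric series over all cycles in closed form.
        absorb = reach * (1 - stay) / (1 - cycle_stay)
        totals = [t + absorb * r for t, r in zip(totals, reward)]
        reach *= stay
    return tuple(totals)


print([str(x) for x in cyclic_rewards()])  # ['1', '2', '1']
```

Under these assumptions each two-stage block ends the game with probability 1/2, so absorption happens through players 1, 2 and 3 with probabilities 4/7, 2/7 and 1/7, which reproduces (1, 2, 1).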



38/39

Results

1. 0-equilibria for n-player AT games (threats!)

2. 0-opt. stationary strat. for 0-sum AT games

3. Stat. ε-equilibria for 2-player abs. AT games

4. Result 3 cannot be strengthened,

neither to 3-player abs. AT games,

nor to 2-player non-abs. AT games,

nor to give stat. 0-equilibria



39/39

[email protected]

?
