Concurrent Reachability Games


CTW 2009 1

Concurrent Reachability Games

Peter Bro Miltersen, Aarhus University


My apologies…

• For not getting slides ready in time for inclusion in booklet!

• Slides available at http://www.daimi.au.dk/~bromille


Concurrent reachability games

• Class of two-player zero-sum games generalizing simple stochastic games (Uri’s talk yesterday).

• Studied mainly by the formal methods (“Eurotheory”) community (but sometimes at such venues as FOCS and SODA).

• Very interesting and challenging algorithmic problems!

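Simple stochastic games (SSGs), reachability version [Condon (1992)]

• Two players: MAX and min.

• Objective: MAX maximizes and min minimizes the probability of getting to the MAX-sink.

• Vertex types: MAX, min and RAND (R). A RAND vertex follows each of its two outgoing edges with probability 1/2. There are two sinks, the MAX-sink and the min-sink.

[Figure: example SSG with MAX, min and RAND vertices, a MAX-sink and a min-sink; labelled ZP’96.]

(Slide stolen from Uri…)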

Simple stochastic games (SSGs): strategies

• A general strategy may be randomized and history dependent.

• A positional strategy is deterministic and history independent.

• Positional strategy for MAX: choice of an outgoing edge from each MAX vertex.

(Another slide stolen from Uri…)

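Simple stochastic games (SSGs): values

• Both players have positional optimal strategies.

• Every vertex i in the game has a value $v_i$, where σ ranges over strategies of MAX and τ over strategies of min:

$v_i = \max_{\sigma\ \text{positional}} \min_{\tau\ \text{general}} \Pr\nolimits_i^{\sigma,\tau}[\text{reach MAX-sink}] = \min_{\tau\ \text{positional}} \max_{\sigma\ \text{general}} \Pr\nolimits_i^{\sigma,\tau}[\text{reach MAX-sink}]$

• There are strategies that are optimal for every starting position.

(Last slide stolen from Uri, I promise!)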


Concurrent Reachability Games


(Simple) concurrent reachability game

• Arena:
  – Finite directed graph.
  – One Max sink (“goal”) node.
  – Each non-sink node is assigned a 2x2 matrix of outgoing arcs.

• Play:
  – A pebble moves from node to node as in a simple stochastic game.
  – In each step, Max chooses a row and Min simultaneously chooses a column of the matrix.
  – The pebble moves along the appropriate arc.
  – If Max reaches the goal node, he wins.
  – If this never happens, Min wins.
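The arena and the rules of play can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the node name `"goal"` and the function `play` are hypothetical choices made here, not notation from the talk, and both players are restricted to stationary strategies given as the probability of picking row 0 (Max) or column 0 (Min) at each node.

```python
import random

GOAL = "goal"  # hypothetical name for the single Max sink


def play(arena, start, max_row0_prob, min_col0_prob, max_steps, rng):
    """Simulate one play of a (simple) concurrent reachability game.

    arena[node] is a 2x2 matrix (rows: Max, columns: Min) of successor nodes.
    max_row0_prob[node] / min_col0_prob[node] is the probability with which
    Max picks row 0 / Min picks column 0 at that node.
    Returns True if the pebble reaches GOAL within max_steps moves.
    """
    node = start
    for _ in range(max_steps):
        if node == GOAL:
            return True
        if node not in arena:  # any other sink: the pebble is stuck, Min wins
            return False
        # Both choices are drawn independently, i.e. "simultaneously".
        row = 0 if rng.random() < max_row0_prob[node] else 1
        col = 0 if rng.random() < min_col0_prob[node] else 1
        node = arena[node][row][col]
    return node == GOAL


# Example: a node where every arc leads to the goal, and one where every arc loops.
rng = random.Random(0)
always_goal = {"a": [[GOAL, GOAL], [GOAL, GOAL]]}
always_loop = {"a": [["a", "a"], ["a", "a"]]}
print(play(always_goal, "a", {"a": 0.5}, {"a": 0.5}, 10, rng))
print(play(always_loop, "a", {"a": 0.5}, {"a": 0.5}, 10, rng))
```

A finite `max_steps` can only under-approximate Max's chances, since Min wins exactly the plays that never reach the goal.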

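Simulation

Each vertex type of an SSG is simulated by a matrix node of a CRG:

• MAX vertex. [Figure: matrix node simulating a MAX vertex.]

• min vertex. [Figure: matrix node simulating a min vertex.]

• RAND vertex (R, probabilities 1/2 and 1/2). [Figure: matrix node simulating a coin toss.] It is somewhat more subtle that this one works!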

“Proof” of correctness

• We want values in the CRG to be the same as in the SSG.

• In particular, the value of the node simulating a coin toss should be the average of the values of the two nodes it points to.

• If these two values are the same, this is “clearly” the case.

• If they have different values v1, v2, the simulated coin toss node is a game of Matching Pennies with payoffs v1, v2. This game has value (v1+v2)/2.
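The value (v1+v2)/2 of the Matching Pennies gadget can be checked directly. The derivation below is a sketch; the mixing probability p is a symbol introduced here only for the calculation.

```latex
% Matching Pennies gadget: Max picks a row, Min picks a column.
\[
  M = \begin{pmatrix} v_1 & v_2 \\ v_2 & v_1 \end{pmatrix}
\]
% If Max plays the first row with probability p, Min answers with the better column:
\[
  \min\bigl\{\, p\,v_1 + (1-p)\,v_2,\;\; p\,v_2 + (1-p)\,v_1 \,\bigr\}
  \;\le\; \tfrac12\bigl[(p\,v_1 + (1-p)\,v_2) + (p\,v_2 + (1-p)\,v_1)\bigr]
  \;=\; \frac{v_1 + v_2}{2},
\]
% with equality at p = 1/2, so Max can guarantee exactly (v_1+v_2)/2.
% Symmetrically, Min mixing the two columns with probability 1/2 each
% holds Max to (v_1+v_2)/2 against either row. Hence
\[
  \operatorname{val}(M) = \frac{v_1 + v_2}{2}.
\]
```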


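Concurrent reachability games (CRGs): values

• For every ε > 0, both players have stationary ε-optimal strategies.

• Every vertex i in the game has a value $v_i$, where σ ranges over strategies of MAX and τ over strategies of min:

$v_i = \sup_{\sigma\ \text{stationary}} \inf_{\tau\ \text{general}} \Pr\nolimits_i^{\sigma,\tau}[\text{reach MAX-sink}] = \inf_{\tau\ \text{stationary}} \sup_{\sigma\ \text{general}} \Pr\nolimits_i^{\sigma,\tau}[\text{reach MAX-sink}]$

• There are ε-optimal strategies that work for every starting position.

• Stationary: as positional, except that we allow randomization.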

Why randomized strategies?

• 0-1 matrix games can be immediately simulated.

[Figure: a matrix node whose arcs lead to the MAX-sink or the min-sink.]

Why sup/inf instead of max/min?

[Figure: example CRG with a MAX-sink and a min-sink.]

• “Conditionally repeated matching pennies”:
  – Min hides a penny.
  – Max tries to guess if it is heads up or tails up.
  – If Max guesses correctly, he gets the penny.
  – If Max incorrectly guesses tails, he loses (goes into the min-sink/trap).
  – If Max incorrectly guesses heads, the game repeats.

• What is the value of this game? 1.

Almost optimal strategy for Max

• Guess “heads” with probability 1-ε and “tails” with probability ε (every time).

• Guaranteed to win with probability 1-ε.

• But no strategy of Max wins with probability 1.
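These claims can be checked numerically for conditionally repeated matching pennies. A minimal sketch, assuming Min also plays a stationary strategy (hiding heads with probability q every round); against a fixed stationary strategy of Max this restriction costs Min nothing, since Min then faces a Markov decision process. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def max_win_probability(eps, q):
    """Probability that Max eventually wins conditionally repeated matching pennies.

    Max guesses tails with probability eps (heads with 1 - eps) in every round.
    Min hides heads with probability q (tails with 1 - q) in every round.
    """
    win = (1 - eps) * q + eps * (1 - q)  # Max guesses correctly this round
    repeat = (1 - eps) * (1 - q)         # Max wrongly guesses heads: play again
    if repeat == 1:                      # the game repeats forever, so Min wins
        return Fraction(0)
    # Otherwise the remaining probability, eps * q, is an immediate loss.
    return win / (1 - repeat)


eps = Fraction(1, 100)
grid = [Fraction(k, 10) for k in range(11)]
print(min(max_win_probability(eps, q) for q in grid))  # Min's best reply on the grid
print(max_win_probability(Fraction(0), Fraction(0)))   # always "heads" vs. always hiding tails
```

On this grid Min's best reply is to always hide heads, which leaves Max with exactly 1-ε. With ε = 0 Min can always hide tails and the game never ends, which is why the supremum 1 is not attained.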

Values and near-optimal strategies

• Each position in a concurrent reachability game has a value.

• For any ε>0, each player has a stationary strategy guaranteeing the value within ε (an ε-optimal strategy).

• Shown in Everett, “Recursive games”, 1957.

Algorithmic problems

• Qualitatively solving a CRG.
  – Determining which nodes have value 1.

• Quantitatively solving a CRG.
  – Approximately computing the values of the nodes.

• Strategically solving a CRG.
  – Computing an ε-optimal stationary strategy for a given ε.

Qualitatively solving CRGs

• De Alfaro, Henzinger, Kupferman, FOCS 1998.
  – Beautiful algorithm!
  – Formal methods community type algorithm!
  – Fixed point computation inside a fixed point computation inside a fixed point computation…
  – Runs in time $O(n^2)$.

• Open (I think): Can this time bound be improved? (For SSGs the corresponding time is linear.)


Quantitatively solving CRGs

• We want to approximate the values of the positions.

• Why not compute them exactly?

The value of a CRG may be irrational!

[Figure: example game taken from Ferguson, Game Theory.]

• Positive payoffs different from 1 can be simulated with scaling and coin toss gadgets.

• Negative payoffs are harder to simulate, but in this game we can do it by adding a constant to all payoffs.

Quantitatively solving CRGs

• We want to approximate the values of the positions.

• Why not compute them exactly?

• Maybe we want to look at the decision problem consisting of comparing the value to a given rational?

SUM-OF-SQRT hardness

• SUM-OF-SQRT: Given an expression E which is a weighted (by integers) sum of square roots (of integers), does E evaluate to a positive number?

• Not known to be in P or NP or even the polynomial hierarchy (open at least since Garey and Johnson).

• Etessami and Yannakakis, 2005: Comparing the value of a CRG to a rational number is hard for SUM-OF-SQRT.
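To see why SUM-OF-SQRT is delicate, here is a naive numerical attempt using Python's `decimal` module. It is a sketch, not a decision procedure: the function name, the precision schedule and the safety margin are assumptions made for illustration. It certifies a sign only once the computed value clearly exceeds the rounding error, and gives up at a fixed precision. Nothing in the slides says how many digits are needed in general, and that is where the difficulty lies.

```python
from decimal import Decimal, localcontext


def sign_of_sum_of_sqrt(terms, max_digits=400):
    """Try to decide the sign of sum(a * sqrt(b)) for integer pairs (a, b), b >= 0.

    Returns +1 or -1 once the sign is numerically certain at some precision,
    or None if it is still undecided at max_digits significant digits.
    """
    digits = 25
    while digits <= max_digits:
        with localcontext() as ctx:
            ctx.prec = digits + 10  # ten guard digits beyond the target precision
            total = sum((Decimal(a) * Decimal(b).sqrt() for a, b in terms), Decimal(0))
            scale = sum((abs(Decimal(a)) * Decimal(b).sqrt() for a, b in terms), Decimal(0))
            # Generous bound on accumulated rounding error (an assumption of this sketch).
            margin = scale * Decimal(10) ** (-digits)
            if abs(total) > margin:
                return 1 if total > 0 else -1
        digits *= 2
    return None


# sqrt(10) + sqrt(11) - sqrt(5) - sqrt(18) is positive, but only by about 0.0002.
print(sign_of_sum_of_sqrt([(1, 10), (1, 11), (-1, 5), (-1, 18)]))
# sqrt(8) - 2*sqrt(2) is exactly zero, so no precision ever certifies a sign.
print(sign_of_sum_of_sqrt([(1, 8), (-2, 2)]))
```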

Sketch of Proof

• We already saw how to make games whose values are the solution to certain quadratic equations, i.e., square roots + rationals.

• Once we have a bunch of such games, we can easily make a game whose value is the average by a “coin toss gadget”.

Quantitatively solving CRGs

• We want to approximate the values of the positions.

• Why not compute them exactly?

• Maybe we want to compare the value to a given rational?

• Given ε, we want to compute an approximation within ε.

Value iteration

• Assign all nodes “value approximation” 0 (the goal node counts as 1).

• Replace pointers with value approximations. Each node is now a matrix game.

• Solve and replace approximations.

• Theorem: Value approximations converge to values (from below).

• Proof sketch: The value approximations are the exact values of a time limited version of the game.

• How long does it take to get within 0.01 of the actual values?

• Even for SSGs this takes exponential time (Condon’93).

• For CRGs, an open problem until recently (see later).
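The steps above can be sketched in Python for games with 2x2 matrices. This is a minimal illustration, not the implementation from any of the cited papers: the names `GOAL`, `TRAP`, `matrix_game_value` and `value_iteration` are hypothetical, matrix games are solved with the standard closed form for 2x2 zero-sum games, and exact rationals are used to avoid rounding issues. The example game is conditionally repeated matching pennies from the earlier slides.

```python
from fractions import Fraction

GOAL, TRAP = "goal", "trap"  # hypothetical names for the MAX-sink and the min-sink


def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game in which the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # what Max can guarantee with a pure row
    upper = min(max(a, c), max(b, d))  # what Min can guarantee with a pure column
    if lower == upper:                 # saddle point in pure strategies
        return lower
    # No saddle point: both players mix, and the denominator is non-zero.
    return (a * d - b * c) / (a + d - b - c)


def value_iteration(game, steps):
    """game maps each non-sink node to a 2x2 matrix of successor node names."""
    approx = {node: Fraction(0) for node in game}
    approx[GOAL], approx[TRAP] = Fraction(1), Fraction(0)
    for _ in range(steps):
        new = dict(approx)
        for node, arcs in game.items():
            # Replace pointers by the current approximations, then solve the matrix game.
            payoffs = [[approx[succ] for succ in row] for row in arcs]
            new[node] = matrix_game_value(payoffs)
        approx = new
    return approx


# Rows: Max guesses heads / tails. Columns: Min hides heads / tails.
pennies = {"s": [[GOAL, "s"], [TRAP, GOAL]]}
for k in (1, 2, 3, 9, 99):
    print(k, value_iteration(pennies, k)["s"])
```

On this one-node game the k-th approximation works out to k/(k+1): it approaches the value 1 from below, and about 100 iterations are needed to get within 0.01.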

Another algorithm for approximating values

• The property of being a number larger or smaller than the value of a CRG can be expressed by a polynomial length formula in the existential first order theory of the reals.
  – “There exists a stationary strategy such that…”

• As a corollary to Renegar’89, approximating the value is in PSPACE.

• This is the best known “complexity class” upper bound!

• … also the best known concrete “big-O” complexity bound (using Basu et al. instead of Renegar).

Why no NP ∩ coNP upper bound?

• Guess a strategy and verify that it works?

• Chatterjee, Majumdar, Jurdzinski, On Nash equilibria in stochastic games, CSL’04, claimed such a result.

• In 2007, Kousha Etessami found a technical issue in the proof and the authors retracted the claim.

Computing values vs. finding strategies

• It is not obvious that computing the values gives any information about the strategies.

• In contrast, for SSGs, optimal strategies can be computed from values in linear time (Andersson and M., ISAAC’09).

[Figure: example CRG with a MAX-sink.]

Algorithms strategically solving concurrent reachability games

• Chatterjee, de Alfaro, Henzinger. Strategy improvement for concurrent reachability games. QEST’06.

• Chatterjee, de Alfaro, Henzinger. Termination criteria for solving concurrent safety and reachability games. SODA’09.

• Policy improvement! No time bounds given…

“Hardness” of solving CRGs

Theorem [Hansen, Koucky and M., LICS’09]:

– Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space (so no NP ∩ coNP algorithm comes from guessing strategies).

– Value iteration requires worst case doubly exponential time to come within non-trivial distance of actual values (in contrast, value iteration on SSGs converges in only exponential time).

Dante in Purgatory

[Figure: Purgatory drawn as seven terraces numbered 1 to 7; the animation moves Dante between terraces.]

• Purgatory has 7 terraces. Dante enters Purgatory at terrace 1.

• While in Purgatory, once a second, Dante must play Matching Pennies with Lucifer.

• If Dante wins, he proceeds to the next terrace.

• If Dante wins Matching Pennies at terrace 7, he wins the game of Purgatory.

• If Dante loses Matching Pennies guessing Heads, he goes back to terrace 1.

• If Dante loses Matching Pennies guessing Tails, he loses the game of Purgatory!

Dante in Purgatory

• Is there a strategy for Dante so that he is guaranteed to win the game of Purgatory with probability at least 90%?
  – Yes.

• How long can Lucifer confine Dante to Purgatory if Dante plays by such a strategy?
  – $10^{55}$ years.

Dante in Purgatory

A bit surprising: when Dante wins, he has guessed correctly which hand seven times in a row!

Apply algorithm of de Alfaro, Henzinger and Kupferman

Purgatory is a game of doubly exponential patience.

• The patience of a mixed strategy is 1/p, where p is the smallest non-zero probability used by the strategy (Everett, 1957).

• To win with probability 1-ε, Dante must choose “Heads” at terrace i with probability greater than (approximately) $1-\varepsilon^{2^{7-i}}$.

• On the other hand, choosing “Heads” with probability 1 is no good!

• To win with probability 9/10, he must choose “Heads” at terrace 1 with probability greater than $1-(1/10)^{64}$ = 0.9999999999999999999999999999999999999999999999999999999999999999. But then Lucifer can respond by always choosing “Tails” at terrace 1.
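What a given stationary strategy of Dante guarantees can be computed exactly for small instances. The sketch below rests on stated assumptions: the function names are hypothetical, Lucifer is restricted to pure stationary replies (enough against a fixed stationary strategy of Dante, since Lucifer then faces a Markov decision process), and the tails probabilities in the example are illustrative, not the strategy analysed in the paper.

```python
from fractions import Fraction
from itertools import product


def win_probability(tails_prob, lucifer):
    """Dante's probability of winning Purgatory, starting at terrace 1.

    tails_prob[i]: probability that Dante guesses Tails at terrace i + 1.
    lucifer[i]: "H" or "T", the side Lucifer always hides at terrace i + 1.
    """
    # Write the winning probability at terrace i as x_i = a + b * x_1,
    # starting above the top terrace (x = 1) and working downwards.
    a, b = Fraction(1), Fraction(0)
    for p, hidden in zip(reversed(tails_prob), reversed(lucifer)):
        if hidden == "H":
            # Guessing Heads (prob 1 - p) advances; guessing Tails loses the game.
            a, b = (1 - p) * a, (1 - p) * b
        else:
            # Guessing Tails (prob p) advances; guessing Heads sends Dante to terrace 1.
            a, b = p * a, p * b + (1 - p)
    if b == 1:  # Dante is sent back to terrace 1 forever
        return Fraction(0)
    return a / (1 - b)  # solve x_1 = a + b * x_1


def guaranteed_win(tails_prob):
    """Minimum of win_probability over Lucifer's 2^n pure stationary replies."""
    n = len(tails_prob)
    return min(win_probability(tails_prob, reply) for reply in product("HT", repeat=n))


eps = Fraction(1, 10)
print(guaranteed_win([eps]))                        # one terrace
print(guaranteed_win([eps ** 2, eps]))              # two terraces, illustrative probabilities
print(guaranteed_win([Fraction(0), Fraction(0)]))   # "Heads" with probability 1 everywhere
```

The last call returns 0: if Dante never guesses Tails, Lucifer can hide Tails and send him back to terrace 1 forever, which matches the remark that choosing “Heads” with probability 1 is no good.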

“Hardness” of solving CRGs

Theorem [Hansen, Koucky and M.]:

– Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space.

• Proof: Storing 0.9999999999999999999999999999999999999999999999999999999999999999 takes up a lot of space!

Time of play and value iteration

• To win Purgatory with probability 1-ε, almost all probability mass has to be assigned to strategies leading to plays of length at least $(1/\varepsilon)^{2^{n-1}}$.

• On the other hand, $(1/\varepsilon)^{2^{116n}}$ is the worst possible expected time of play for any game with n nodes.

• Corollary: To solve Purgatory quantitatively using value iteration, $(1/\varepsilon)^{2^{n-1}}$ iterations are needed to get anywhere near the correct values. But $(1/\varepsilon)^{2^{116n}}$ iterations are enough to get ε-close for any n-node game.

• Upper bounds are shown (again) by appealing to the first order theory of the reals (semi-algebraic geometry), in particular Basu et al.

Patience of Purgatory with n terraces and ε < ½

• Upper bound: $(1/\varepsilon)^{2^{n-1}}$

• Lower bound: $((1-\varepsilon)/\varepsilon^{2})^{2^{n-2}}$
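To get a feeling for the size of these bounds, they can be evaluated exactly for the seven-terrace game with ε = 1/10. A small sketch; the function name is hypothetical, and reading the lower bound as a number of one-second rounds is an assumption made here for comparison with the “years” figure quoted earlier, not a statement from the slides.

```python
from fractions import Fraction


def patience_bounds(n, eps):
    """Lower and upper bound on the patience of Purgatory with n terraces."""
    upper = (1 / eps) ** (2 ** (n - 1))
    lower = ((1 - eps) / eps ** 2) ** (2 ** (n - 2))
    return lower, upper


lower, upper = patience_bounds(7, Fraction(1, 10))
print(upper == 10 ** 64)   # (1/eps)^(2^6)
print(lower == 90 ** 32)   # ((9/10) / (1/100))^(2^5)

# Assumption: one round of Matching Pennies per second, 365-day years.
seconds_per_year = 365 * 24 * 3600
print(len(str(lower // seconds_per_year)) - 1)  # order of magnitude in years
```

Under that reading the lower bound corresponds to roughly 10^55 years.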

Proof of lower bound

[Figure: Purgatory terraces annotated with δ and > δ²; WLOG, take the first place from above where this happens…]

Open problems

• What is the exact patience of Purgatory? Probably not a closed expression.

• Is Purgatory extremal with respect to patience among n-node CRGs?

• If yes, this gives a better upper bound on the number of iterations of value iteration for CRGs, replacing 116 with 1!

Compare

Condon’s example. Extremal with respect to, e.g., expected absorption time.

Open problem

• The fact that the values can be approximated in PSPACE strongly suggests that PSPACE should be enough for “understanding” CRGs.

• Is there a “natural” representation of probabilities so that
  – ε-optimal strategies of CRGs can be represented succinctly, and
  – ε-optimal strategies of CRGs can be computed using polynomial space?

• De Alfaro, Henzinger, Kupferman, FOCS’98: Yes, for the restricted case of CRGs where the values of all positions are 0 or 1.

• CRGs seem much harder to analyze than SSGs. Are there any formal arguments for this (beyond SUM-OF-SQRT hardness)?


Thank you!