Uri Zwick Tel Aviv University
![Page 1: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/1.jpg)
Uri Zwick, Tel Aviv University
Simple Stochastic Games, Mean Payoff Games, Parity Games
CSR 2008, Moscow, Russia
![Page 2: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/2.jpg)
Mean Payoff Games
Simple Stochastic Games
Parity Games
Randomized subexponential algorithm for SSG
Deterministic subexponential algorithm for PG
![Page 3: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/3.jpg)
Mean Payoff Games
Simple Stochastic Games
Parity Games
![Page 4: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/4.jpg)
A simple Simple Stochastic Game (figure: example game graph; R marks RAND vertices)
![Page 5: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/5.jpg)
Simple Stochastic Games (SSGs): Reachability version [Condon (1992)]
Two players: MAX and min
Objective: MAX/min the probability of getting to the MAX-sink
Vertex types: MAX, min, RAND (R), MAX-sink, min-sink
![Page 6: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/6.jpg)
Simple Stochastic Games (SSGs): Strategies
A general strategy may be randomized and history dependent
A positional strategy is deterministic and history independent
Positional strategy for MAX: choice of an outgoing edge from each MAX vertex
![Page 7: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/7.jpg)
Simple Stochastic Games (SSGs): Values
Both players have positional optimal strategies
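Every vertex i in the game has a value v_i:
$$v_i \;=\; \max_{\sigma\ \text{positional}}\ \min_{\tau\ \text{general}} v_i(\sigma,\tau) \;=\; \min_{\tau\ \text{positional}}\ \max_{\sigma\ \text{general}} v_i(\sigma,\tau)$$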
There are strategies that are optimal for every starting position
![Page 8: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/8.jpg)
Simple Stochastic Games (SSGs) [Condon (1992)]
Terminating binary games:
The outdegrees of all non-sinks are 2
All probabilities are ½
The game terminates with prob. 1
Easy reduction from general games to terminating binary games
![Page 9: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/9.jpg)
“Solving” terminating binary SSGs
The values vi of the vertices of a game are the unique solution of the following equations:
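A sketch of these equations in their standard form for a terminating binary SSG, assuming each non-sink vertex i has exactly two successors j and k (the names j, k and the sets V_min, V_RAND are notation introduced here, following the V_MAX used later in the talk):

```latex
v_i =
\begin{cases}
\max\{v_j, v_k\}        & \text{if } i \in V_{\mathrm{MAX}} \\
\min\{v_j, v_k\}        & \text{if } i \in V_{\min} \\
\tfrac{1}{2}(v_j + v_k) & \text{if } i \in V_{\mathrm{RAND}} \\
1                       & \text{if } i \text{ is the MAX-sink} \\
0                       & \text{if } i \text{ is the min-sink}
\end{cases}
```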
Corollary: Decision version in NP ∩ co-NP
The values are rational numbers requiring only a linear number of bits
![Page 10: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/10.jpg)
Value iteration (for binary SSGs)
Iterate the operator:
Converges to the unique solution
But it may require an exponential number of iterations just to get close
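A minimal sketch of value iteration, assuming a hypothetical encoding in which each vertex maps to `("max", j, k)`, `("min", j, k)`, `("rand", j, k)` or `("sink", value)`. The names `step`, `value_iteration` and `GAME` are illustrative and not taken from the talk. Exact fractions are used so the slow approach to the limit is visible.

```python
from fractions import Fraction

# Hypothetical example game: the RAND vertex "s" loops back to itself with
# probability 1/2, so its value (1) is only approached in the limit.
GAME = {
    "win": ("sink", 1),              # MAX-sink
    "lose": ("sink", 0),             # min-sink
    "r": ("rand", "win", "lose"),
    "s": ("rand", "s", "win"),
    "m": ("max", "r", "lose"),
    "n": ("min", "m", "win"),
}

def step(game, v):
    """Apply the one-step operator to the value vector v."""
    new = {}
    for i, node in game.items():
        kind = node[0]
        if kind == "sink":
            new[i] = Fraction(node[1])
            continue
        a, b = v[node[1]], v[node[2]]
        if kind == "max":
            new[i] = max(a, b)
        elif kind == "min":
            new[i] = min(a, b)
        else:                        # "rand": both successors with prob. 1/2
            new[i] = (a + b) / 2
    return new

def value_iteration(game, rounds):
    """Iterate the operator `rounds` times, starting from the all-zero vector."""
    v = {i: Fraction(0) for i in game}
    for _ in range(rounds):
        v = step(game, v)
    return v

values = value_iteration(GAME, 50)
print(values["m"], float(values["s"]))
```

In this toy game the gap at `"s"` only halves per round; the exponential behaviour mentioned on the slide needs a more elaborate construction.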
![Page 11: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/11.jpg)
Simple Stochastic Games (SSGs): Payoff version [Shapley (1953)]
Vertex types: MAX, min, RAND (R)
Limiting average version
Discounted version
![Page 12: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/12.jpg)
Markov Decision Processes (MDPs)
Theorem [d'Epenoux (1964)]: Values and optimal strategies of an MDP can be found by solving an LP
![Page 13: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/13.jpg)
SSG ∈ NP ∩ co-NP: Another proof
Deciding whether the value of a game is at least (at most) v is in NP ∩ co-NP
To show that value ≥ v, guess an optimal strategy for MAX
Find an optimal counter-strategy for min by solving the resulting MDP
Is the problem in P?
![Page 14: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/14.jpg)
Mean Payoff Games (MPGs) [Ehrenfeucht, Mycielski (1979)]
Non-terminating version
Discounted version
Reductions: MPGs → Payoff SSGs → Reachability SSGs
Pseudo-polynomial algorithm [PZ’96]
![Page 15: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/15.jpg)
Mean Payoff Games (MPGs) [Ehrenfeucht, Mycielski (1979)]
Value(σ, τ): average of the cycle formed
Again, both players have optimal positional strategies.
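A small sketch of how Value(σ, τ) can be computed once both positional strategies are fixed. It assumes a hypothetical encoding in which `choice[v]` is the successor picked at vertex v by whichever player owns it (σ and τ merged into one map) and `weight[(u, v)]` is the payoff of edge (u, v). The names `play_value`, `CHOICE` and `WEIGHT` are illustrative only.

```python
from fractions import Fraction

def play_value(choice, weight, start):
    """Follow the positional choices from `start` until a vertex repeats,
    then return the average edge weight of the cycle that was closed."""
    position = {}                 # vertex -> index on the path
    path = []
    v = start
    while v not in position:
        position[v] = len(path)
        path.append(v)
        v = choice[v]
    cycle = path[position[v]:]    # the finite prefix before the cycle is ignored
    total = sum(weight[(u, choice[u])] for u in cycle)
    return Fraction(total, len(cycle))

# Hypothetical example: a -> b -> c -> b, so the cycle is b -> c -> b.
CHOICE = {"a": "b", "b": "c", "c": "b"}
WEIGHT = {("a", "b"): 5, ("b", "c"): 3, ("c", "b"): -1}
print(play_value(CHOICE, WEIGHT, "a"))
```

The weight 5 on the initial edge does not influence the result, since only the cycle contributes to the long-run average.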
![Page 16: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/16.jpg)
Selecting the second largest element with only four storage locations [PZ’96]
![Page 17: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/17.jpg)
Parity Games (PGs): A simple example
Priorities (vertex labels in the figure): 2, 1, 4, 1, 3, 2
EVEN wins if the largest priority seen infinitely often is even
![Page 18: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/18.jpg)
Parity Games (PGs)
Vertices belong to EVEN or ODD and carry priorities (figure labels: EVEN, 3, ODD, 8)
EVEN wins if the largest priority seen infinitely often is even
Equivalent to many interesting problems in automata and verification:
Non-emptiness of ω-tree automata
Modal μ-calculus model checking
![Page 19: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/19.jpg)
Parity Games (PGs) → Mean Payoff Games (MPGs)
Replace priority k by payoff (−n)^k
Move payoffs to outgoing edges
[Stirling (1993)] [Puri (1995)]
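The reduction can be sketched in a few lines, assuming a hypothetical encoding in which `priority[v]` is the priority of vertex v and `edges` is a list of (u, v) pairs. The function name `parity_to_mean_payoff` and the example data are illustrative only.

```python
def parity_to_mean_payoff(priority, edges):
    """Give every edge (u, v) the payoff (-n)**priority[u], i.e. move the
    payoff of vertex u onto its outgoing edges. n is the number of vertices."""
    n = len(priority)
    return {(u, v): (-n) ** priority[u] for (u, v) in edges}

# Hypothetical 3-cycle whose largest priority (2) is even.
PRIORITY = {"a": 1, "b": 2, "c": 1}
EDGES = [("a", "b"), ("b", "c"), ("c", "a")]
WEIGHTS = parity_to_mean_payoff(PRIORITY, EDGES)
print(WEIGHTS, sum(WEIGHTS.values()))
```

On a simple cycle the term contributed by the largest priority dominates the sum of all the others, so the sign of the cycle mean matches the parity of that priority.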
![Page 20: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/20.jpg)
Switches
…i
Value vector of strategy σ of MAX with respect to the optimal counter-strategy of min
![Page 21: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/21.jpg)
Strategy/Policy Iteration
Start with some strategy σ (of MAX)
While there are improving switches, perform some of them
As each step is strictly improving and there is a finite number of strategies, the algorithm must end with an optimal strategy
SSG ∈ PLS (Polynomial Local Search)
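A minimal sketch of strategy iteration on a terminating binary SSG, under assumptions that are not from the talk: vertices are encoded as `("max", j, k)`, `("min", j, k)`, `("rand", j, k)` or `("sink", value)`; the optimal counter-strategy of min is found by brute-force enumeration rather than by an LP; and all improving switches are performed in each round. The helper names `solve`, `best_response` and `strategy_iteration` are hypothetical.

```python
from fractions import Fraction
from itertools import product

GAME = {  # hypothetical example game
    "win": ("sink", 1), "lose": ("sink", 0),
    "r": ("rand", "win", "lose"), "q": ("rand", "r", "win"),
    "m": ("max", "r", "q"), "n": ("min", "m", "win"),
}

def solve(game, choice):
    """Exact values when MAX and min vertices follow `choice`.
    Assumes the game terminates with probability 1 (nonsingular system)."""
    ids = list(game)
    idx = {v: r for r, v in enumerate(ids)}
    n = len(ids)
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]   # augmented matrix
    for v, node in game.items():
        r = idx[v]
        A[r][r] += 1
        if node[0] == "sink":
            A[r][n] = Fraction(node[1])
        elif node[0] == "rand":
            for s in node[1:]:
                A[r][idx[s]] -= Fraction(1, 2)
        else:
            A[r][idx[choice[v]]] -= 1
    for c in range(n):                                # Gauss-Jordan elimination
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        A[c] = [x / A[c][c] for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return {v: A[idx[v]][n] for v in ids}

def best_response(game, sigma):
    """Value vector of sigma against the best positional reply of min."""
    mins = [v for v, node in game.items() if node[0] == "min"]
    best = None
    for picks in product(*[game[v][1:] for v in mins]):
        val = solve(game, {**sigma, **dict(zip(mins, picks))})
        if best is None or sum(val.values()) < sum(best.values()):
            best = val
    return best

def strategy_iteration(game):
    sigma = {v: node[1] for v, node in game.items() if node[0] == "max"}
    while True:
        val = best_response(game, sigma)
        switches = {v: s for v in sigma for s in game[v][1:]
                    if val[s] > val[sigma[v]]}        # improving switches
        if not switches:
            return sigma, val
        sigma.update(switches)

print(strategy_iteration(GAME))
```

Minimizing the sum of values is enough in `best_response` because an optimal counter-strategy minimizes every vertex value simultaneously.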
![Page 22: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/22.jpg)
Strategy/Policy Iteration: Complexity?
Performing only one switch at a time may lead to exponentially many improvements, even for MDPs [Condon (1992)]
What happens if we perform all profitable switches? [Hoffman-Karp (1966)]
???
Not known to be polynomial. Best upper bound: O(2^n/n) [Mansour-Singh (1999)]
No non-linear examples. Best lower bound: 2n − O(1) [Madani (2002)]
![Page 23: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/23.jpg)
A randomized subexponential algorithm for simple stochastic games
![Page 24: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/24.jpg)
A randomized subexponential algorithm for binary SSGs
[Ludwig (1995)] [Kalai (1992)] [Matousek-Sharir-Welzl (1992)]
Start with an arbitrary strategy σ for MAX
Choose a random vertex i ∈ V_MAX
Find the optimal strategy σ′ for MAX in the game in which the only outgoing edge of i is (i, σ(i))
If switching σ′ at i is not profitable, then σ′ is optimal
Otherwise, let σ ← (σ′)^i (σ′ with the choice at i switched) and repeat
![Page 25: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/25.jpg)
A randomized subexponential algorithm for binary SSGs
[Ludwig (1995)] [Kalai (1992)] [Matousek-Sharir-Welzl (1992)]
There is a hidden order of MAX vertices under which the optimal strategy returned by the first recursive call correctly fixes the strategy of MAX at vertices 1,2,…,i
All correct! Would never be switched!
MAX vertices
![Page 26: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/26.jpg)
The hidden order
u_i(σ): the maximum sum of values of a strategy of MAX that agrees with σ on i
![Page 27: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/27.jpg)
The hidden order
Order the vertices such that
Positions 1,…,i were switched and would never be switched again
![Page 28: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/28.jpg)
SSGs are LP-type problems [Björklund-Sandberg-Vorobyov (2002)] [Halman (2002)]
General (non-binary) SSGs can be solved in subexponential time
AUSO – Acyclic Unique Sink Orientations
![Page 29: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/29.jpg)
Parity Games (PGs): A simple example
Priorities (vertex labels in the figure): 2, 1, 4, 1, 3, 2
EVEN wins if the largest priority seen infinitely often is even
![Page 30: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/30.jpg)
Exponential algorithm for PGs [McNaughton (1993)] [Zielonka (1998)]
Vertices of highest priority (even)
Vertices from which EVEN can force the game to enter A
First recursive call
Lemma: (i)
(ii)
![Page 31: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/31.jpg)
Exponential algorithm for PGs [McNaughton (1993)] [Zielonka (1998)]
Second recursive call
In the worst case, both recursive calls are on games of size n−1
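A sketch of the recursive algorithm in its commonly presented form, assuming a hypothetical encoding: `owner[v]` is 0 for EVEN and 1 for ODD, `priority[v]` is the priority of v, `edges[v]` lists the successors of v, and every vertex keeps at least one successor inside each subgame. The names `attractor`, `zielonka` and the example game are illustrative only.

```python
def attractor(vertices, edges, owner, player, target):
    """Vertices of the subgame from which `player` can force a visit to `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in vertices - attr:
            succ = [w for w in edges[v] if w in vertices]
            if owner[v] == player:
                pulled = any(w in attr for w in succ)
            else:
                pulled = all(w in attr for w in succ)
            if pulled:
                attr.add(v)
                changed = True
    return attr

def zielonka(vertices, edges, owner, priority):
    """Return (W0, W1), the winning regions of EVEN (0) and ODD (1)."""
    if not vertices:
        return set(), set()
    top = max(priority[v] for v in vertices)
    p = top % 2                              # player favoured by the top priority
    a = attractor(vertices, edges, owner, p,
                  {v for v in vertices if priority[v] == top})
    won = zielonka(vertices - a, edges, owner, priority)      # first recursive call
    if not won[1 - p]:                       # opponent wins nowhere: p wins everywhere
        result = [set(), set()]
        result[p] = set(vertices)
        return result[0], result[1]
    b = attractor(vertices, edges, owner, 1 - p, won[1 - p])
    won = list(zielonka(vertices - b, edges, owner, priority))  # second recursive call
    won[1 - p] = won[1 - p] | b
    return won[0], won[1]

# Hypothetical example: EVEN can loop at "a" (priority 2); ODD can move
# from "b" to "c" and loop there (priority 3).
OWNER = {"a": 0, "b": 1, "c": 1}
PRIORITY = {"a": 2, "b": 1, "c": 3}
EDGES = {"a": ["a", "b"], "b": ["a", "c"], "c": ["c"]}
print(zielonka(set(OWNER), EDGES, OWNER, PRIORITY))
```

Each of the two recursive calls removes at least one vertex, which is what gives the exponential worst case stated above.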
![Page 32: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/32.jpg)
Deterministic subexponential algorithm for PGs [Jurdzinski, Paterson, Zwick (2006)]
Second recursive call
Dominion
Idea: Look for small dominions!
Dominion: A (small) set from which one of the players can win without the play ever leaving this set
Dominions of size s can be found in O(n^s) time
![Page 33: Uri Zwick Tel Aviv University](https://reader038.fdocuments.us/reader038/viewer/2022103101/568149ae550346895db6ebe5/html5/thumbnails/33.jpg)
Open problems
● Polynomial algorithms?
● Is the Policy Improvement algorithm polynomial?
● Faster subexponential algorithms for parity games?
● Deterministic subexponential algorithms for MPGs and SSGs?
● Faster pseudo-polynomial algorithms for MPGs?