Sequential decision making:
decidability and complexity
Games with partial observation
[email protected] + too many people to all be cited. Includes Inria, Cnrs, Univ. Paris-Sud, LRI, Taiwan universities (including NUTN), CITINES project.
TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence.
Paris, September 2012.
A quite general model
A directed graph (finite). A starting point on the graph, a target (or several targets, with different rewards). I want to reach a target.
Labels (= decisions) on edges: next node = f(current node, decision)
Each node is either:
- a random node (a random decision is made)
- a decision node (I choose a decision)
- an opponent node (an opponent chooses)
Partial observation
Each decision node is equipped with an observation; you can make decisions using the list of past observations
==> you don't know where you are in the graph
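The model above can be sketched in a few lines of code (a minimal, illustrative sketch; all names are mine, not from the talk):

```python
import random

# Finite directed graph whose edges carry decision labels. Node kinds:
# "decision" (player 1 chooses), "opponent" (player 2 chooses), "random".
# Targets carry a nonzero reward.

class Node:
    def __init__(self, kind, edges, reward=0.0):
        self.kind = kind          # "decision", "opponent" or "random"
        self.edges = edges        # dict: label -> successor node name
        self.reward = reward      # nonzero at target nodes

def play(graph, start, p1_policy, p2_policy, horizon):
    """Play one episode; policies only see the list of past observations
    (here, the past edge labels), not the current node: partial observation."""
    node, observations = graph[start], []
    for _ in range(horizon):
        if node.reward:
            return node.reward
        labels = sorted(node.edges)
        if node.kind == "random":
            label = random.choice(labels)
        elif node.kind == "decision":
            label = p1_policy(observations, labels)
        else:
            label = p2_policy(observations, labels)
        observations.append(label)
        node = graph[node.edges[label]]
    return 0.0
```

A policy is a function of the observation history only, which is exactly what "you don't know where you are in the graph" means.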
Overview
10%: overview of Alternating Turing machine & computational complexity (great tool for complexity upper bounds)
50%: general culture on games (including undecidability)
35%: general culture on fictitious play (matrix games) (probably no time for this...)
4%: my results on that stuff ==> 2 detailed proofs (one new) ==> feel free to interrupt
Outline
Complexity and ATM
Complexity and games (incl. planning)
Bounded horizon games
Classical complexity classes
P NP PSPACE EXPTIME NEXPTIME EXPSPACE
Proved: PSPACE ⊊ EXPSPACE, P ⊊ EXPTIME, NP ⊊ NEXPTIME
Believed, not proved: P ≠ NP, EXPTIME ≠ NEXPTIME, NEXPTIME ≠ EXPSPACE
Complexity and alternating Turing machines
Turing machine (TM) = abstract computer
Non-deterministic Turing Machine (NTM) = TM with exists states (i.e. several transitions, accepts if at least one transition accepts)
Co-NTM: TM with for all states (i.e. several transitions, accepts if all transitions lead to accept)
ATM: TM with both exists and for all states.
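The exists/for all acceptance rule is exactly a two-player game on configurations: an "exists" player trying to reach an accepting configuration against a "for all" player. A minimal recursive sketch (illustrative; all names are mine):

```python
# "exists" configurations accept if SOME successor accepts (prover's move);
# "forall" configurations accept if ALL successors accept (refuter's move).
# Configurations are modeled abstractly via two callbacks.

def accepts(config, kind, successors):
    """kind(c) in {"exists", "forall", "accept", "reject"};
    successors(c) lists the next configurations."""
    k = kind(config)
    if k == "accept":
        return True
    if k == "reject":
        return False
    results = [accepts(c, kind, successors) for c in successors(config)]
    return any(results) if k == "exists" else all(results)
```

With only "exists" internal nodes this is NTM acceptance; with only "forall" nodes, co-NTM acceptance.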
Alternation
Non-determinism & alternation
Outline
Complexity and ATM
Complexity and games (incl. planning)
Bounded horizon games
Computational complexity: framework
Uncertainty can be:
- Adversarial: I focus on the worst case
- Stochastic: I focus on the average result
- Or both.
Stochastic = adversarial if the goal is 100% success. Stochastic != adversarial in the general case.
Computational complexity: framework
Many representations for problems. E.g.:
- Succinct: a circuit computes the i-th bit of the proba that action a leads to a transition from s to s'
- Compressed: a circuit computes many bits simultaneously
- Flat: longer encoding (transition tables)
==> does not matter for decidability
==> matters for complexity
Computational complexity: framework
Many representations for problems. E.g.:
- Succinct
- Compressed
- Flat
Compressed representation is somehow natural (the state space has exponential size, transitions are fast): see e.g. Mundhenk for detailed definitions and flat representations.
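The gap between the representations can be made concrete (an illustrative sketch; the transition rule is an arbitrary example, not from the talk):

```python
# The same deterministic transition on 2^n states, given either succinctly
# (a small function playing the role of a circuit) or flat (an explicit
# table with 2^n entries). The input size differs exponentially, hence the
# different complexity results for the same decision problem.

n = 16
N = 1 << n                     # 2^16 = 65536 states

def succinct_next(s):
    """Tiny 'circuit': successor of state s, computed in O(n) operations."""
    return (s + 1) % N

flat_table = [succinct_next(s) for s in range(N)]   # size exponential in n
```

An EXP result for flat inputs and a 2EXP result for succinct inputs can thus describe the same amount of absolute work, measured against inputs of very different sizes.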
Computational complexity: framework
We use mainly compressed representation; see also Mundhenk for flat representations.
Typically, exponentially smaller representations lead to exponentially higher complexity
==> but it's not always the case...
Simple changes can alter the complexity a lot: with superko (rules forbid repeating a position), some fully observable 2-player games become EXPSPACE instead of EXP ==> discussed later
Computational complexity: framework for first tables of results
Either search (find a target) or optimize (cumulate rewards over time)
Compressed (written with circuits or otherwise) or not (flat).
Horizon:
- Short horizon: horizon = size of input
- Long horizon: log2(horizon) = size of input
- Infinite horizon: no limit
Mundhenk's summary: one player, limited horizon: expected reward >0 ?
Mundhenk's summary: one player, non-negative reward, looking for non-neg. average reward (= positive proba of reaching): easier
Complexity, partial observation, infinite horizon, proba of reaching a target
1P+random, unobservable: undecidable (Madani et al)
1P+random, P(win)=1, or equivalently 2P, P(win)=1 [Rintanen and refs therein]:
- Fully observable: EXP [Littman 1994]
- Unobservable: EXPSPACE [Haslum et al 2000]
- Partially observable: 2EXP [Rintanen, 2003]
Rmk: 2P with the P(win)=1 criterion is not really 2P!
Complexity, partial observation, infinite horizon
2P vs 1P (two players against one), P(win)=1?: undecidable! [Hearn, Demaine]
2P (random or not): existence of a sure win is equivalent to 1P+random!
- EXP, fully observable (e.g. Go, Robson 1984)
- PSPACE, unobservable
- 2EXP, partially observable
Existence of a sure win with the same state forbidden twice: EXPSPACE-complete (Go with Chinese rules? rather conjectured EXPTIME or PSPACE...)
General case (optimal play): undecidable (Auger, Teytaud) (what about phantom-Go?)
Complexity, partial observation
Remarks:Continuous case ?
Purely epistemic (we gather information, we don't change the state) ? [Sabbadin et al]
Restrictions on the policy, on the set of actions...
Discounted reward
DEC-POMDP, POSG : many players, same/opposite/different reward functions...
What are the approaches ?
Dynamic programming (Massé, Bellman, 1950s) (still the main approach in industry), alpha-beta, retrograde analysis
Reinforcement learning
MCTS (R. Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, 2006)
Scripts + Tuning / Direct Policy Search
Coevolution
All have their PO extensions, but the last two are the most convenient in this case.
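The first approach on the list, dynamic programming, can be sketched for the 1-player stochastic case on a flat graph (a minimal value-iteration sketch; names and data layout are illustrative):

```python
# Bellman-style value iteration: V[s] = max over actions of the expected
# value of the successor; with reward 1 at the target, V[s] is the maximal
# probability of reaching the target from s.

def value_iteration(states, trans, target, iters=100):
    """trans[s] = {action: [(prob, next_state), ...]} for non-target states."""
    V = {s: (1.0 if s == target else 0.0) for s in states}
    for _ in range(iters):
        for s in states:
            if s == target:
                continue
            V[s] = max(
                (sum(p * V[s2] for p, s2 in outcomes)
                 for outcomes in trans.get(s, {}).values()),
                default=0.0,
            )
    return V
```

This is exactly the part that breaks under partial observation: V is indexed by the state, which the player cannot see; PO extensions must work on belief states instead.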
Partially observable games
Many tools for fully observable games. Not so many for partially observable ones.
Shi-Fu-Mi (Rock-Paper-Scissors)
Card games
Phantom games
Shi-Fu-Mi (Rock-Paper-Scissors)
Fully observable in simultaneous play, but partially observable in turn-based version.
Computers are stronger than humans (yes, it's true).
Card games, phantom games
Phantomized version of a game: you don't see the moves of your opponents.
If you play an illegal move, you are informed that it's illegal and you play again.
Usually, you get a bit more information (captures, threats...). Dark Chess: more info.
phantom-Go
etc.
Partially observable games
Usually quite heuristic algorithms
Best performing algorithms combine:
- Opponent modelling (as for Shi-Fu-Mi)
- Belief state (often by Monte-Carlo simulations)
- Not a lot of tree search
- A lot of tuning ==> usually no consistency analysis
Part I: Complexity analysis (unbounded horizon)
Game:
- One or two players
- Win, loss, draw (incl. endless loop)
- Partial observability, no random part
- Finite state space: state = transition(state, action)
- action decided by each player in turn
State of the art
- makes sense in fully observable games
- not so much in non-observable games
State of the art
EXPTIME-complete in the general fully-observable case
EXPTIME-complete fully observable games
- Chess (for some nxn generalization)
- Go (with no superko)
- Draughts (international or English)
- Chinese checkers
- Shogi
PSPACE-complete fully observable games
- Amazons
- Hex
- Go-moku
- Connect-6
- Qubic
- Reversi
- Tic-Tac-Toe
Many games where each cell is filled once and only once: polynomial horizon + full observation ==> PSPACE
EXPSPACE-complete unobservable games (Haslum & Jonsson)
The two-player unobservable case is EXPSPACE-complete (games in succinct form, infinite horizon).
(still for the 100%-win UD criterion; for not fully observable cases it is necessary to be precise...)
Importantly, the UD criterion means that strategies are the same whether the opponent has full observation or no observation ==> UD is very bad :-(
EXPSPACE-complete unobservable games (Haslum & Jonsson)
The two-player unobservable case is EXPSPACE-complete (games in succinct form).
PROOF:
(I) First note that strategies are just sequences of actions (no observability!)
(II) It is in EXPSPACE = NEXPSPACE, because of the following algorithm:
(a) Non-deterministically choose the sequence of actions (an exponential list of actions is enough...)
(b) Check the result against all possible opponent strategies
(III) We have to check the hardness only.
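Step (II) can be made concrete for tiny flat games by replacing the nondeterministic guess with plain enumeration (an illustrative sketch, exponentially slower than the real algorithm; all names are mine):

```python
from itertools import product

# With no observations, a strategy really is a plain action sequence, so a
# sure win means: SOME P1 sequence wins against EVERY P2 sequence.

def sure_win(step, start, win, horizon, p1_actions, p2_actions):
    """step(state, a1, a2) -> next state; win(state) -> bool."""
    def run(seq1, seq2):
        s = start
        for a1, a2 in zip(seq1, seq2):
            s = step(s, a1, a2)
        return s
    return any(
        all(win(run(seq1, seq2)) for seq2 in product(p2_actions, repeat=horizon))
        for seq1 in product(p1_actions, repeat=horizon)
    )
```

The NEXPSPACE algorithm avoids the outer enumeration by guessing `seq1`, and runs the inner check reusing exponential space.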
EXPSPACE-complete unobservable games (Haslum & Jonsson)
The two-player unobservable case is EXPSPACE-complete (games in succinct form).
PROOF of the hardness: reduction from: is my TM with exponential tape going to halt?
Consider a TM with a tape of size N = 2^n.
We must find a game
- with size n (n = log2(N))
- such that player 1 has a winning strategy iff the TM halts.
EXPSPACE-complete unobservable games (Haslum & Jonsson)
Encoding a Turing machine with a tape of size N as a game with state O(log(N)).
Player 1 chooses the sequence of configurations of the tape (N=4):
x(0,1), x(0,2), x(0,3), x(0,4)  ==> initial state
x(1,1), x(1,2), x(1,3), x(1,4)
x(2,1), x(2,2), x(2,3), x(2,4)
x(3,1), x(3,2), x(3,3), x(3,4)
.....................................
x(N,1), x(N,2), x(N,3), x(N,4)
Wins by final state!
Except if P2 finds an illegal transition!
==> P2 can check the consistency of one 3-uple per line
==> requests space log(N) (= position of the 3-uple)
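P2's check is cheap because computation tableaux are local: cell (t+1, i) is determined by the 3-window (t, i-1), (t, i), (t, i+1), so naming one position i costs only log(N) bits. A concrete sketch (the local rule below is an arbitrary stand-in, not a real machine's transition table):

```python
# Check one announced cell of row t+1 against the 3-window above it.

def window_consistent(tableau, t, i, local_rule):
    """local_rule maps a 3-window of row t to the cell below its center."""
    w = (tableau[t][i - 1], tableau[t][i], tableau[t][i + 1])
    return tableau[t + 1][i] == local_rule(w)
```

If P1 ever announces a row that does not follow from the previous one, some single index i witnesses it, and P2 wins by pointing at that i.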
EXPSPACE-complete unobservable games
The 1P + unknown initial state, unobservable case is EXPSPACE-complete (games in succinct form).
2P+unobservable as well.
2EXPTIME-complete PO games
The two-player PO case, or 1P+random PO, is 2EXP-complete (games in succinct form).
(2P = 1P+random because of UD)
Undecidable games (B. Hearn)
The three-player PO case is undecidable (two players against one, not allowed to communicate).
Hummm ?
Do you know a PO game in which you can ensure a win with probability 1 ?
Another formalization
==> much more satisfactory (might have drawbacks as well...)
Madani et al.
1 player + random = undecidable (even without opponent!)
Madani et al.
1 player + random = undecidable.
==> answers a (related) question by Papadimitriou and Tsitsiklis.
Proof ?
Based on the emptiness problem for probabilistic finite automata (see Paz 1971):
Given a probabilistic finite automaton, is there a word accepted with proba at least c?
==> undecidable
Consequence for unobservable games
1 player + random = undecidable==> 2 players = undecidable.
Proof: undecidability with 1 player against random ==> undecidability with 2 players
How to simulate 1 player + random with 2 players ?
A random node to be rewritten
Rewritten as follows:
- Player 1 chooses a in [[0,N-1]]
- Player 2 chooses b in [[0,N-1]]
- c = (a+b) modulo N
- Go to t_c
Each player can force the game to be equivalent to the initial one (by playing uniformly)
==> the proba of winning for player 1 (in case of perfect play) is the same as for the initial game
==> undecidability!
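The key fact behind the rewriting can be checked exactly: if either player mixes uniformly over [[0, N-1]], then c = (a+b) mod N is uniform, whatever the other player does. A small exact-arithmetic sketch (names are mine):

```python
from fractions import Fraction

# Distribution of c = (a+b) mod N given the two players' mixed strategies.

def dist_of_c(N, p_a, p_b):
    """p_a, p_b: probability vectors over [[0, N-1]]."""
    p_c = [Fraction(0)] * N
    for a in range(N):
        for b in range(N):
            p_c[(a + b) % N] += p_a[a] * p_b[b]
    return p_c
```

So each player can unilaterally force the rewritten node to behave like the original random node, which is why the value of the game is preserved.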
Important remark
Existence of a strategy winning with proba 0.5 is also undecidable for the restriction to games in which the proba is > 0.6 or < 0.4 ==> not just a subtle precision trouble.
So what ?
We have seen that unbounded horizon + partial observability + natural criterion (not sure win)
==> undecidability,
contrary to what is expected from the usual definitions.
What about bounded horizon, 2P? Clearly decidable.
Complexity ?
Algorithms ? (==> coevolution & LP)
Complexity (2P, 0-sum, no random)
                         Unbounded        Exponential     Polynomial
                         horizon          horizon         horizon
Full observability       EXP              EXP             PSPACE
No obs (X=100%)          EXPSPACE         NEXP
                         (Haslum et al, 2000)
Partially observable     2EXP             EXPSPACE
(X=100%)                 (Rintanen)       (Mundhenk)
Simult. actions          ?                EXPSPACE        ?
(converges to Nash: Robinson 51)
Fictitious play
TODO
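Fictitious play for a zero-sum matrix game can be sketched in a few lines (a minimal, illustrative implementation; payoffs are to the row player; Robinson (1951) guarantees the empirical mixed strategies converge to optimal play):

```python
# Each player best-responds to the OPPONENT's empirical frequencies of play.

def fictitious_play(M, iters=2000):
    rows, cols = len(M), len(M[0])
    row_counts, col_counts = [0] * rows, [0] * cols
    i, j = 0, 0
    for _ in range(iters):
        row_counts[i] += 1
        col_counts[j] += 1
        # Row player maximizes against the column player's empirical mixture...
        i = max(range(rows),
                key=lambda r: sum(col_counts[c] * M[r][c] for c in range(cols)))
        # ...column player minimizes against the row player's empirical mixture.
        j = min(range(cols),
                key=lambda c: sum(row_counts[r] * M[r][c] for r in range(rows)))
    return ([n / iters for n in row_counts], [n / iters for n in col_counts])
```

On matching pennies the empirical frequencies drift toward (1/2, 1/2), though convergence of FP is famously slow.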
Improvements for KxK matrix games: approximations and sparse exact solutions
There exist ε-approximations with support of size O(log(K)/ε²) [Althoefer]
Such an approximation can be found in time O(K log(K)/ε²) [Grigoriadis et al]: basically a stochastic FP
Exact solution in time (Auger, Ruette, Teytaud) O(K log(K) k^(2k) + poly(k)) if the solution is k-sparse (good only if k smaller than log(K)/log(log(K))! better?)
Improvements for KxK matrix game: approximations
So, LP & FP are two tools for matrix games.
Linear programming can be adapted to PO games without building the complete matrix (using information sets).
The same for FP variants?
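For reference, the LP that these methods solve or approximate: for a payoff matrix M (payoffs to the row player), the row player's optimal mixed strategy x and the game value v are given by

```latex
% Zero-sum matrix game as a linear program (row player's side):
\begin{aligned}
\max_{x,\,v} \quad & v \\
\text{s.t.} \quad  & \textstyle\sum_i x_i M_{ij} \;\ge\; v \qquad \forall j, \\
                   & \textstyle\sum_i x_i = 1, \qquad x_i \ge 0 .
\end{aligned}
```

The information-set adaptation mentioned above avoids enumerating the exponentially many pure strategies by taking variables indexed by information sets (as in sequence-form LP formulations) rather than by pure strategies.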
Conclusions
There are still natural questions which provide nice decidability problems:
- Madani et al (1 player against random, no observability), extended here to 2 players with no random
- ==> undecidable problems less than the Halting problem?
Solving zero-sum matrix games is still an active area of research:
- Approximate cases
- Sparse case
Open problems
Phantom-Go undecidable? (or another real game...)
Complexity of Go with Chinese rules? (conjectured: PSPACE or EXPTIME; proved: PSPACE-hard, and in EXPSPACE)
More to say about epistemic games (internal state not modified)
Frontier of undecidability in PO games? (100%-halting games: 2P becomes decidable)
Chess with finitely many pieces on an infinite board: decidability of forced mate? (n-move: Brumleve et al, 2012, simulation in Presburger arithmetic (thanks S. Riis :-) )