Stochastic Omega-Regular Games
Krishnendu Chatterjee
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2007-122
http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-122.html
October 8, 2007
Copyright © 2007, by the author(s). All rights reserved.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
Stochastic ω-Regular Games
by
Krishnendu Chatterjee
B. Tech. (IIT, Kharagpur) 2001
M.S. (University of California, Berkeley) 2004
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Computer Science
in the
GRADUATE DIVISION
of the
UNIVERSITY of CALIFORNIA at BERKELEY
Committee in charge:
Professor Thomas A. Henzinger, Chair
Professor Christos Papadimitriou
Professor John Steel
Fall, 2007
The dissertation of Krishnendu Chatterjee is approved:
Chair Date
Date
Date
University of California at Berkeley
Fall, 2007
Stochastic ω-Regular Games
Copyright Fall, 2007
by
Krishnendu Chatterjee
Abstract
Stochastic ω-Regular Games
by
Krishnendu Chatterjee
Doctor of Philosophy in Computer Science
University of California at Berkeley
Professor Thomas A. Henzinger, Chair
We study games played on graphs with ω-regular conditions specified as parity, Rabin,
Streett or Muller conditions. These games have applications in the verification, synthesis,
modeling, testing, and compatibility checking of reactive systems. Important distinctions
between graph games are as follows: (a) turn-based vs. concurrent games, depending
on whether at a state of the game only a single player makes a move, or players make
moves simultaneously; (b) deterministic vs. stochastic, depending on whether the transition
function is a deterministic or a probabilistic function over successor states; and (c) zero-sum
vs. non-zero-sum, depending on whether the objectives of the players are strictly conflicting
or not.
We establish that the decision problems for turn-based stochastic zero-sum games with Rabin, Streett, and Muller objectives are NP-complete, coNP-complete, and PSPACE-
complete, respectively, substantially improving the previously known 3EXPTIME bound.
We also present strategy improvement style algorithms for turn-based stochastic Rabin and
Streett games. In the case of concurrent stochastic zero-sum games with parity objectives
we obtain a PSPACE bound, again improving the previously known 3EXPTIME bound. As
a consequence, concurrent stochastic zero-sum games with Rabin, Streett, and Muller ob-
jectives can be solved in EXPSPACE, improving the previously known 4EXPTIME bound.
We also present an elementary and combinatorial proof of the existence of memoryless ε-
optimal strategies in concurrent stochastic games with reachability objectives, for all real
ε > 0, where an ε-optimal strategy achieves the value of the game within ε against all strategies of the opponent. We also use the proof techniques to present a strategy improvement
style algorithm for concurrent stochastic reachability games.
We then go beyond ω-regular objectives and study the complexity of an important
class of quantitative objectives, namely, limit-average objectives. In the case of limit-average
games, the states of the graph are labeled with rewards and the goal is to maximize the long-run average of the rewards. We show that concurrent stochastic zero-sum games with
limit-average objectives can be solved in EXPTIME.
Finally, we introduce a new notion of equilibrium, called secure equilibrium, in non-
zero-sum games which captures the notion of conditional competitiveness. We prove the
existence of unique maximal secure equilibrium payoff profiles in turn-based deterministic
games, and present algorithms to compute such payoff profiles. We also show how the
notion of secure equilibrium extends the assume-guarantee style of reasoning in the game
theoretic framework.
Professor Thomas A. Henzinger
Dissertation Committee Chair
To Maa (my mother) ...
Contents
List of Figures v
List of Tables vi
1 Introduction 1
2 Definitions 12
2.1 Game Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Turn-based probabilistic game graphs . . . . . . . . . . . . . . . . . 12
2.1.2 Concurrent game graphs . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 Types of strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.2 Probability space and outcomes of strategies . . . . . . . . . . . . . 17
2.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Game Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5 Determinacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6 Complexity of Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Concurrent Games with Tail Objectives 30
3.1 Tail Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1.1 Completeness of limit-average objectives . . . . . . . . . . . . . . . . 32
3.2 Positive Limit-one Property . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3 Zero-sum Tail Games to Nonzero-sum Reachability Games . . . . . . . . . . 47
3.4 Construction of ε-optimal Strategies for Muller Objectives . . . . . . . . . . 54
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4 Stochastic Muller Games 60
4.1 Markov decision processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.1.1 MDPs with reachability objectives . . . . . . . . . . . . . . . . . . . 61
4.1.2 MDPs with Muller objectives . . . . . . . . . . . . . . . . . . . . . . 62
4.1.3 MDPs with Rabin and Streett objectives . . . . . . . . . . . . . . . 65
4.2 2½-player Games with Muller objectives . . . . . . . . . . . . . . . . . . . . 67
4.3 Optimal Memory Bound for Pure Qualitative Winning Strategies . . . . . . 68
4.3.1 Complexity of qualitative analysis . . . . . . . . . . . . . . . . . . . 80
4.4 Optimal Memory Bound for Pure Optimal Strategies . . . . . . . . . . . . . 83
4.4.1 Complexity of quantitative analysis . . . . . . . . . . . . . . . . . . . 90
4.4.2 The complexity of union-closed and upward-closed objectives . . . . 97
4.5 An Improved Bound for Randomized Strategies . . . . . . . . . . . . . . . . 100
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5 Stochastic Rabin and Streett Games 104
5.1 Qualitative Analysis of Rabin Games . . . . . . . . . . . . . . . . . . . . . . 104
5.2 Strategy Improvement for 2½-player Rabin and Streett Games . . . . . . . . 116
5.2.1 Key Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.2.2 Strategy Improvement Algorithm . . . . . . . . . . . . . . . . . . . . 118
5.3 Randomized Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.4 Optimal Strategy Construction for Streett Objectives . . . . . . . . . . . . 129
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6 Concurrent Reachability Games 135
6.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.2 Markov Decision Processes of Memoryless Strategies . . . . . . . . . . . . . 139
6.3 Existence of Memoryless ε-Optimal Strategies . . . . . . . . . . . . . . . . 141
6.3.1 From value iteration to selectors . . . . . . . . . . . . . . . . . . . . 141
6.3.2 From value iteration to optimal selectors . . . . . . . . . . . . . . . . 145
6.4 Strategy Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.4.1 The strategy-improvement algorithm . . . . . . . . . . . . . . . . . . 150
6.4.2 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
7 Concurrent Limit-average Games 156
7.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.2 Theory of Real-closed Fields and Quantifier Elimination . . . . . . . . . . . 159
7.3 Computation of Values in Concurrent Limit-average Games . . . . . . . . . 163
7.3.1 Sentence for the value of a concurrent limit-average game . . . . . . 163
7.3.2 Algorithmic analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
7.3.3 Approximating the value of a concurrent limit-average game . . . . . 168
7.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
8 Concurrent Parity Games 173
8.1 Strategy Complexity and Computational Complexity . . . . . . . . . . . . . 175
8.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
9 Secure Equilibria and Applications 199
9.1 Non-zero-sum Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.2 Secure Equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
9.3 2-Player Non-Zero-Sum Games on Graphs . . . . . . . . . . . . . . . . . . . 208
9.3.1 Unique maximal secure equilibria . . . . . . . . . . . . . . . . . . . . 209
9.3.2 Algorithmic characterization of secure equilibria . . . . . . . . . . . 216
9.4 Assume-guarantee Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.4.1 Co-synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
9.4.2 Game Algorithms for Co-synthesis . . . . . . . . . . . . . . . . . . . 227
9.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Bibliography 236
List of Figures
3.1 A simple Markov chain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 An illustration of the idea of Theorem 5. . . . . . . . . . . . . . . . . . . . . 42
3.3 A game with Buchi objective. . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4 A concurrent Buchi game. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.1 The sets of the construction. . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2 The sets of the construction with forbidden edges. . . . . . . . . . . . . . . 75
5.1 Gadget for the reduction of 2½-player Rabin games to 2-player Rabin games. 106
5.2 The strategy sub-graph in Gσ. . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3 The strategy sub-graph in Gπ. . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.4 Gadget for the reduction of 2½-player Streett games to 2-player Streett games. 114
6.1 An MDP with reachability objective. . . . . . . . . . . . . . . . . . . . . . . 143
9.1 A graph game with reachability objectives. . . . . . . . . . . . . . . . . . . 203
9.2 A graph game with Buchi objectives. . . . . . . . . . . . . . . . . . . . . . . 204
9.3 Mutual-exclusion protocol synthesis . . . . . . . . . . . . . . . . . . . . . . 222
9.4 Peterson’s mutual-exclusion protocol . . . . . . . . . . . . . . . . . . . . . . 226
List of Tables
5.1 Strategy complexity of 2½-player games and its sub-classes with ω-regular objectives, where ΣPM denotes the family of pure memoryless strategies, ΣPF denotes the family of pure finite-memory strategies, and ΣM denotes the family of randomized memoryless strategies. . . . . . . . . . . . . . . . . . . 130
5.2 Computational complexity of 2½-player games and its sub-classes with ω-regular objectives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
8.1 Strategy complexity of concurrent games with ω-regular objectives, where ΣPM denotes the family of pure memoryless strategies, ΣM denotes the family of randomized memoryless strategies, and ΣHI denotes the family of randomized history-dependent, infinite-memory strategies. . . . . . . . . . 197
8.2 Computational complexity of concurrent games with ω-regular objectives. 197
Acknowledgements
I am deeply grateful to my advisor, Tom Henzinger, for his wonderful support and guidance
during my stay in Berkeley. Over the last five years he taught me all I know of research in
the field of verification. He taught me how to think about research problems, helped me
make significant progress in skills that are essential for a researcher, taught me how to write precisely and concisely, and even carefully and patiently corrected all my punctuation and grammatical mistakes. His enthusiasm, his patience, his ability to concretize an ill-conceived idea into a precise problem and to suggest new ways to attack a problem when I got stuck, and his brilliance will always remain a source of inspiration. His influence is present
in every page of this thesis and will be in write-ups that I write in the future. I only wish that a small fraction of his abilities has rubbed off on me.
I am thankful to Luca de Alfaro, Rupak Majumdar and Marcin Jurdzinski for
several collaborative works that appear in this dissertation. It was an absolute pleasure
to work with Luca, and from him I received innumerable intuitions on the behavior of
concurrent games that form a large part of the thesis. Research with Rupak was a truly wonderful experience: his ability to suggest relevant and interesting new problems and his amazing sense of humor always made our research discussions fun. I started
my research on graph games with Marcin and I am grateful to him for getting me interested in the topic of graph games and explaining all the basics. Besides helping me in research, he also influenced me a lot on how to clearly communicate an idea and how to patiently answer all questions (in my early days of research I surely had many stupid questions for him).
I was also fortunate to collaborate with Orna Kupferman and Nir Piterman, and I thank
them for sharing with me their knowledge of automata theory and teaching me a new way
to look at games via automata. I am also thankful to Jean-Francois Raskin, Laurent Doyen
and Radha Jagadeesan for fruitful research collaborations; it was a pleasure to work with
them. I am indebted to P.P. Chakrabarti and Pallab Dasgupta, who were my undergraduate
mentors in IIT Kharagpur and introduced me to the field of formal methods and taught me
all the basics in computer science. I feel simply lucky that such brilliant people brought me
so caringly and smoothly to the field of verification and games.
I am grateful to a lot of people who read several of the results that appear in this
thesis (as manuscripts or conference publications) and helped me with their comments to
improve the results and presentation. Kousha Etessami and Mihalis Yannakakis pointed
out a flaw in a statement of a result of chapter 8 and then helped to the extent of correctly
formulating the result (which currently appears in chapter 8). I am truly grateful to them.
Hugo Gimbert with his valuable comments helped me to make the results of chapter 3
precise. Abraham Neyman helped immensely with his comments on chapter 7; his comments
were extremely helpful in improving and formalizing the results.
Christos Papadimitriou taught us an amazing course on “Algorithms, Internet
and Game Theory” and reinforced my interest in games. I thank him for the course and for serving on my thesis committee. George Necula taught us a course on “Programming
Languages” and illuminated us with several aspects of program verification; and though I
have not worked much in this field, he instilled in me a lot of interest and I hope to pursue research in program verification in the future. I also thank him for serving on my qualifying
exam committee. I am thankful to John Steel who readily agreed to serve on my qualifying
exam and thesis committee.
I thank all my friends who made my stay in Berkeley such a wonderful experience.
I already had old friends Arindam and Arkadeb and made many new friends. I had amazing
discussions with labmates Arindam, Slobodan, Vinayak, Satrajit and Arkadeb. I had an
excellent roommate Kaushik who helped me in many ways, and shared his vast knowledge
on cricket, tennis, movies, and so many other topics. The other highlight was our great
cricket sessions with Kaushik and Rahul. I had some great parties in the company of
Kaushik, Rahul, Pankaj, Arkadeb, Mohan Dunga, Satrajit, and many more friends. I had
a great time with some of my other close friends in Berkeley such as Ambuj, Anurag,
Shanky, Vishnu, Ankit, Parag, Sanjeev, · · · . I was fortunate to meet Nilim and Antar-da
in Berkeley, who in my early years took care of me as their younger brother.
Two of my school teachers: Manjusree Mukherjee and Sreya Kana Mukherjee; and
two of my great friends: Binayak Roy and Abhijit Guria, will always remain a source of
inspiration and I can always rely on them when I am in trouble. Binayak, in my college
days, taught me how to think about mathematics and without him nothing would have
been possible. I am thankful and grateful to them in too many ways to list.
Finally, my family has been an endless source of love, affection, support and mo-
tivation for me. My grand-parents, parents, Jetha, Jethima, Bad-di, Anju-di, Ranju-di,
Sikha-da, Pradip-da, Abhi, Hriti: all my family members in Calcutta and relatives in Puru-
lia and other parts of Bengal gave me love beyond imagination, support and encouragement
at all stages of my PhD life. My inner strength is my mother and without her this thesis
would not have been possible. So I dedicate this thesis to her.
Chapter 1
Introduction
One-shot and stochastic games. The study of games provides theoretical foundations in
several fields of mathematics and computer science. The simplest class of games consists
of the “one-step” games — games with a single interaction between the agents after which
the game ends and the payoffs are decided (e.g., matrix games). However, a wide class of
games progress over time and in a stateful manner, and the current game depends on the
history of interactions. The class of concurrent stochastic games [Sha53, Eve57], which are played over a finite state space in rounds, is a natural model for such games.
Infinite games. In this thesis we will consider nonterminating games of perfect-information
played on finite graphs. A nonterminating game proceeds for an infinite number of rounds.
The state of a game is a vertex of a graph. In each round, the state changes along an edge of
the graph to a successor vertex. Thus the outcome of the game, played for an infinite number of rounds, is an infinite path through the graph. We consider boolean objectives
for the two players: for each player, the resulting infinite path is either winning or losing.
The winning sets of paths are assumed to be ω-regular [Tho97]. Depending on how the
winning sets are specified, we distinguish between parity, Rabin, Streett, and Muller games,
as well as some subclasses thereof. The classes of parity, Rabin, Streett, and Muller objectives
are canonical forms to express ω-regular objectives [Tho97]. Depending on the structure of
the graph, we distinguish between turn-based and concurrent games. In turn-based games,
the graph is partitioned into player-1 states and player-2 states: in player-1 states, player 1
chooses the successor vertex; and in player-2 states, player 2 chooses the successor vertex.
In concurrent games, in every round both players choose simultaneously and independently
from a set of available moves, and the combination of both choices determines the successor
vertex. Finally, we distinguish between deterministic and stochastic games: in stochastic
games, in every round the players’ moves determine a probability distribution on the possible
successor vertices, instead of determining a unique successor.
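As an illustrative sketch only (the encodings and state names below are our own, not from the thesis), the two kinds of stochastic game graphs can be represented directly: a turn-based game assigns each state to one player and gives that player's moves as distributions over successors, while a concurrent game maps joint move pairs to successor distributions.

```python
import random

# Hypothetical encoding of a turn-based stochastic game graph:
# each state belongs to one player, and each available move induces
# a probability distribution over successor states.
turn_based = {
    "s0": {"player": 1, "moves": {"a": {"s1": 0.5, "s2": 0.5}, "b": {"s0": 1.0}}},
    "s1": {"player": 2, "moves": {"c": {"s2": 1.0}}},
    "s2": {"player": 1, "moves": {"d": {"s2": 1.0}}},
}

# Hypothetical encoding of a concurrent stochastic game graph:
# both players choose a move simultaneously at each state, and the
# pair of moves determines the successor distribution.
concurrent = {
    "t0": {("a", "x"): {"t1": 1.0}, ("a", "y"): {"t0": 0.3, "t1": 0.7},
           ("b", "x"): {"t0": 1.0}, ("b", "y"): {"t1": 1.0}},
    "t1": {("a", "x"): {"t1": 1.0}},
}

def step(dist, rng):
    """Sample a successor state from a distribution {state: probability}."""
    r, acc = rng.random(), 0.0
    for state, p in dist.items():
        acc += p
        if r < acc:
            return state
    return state  # guard against floating-point round-off

rng = random.Random(0)
nxt = step(turn_based["s0"]["moves"]["a"], rng)  # lands in {"s1", "s2"}
```

A deterministic game is the special case where every distribution puts probability 1 on a single successor.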
These games play a central role in several areas of computer science. One impor-
tant application arises when the vertices and edges of a graph represent the states and tran-
sitions of a reactive system, and the two players represent controllable versus uncontrollable
decisions during the execution of the system. The synthesis problem (or control problem)
for reactive systems asks for the construction of a winning strategy in the corresponding
graph game. This problem was first posed independently by Alonzo Church [Chu62] and
Richard Buchi [Buc62] in settings that can be reduced to turn-based deterministic games
with ω-regular objectives. The problem was solved independently by Michael Rabin using
logics on trees [Rab69], and by Buchi and Lawrence Landweber using a more game-theoretic
approach [BL69]; it was later resolved using improved methods [GH82, McN93] and in dif-
ferent application contexts [RW87, PR89]. Game-theoretic formulations have proved useful
not only for synthesis, but also for the modeling [Dil89, ALW89], refinement [HKR02], ver-
ification [dAHM00b, AHK02], testing [BGNV05], and compatibility checking [dAH01] of
reactive systems. The use of ω-regular objectives is natural in these application contexts.
This is because the winning conditions of the games arise from requirements specifications
for reactive systems, and the ω-regular sets of infinite paths provide an important and
robust paradigm for such specifications [MP92]. However, both the restriction to determin-
istic games and the restriction to turn-based games are limiting in some respects: prob-
abilistic transitions are useful to model uncertain behavior that is not strictly adversarial
[Var85, CY95], and concurrent choice is useful to model certain forms of synchronous inter-
action between reactive systems [dAHM00a, dAHM01]. The resulting concurrent stochastic
games have long been familiar to game theorists and mathematicians, sometimes under the
name of competitive Markov decision processes [FV97].
Qualitative and quantitative analysis. The central computational problem about a
game is the question of whether a player has a strategy for winning the game. However, in
stochastic graph games there are several degrees of “winning”: we may ask if a player has
a strategy that ensures a winning outcome of the game, no matter how the other player
resolves her choices (this is called sure winning); or we may ask if a player has a strategy
that achieves a winning outcome of the game with probability 1 (almost-sure winning);
or we may ask if the maximal probability with which a player can win is 1 in the limit,
defined as the supremum over all possible strategies of the infimum over all adversarial
strategies (limit-sure winning). While all three notions of winning coincide for turn-based
deterministic games [Mar75], and almost-sure winning coincides with limit-sure winning for
turn-based stochastic games [CJH03] (see Corollary 5 of chapter 4), all three notions are
different for concurrent games, even in the deterministic case [dAHK98]. This is because
for concurrent games, strategies that use randomization are more powerful than pure (i.e.,
nonrandomized) strategies. The computation of sure winning, almost-sure winning, and
limit-sure winning states is called the qualitative analysis of graph games. This is in contrast
to the quantitative analysis, which asks for computing for each state the maximal probability
with which a player can win in the limit, even if that limit is less than 1. For a fixed player,
the limit probability is called the sup-inf value, or the optimal value, or simply the value of
the game at a state. A strategy that achieves the optimal value is an optimal strategy, and
a strategy that ensures one of the three ways of winning is a sure (almost-sure; limit-sure)
winning strategy. Concurrent graph games are more difficult than turn-based graph games
for several reasons. In concurrent games, optimal strategies may not exist, but for every
real ε > 0, there may be a strategy that guarantees a winning outcome with a probability
that lies within ε of the optimal value [Eve57]. Moreover, ε-optimal or limit-sure winning
strategies may require infinite memory about the history of a game in order to prescribe
the next move of a player [dAH00]. By contrast, in the simplest scenarios —for example,
in the case of turn-based stochastic games with parity objectives— optimal and winning
strategies require neither randomization nor memory (see chapter 5); such pure memoryless
strategies can be implemented by control maps from states to moves.
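For intuition about quantitative analysis in the simplest scenario, the values of a turn-based stochastic game with a reachability objective can be approximated by iterating the one-step optimality operator. This is our own minimal sketch with illustrative state names, not an algorithm from the thesis:

```python
def reach_values(player, moves, target, iters=200):
    """Approximate, for each state, the maximal probability with which
    player 1 can reach the target: player-1 states maximize and
    player-2 states minimize the expected value of the successors."""
    v = {s: 1.0 if s in target else 0.0 for s in player}
    for _ in range(iters):
        nv = {}
        for s in player:
            if s in target:
                nv[s] = 1.0
                continue
            opts = [sum(p * v[t] for t, p in dist.items())
                    for dist in moves[s].values()]
            nv[s] = max(opts) if player[s] == 1 else min(opts)
        v = nv
    return v

# A tiny example: from s0 player 1 must move to s1, where player 2
# chooses between reaching the target surely or only with probability 1/2.
player = {"s0": 1, "s1": 2, "goal": 1, "sink": 1}
moves = {
    "s0": {"go": {"s1": 1.0}},
    "s1": {"safe": {"goal": 1.0}, "risky": {"goal": 0.5, "sink": 0.5}},
    "goal": {"stay": {"goal": 1.0}},
    "sink": {"stay": {"sink": 1.0}},
}
v = reach_values(player, moves, {"goal"})  # v["s0"] converges to 0.5
```

Player 2, the minimizer, prefers the risky move, so the value at s0 is 1/2; a pure memoryless optimal strategy can be read off by fixing, at each state, a move that achieves the optimum.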
A game that has a winning strategy for one of the two players at every vertex
is called determined. There are two kinds of determinacy results for graph games. First,
the turn-based deterministic games have a qualitative determinacy, namely, determinacy
for sure winning: in every state of the game graph, one of the two players has a sure
winning strategy [Mar75]. Second, the turn-based stochastic games and the concurrent
games have a quantitative determinacy, that is, determinacy for optimal values: in every
state, the optimal values for both players add up to 1 [Mar98]. Both the sure-winning
determinacy result and the optimal-value determinacy results hold for all Borel objectives;
the sure-winning determinacy for turn-based deterministic games with Borel objectives
was established by Donald Martin [Mar75] and the optimal-value determinacy for Borel
objectives was established again by Donald Martin [Mar98] for a very general class of
games called Blackwell games, which include all games we consider in this thesis. For
concurrent games, however, there is no determinacy for sure winning: even if a concurrent
game is deterministic (i.e., nonstochastic) and the objectives are simple (e.g., single-step
reachability), neither player may have a strategy for sure winning [dAHK98]. Determinacy is
useful for solving games: when computing the sure winning states of a game, or the optimal
values, we can switch between the dual views of the two players whenever convenient.
Quantitative objectives. So far we have discussed qualitative objectives, i.e., an outcome of the game is assigned payoff either 0 or 1. The more general quantitative objectives are measurable functions that assign real-valued rewards to outcomes of a
game. Several quantitative objectives have been studied by game theorists and also in the
context of economics. The notable quantitative objectives are discounted reward and limit-
average (or mean-payoff) objectives. In such games the states of the game graph are labeled with real-valued rewards: for discounted reward objectives the payoff is the discounted sum of the rewards, and for limit-average objectives the payoff is the long-run average of the rewards. Games with discounted reward objectives were introduced by Shapley [Sha53] and have been studied in economics and also in systems theory [dAHM03]. Limit-average objectives have also been studied extensively in game theory [MN81].
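As a concrete numerical sketch of the two payoff functions (our own example, using the normalized discounted sum), consider the periodic reward sequence 0, 4, 0, 4, …:

```python
def discounted(rewards, lam):
    """Normalized discounted payoff: (1 - lam) * sum_i lam**i * r_i."""
    return (1 - lam) * sum(lam ** i * r for i, r in enumerate(rewards))

def limit_average(rewards):
    """Average over a finite prefix, approximating the long-run average."""
    return sum(rewards) / len(rewards)

seq = [0, 4] * 500  # the periodic reward sequence 0, 4, 0, 4, ...
avg = limit_average(seq)     # long-run average of the sequence is 2
disc = discounted(seq, 0.5)  # closed form: 4*lam/(1 + lam) = 4/3 for lam = 1/2
```

The discounted payoff weights early rewards more heavily, while the limit-average payoff depends only on the tail of the sequence; this is why the two objectives can order outcomes differently.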
Nonzero-sum games. In nonzero-sum games, both players may be winning. In this case,
the notion of rational behavior of the players is captured by Nash equilibria: a pair of
strategies for the two players is a Nash equilibrium if neither player can increase her payoff
by unilaterally switching her strategy [Jr50]. In stochastic games Nash equilibria exist in some special cases, and in the general setting the existence of ε-Nash equilibria, for ε > 0, is investigated. A pair of strategies for the two players is an ε-Nash equilibrium, for ε > 0, if neither player can increase her payoff by more than ε by switching her strategy. We now present
the fundamental results on stochastic games, and then state the main contribution of the
thesis.
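To make the equilibrium definitions concrete, here is a small sketch of our own (the payoff matrices are the classic prisoner's dilemma, not an example from the thesis) that checks whether a mixed-strategy profile of a one-shot nonzero-sum game is an ε-Nash equilibrium:

```python
def payoff(M, p, q):
    """Expected payoff of matrix M under mixed strategies p (rows), q (cols)."""
    return sum(p[i] * q[j] * M[i][j]
               for i in range(len(p)) for j in range(len(q)))

def is_eps_nash(A, B, p, q, eps=0.0):
    """(p, q) is an eps-Nash equilibrium if neither player gains more than
    eps by deviating; against a fixed opponent mix, pure deviations suffice."""
    best1 = max(sum(q[j] * A[i][j] for j in range(len(q))) for i in range(len(p)))
    best2 = max(sum(p[i] * B[i][j] for i in range(len(p))) for j in range(len(q)))
    return best1 <= payoff(A, p, q) + eps and best2 <= payoff(B, p, q) + eps

# Prisoner's dilemma; rows/columns are (cooperate, defect).
A = [[3, 0], [5, 1]]  # row player's payoffs
B = [[3, 5], [0, 1]]  # column player's payoffs
defect = (0.0, 1.0)
cooperate = (1.0, 0.0)
dd_is_nash = is_eps_nash(A, B, defect, defect)        # mutual defection
cc_is_nash = is_eps_nash(A, B, cooperate, cooperate)  # mutual cooperation
```

Mutual defection passes the check while mutual cooperation fails it, since each player can gain by unilaterally defecting; with eps > 0 the same check captures ε-Nash equilibria.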
Previous results on turn-based deterministic games. Sure determinacy for turn-based deterministic games with Borel objectives was established by a deep result of Martin [Mar75]: he showed that if the players have complementary objectives, then the sure winning sets of the two players form a partition of the state space. For the special
case of Muller objectives, the result of Gurevich-Harrington [GH82] showed that finite-
memory sure-winning strategies exist for each player from their respective sure-winning set.
In the case of Rabin objectives, the existence of pure memoryless sure-winning strategies was established in [EJ88], which also proved that turn-based deterministic
games with Rabin and Streett objectives are NP-complete and coNP-complete, respectively.
Zielonka [Zie98] used a tree representation of Muller objectives (referred to as the Zielonka tree)
and presented an elegant analysis of turn-based deterministic games with Muller objectives.
Building on an insightful analysis of Zielonka's result, [DJW97] presented an optimal memory bound for pure sure-winning strategies for turn-based deterministic Muller
games. The complexity of turn-based deterministic games with Muller objectives was stud-
ied in [HD05] and the problem was shown to be PSPACE-complete. The algorithmic study
of turn-based deterministic games has received much attention in the literature. A few notable results are as follows: (a) the small progress measure algorithm [Jur00], strategy improvement
algorithm [VJ00], and subexponential time algorithm [JPZ06] for parity games, (b) algo-
rithms for Streett and Rabin games [Hor05, KV98, PP06], and (c) algorithms for Muller
games [Zie98, HD05].
Previous results on concurrent games. The optimal value determinacy for one-shot
games is the famous minmax theorem of von Neumann, and such games can be solved in
polynomial time using linear programming. For concurrent games sure-determinacy does
not hold, and the optimal value determinacy for concurrent games with Borel objectives was
established by Martin [Mar98]. Concurrent games with qualitative reachability and more
general parity objectives have been studied in [dAHK98, dAH00]. The sure, almost-sure, and limit-sure winning states can be computed in polynomial time for reachability objectives [dAHK98], and for parity objectives the problems are in NP ∩ coNP [dAH00].
The values of concurrent games with parity objectives were characterized by quantitative
µ-calculus formulas in [dAM01], and from the characterization a 3EXPTIME algorithm
was obtained to solve concurrent parity games. The reduction of Rabin, Streett and Muller
objectives to parity objectives (an exponential reduction) [Tho97] and the algorithm for
parity objectives yield a 4EXPTIME algorithm to solve concurrent Rabin, Streett and
Muller games. For the special case of turn-based stochastic games the algorithm of [dAM01]
can be shown to work in 2EXPTIME for parity objectives, and thus one could obtain a
3EXPTIME algorithm for turn-based stochastic Rabin, Streett and Muller games.
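The one-shot minmax value mentioned at the start of this paragraph can be illustrated with a brute-force sketch of our own, restricted to games with two rows (the actual polynomial-time method is linear programming over mixed strategies of any dimension):

```python
def matrix_game_value(A, steps=1000):
    """Approximate the minmax value of a zero-sum matrix game with two rows:
    maximize, over player 1's probability p of playing row 0, the worst-case
    expected payoff over player 2's columns."""
    best = float("-inf")
    for k in range(steps + 1):
        p = k / steps
        worst = min(p * A[0][j] + (1 - p) * A[1][j] for j in range(len(A[0])))
        best = max(best, worst)
    return best

# Matching pennies: the value is 0, achieved by mixing rows uniformly;
# no pure strategy guarantees more than -1 against a best response.
value = matrix_game_value([[1, -1], [-1, 1]])
```

The grid search already shows why randomization matters in concurrent interaction: the optimal mix p = 1/2 secures value 0, while every pure strategy can be punished by the opponent.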
Previous results on quantitative objectives. The determinacy of concurrent stochas-
tic games with discounted reward objectives was proved in [Sha53], and the determinacy
for limit-average objectives was proved in [MN81]. The existence of pure memoryless opti-
mal strategies for turn-based deterministic games with limit-average objectives was shown
in [EM79]; and for turn-based stochastic games in [LL69]. The existence of pure memo-
ryless strategies in turn-based stochastic games with discounted reward objectives can be
proved from the results of [Sha53]; see [FV97] for analysis of various classes of games with
discounted reward and limit-average objectives. The complexity of turn-based determin-
istic limit-average games has been studied in [ZP96]; also see [FV97] for algorithms for
turn-based stochastic games with discounted reward and limit-average objectives.
Previous results on nonzero-sum games. The existence of Nash equilibrium in one-
shot concurrent games is the celebrated result of Nash [Jr50]. The computation of Nash
equilibria in one-shot games is PPAD-complete [DGP06, CD06], also see [EY07] for related
complexity results. Nash’s theorem holds when the strategy space is convex
and compact. For infinite games, however, the strategy space is not compact, and hence
Nash’s result does not immediately extend. In fact, for concurrent zero-sum
reachability games Nash equilibria (which in zero-sum games correspond
to optimal strategies) need not exist. In such cases one investigates ε-Nash equilibria;
ε-Nash equilibria for all ε > 0 are the best one can achieve. Exact Nash equilibria do exist
in discounted stochastic games [Fin64]. For concurrent nonzero-sum games with payoffs
defined by Borel sets, surprisingly little is known. Secchi and Sudderth [SS01] showed that
exact Nash equilibria do exist when all players have payoffs defined by closed sets (“safety
objectives”). For the special case of two-player games, existence of ε-Nash equilibrium,
for every ε > 0, is known for limit-average objectives [Vie00a, Vie00b], and for parity
objectives [Cha05]. The existence of ε-Nash equilibrium in n-player concurrent games with
objectives in higher levels of Borel hierarchy is an intriguing open problem.
Organization and new results of the thesis. We now present the organization of the
thesis and the main results of each chapter.
1. (Chapter 2). The basic definitions of various classes of games, objectives, strategies,
and the formal notion of determinacy are presented in Chapter 2.
2. (Chapter 3). In Chapter 3 we consider concurrent games with tail objectives (a
generalization of Muller objectives) and prove several basic properties; e.g., we
show that if there is a state with positive value, then there is some state with
value 1. The properties we prove are useful in the analysis of later chapters.
3. (Chapter 4). In Chapter 4 we study turn-based stochastic games with Muller ob-
jectives. The main results of the chapter are as follows:
• we prove an optimal memory bound for pure optimal strategies in turn-based
stochastic Muller games;
• we show that the qualitative and quantitative analysis problems for turn-based
stochastic Muller games are both PSPACE-complete (improving the previously known 3EXPTIME
bound); and
• we present an improved memory bound for randomized optimal strategies as
compared to pure optimal strategies.
4. (Chapter 5). In Chapter 5 we study turn-based stochastic games with Rabin and
Streett objectives. The main results of the chapter are as follows:
• we show that the qualitative and quantitative analysis problems for turn-based stochastic
games with Rabin and Streett objectives are NP-complete and coNP-complete, respec-
tively (improving the previously known 3EXPTIME bound); and
• we present a strategy improvement algorithm for turn-based stochastic Rabin
and Streett games.
5. (Chapter 6). In Chapter 6 we study concurrent games with reachability objectives.
We present an elementary and combinatorial proof of existence of memoryless ε-
optimal strategies in concurrent games with reachability objectives, for all ε > 0. In
contrast, the previous proofs of the result relied on deep results from analysis (such
as the analysis of Puiseux series) [FV97]. The proof techniques we develop also lead to a
strategy improvement algorithm for concurrent reachability games.
6. (Chapter 7). In Chapter 7 we study the complexity of concurrent games with limit-
average objectives and show that these games can be solved in EXPTIME. It also
follows from our results that concurrent games with discounted reward objectives can
be solved in PSPACE. To the best of our knowledge this is the first complexity result
on the solution of concurrent limit-average games. The techniques used in this
chapter are also useful in the analysis of Chapter 8.
7. (Chapter 8). In Chapter 8 we study the complexity of concurrent games with
parity objectives and show that the quantitative analysis of concurrent parity games
can be achieved in PSPACE (improving the previous 3EXPTIME bound); and as a
consequence obtain an EXPSPACE algorithm for Rabin, Streett and Muller objectives
(as compared to the previously known 4EXPTIME bound).
8. (Chapter 9). In Chapter 9 we study games that are not strictly competitive. We
present a new notion of equilibrium that captures conditional competitiveness.
This new notion, called secure equilibria, captures the
notion of adversarial external choice. We show that the maximal secure equilibrium payoff is
unique for turn-based deterministic games, and present algorithms to compute this
payoff for ω-regular objectives. We then illustrate its application in the synthesis of
independent processes: we show that the notion of secure equilibria generalizes the
assume-guarantee style of reasoning in the game theoretic framework.
The relevant open problems for each chapter are listed along with the concluding remarks of the
respective chapter.
Related topics. In this thesis we consider games played on graphs with finite state
spaces, where each player has perfect information about the state of the game. We briefly
discuss several extensions of such games which have been studied in the literature.
Beyond games for reactive systems. We have so far discussed games played on graphs
that are mainly used in the analysis of reactive systems. However, graph games are widely
used in several other areas of computer science, such as Ehrenfeucht-Fraïssé games
in finite-model theory, and network congestion games and auctions in the analysis of the
internet [Pap01]. We keep our discussion limited to games related to the verification of reactive
systems, and now describe several extensions in this context.
Partial-information games. In the class of partial-information games, players have only
partial information about the state of the game. Such games are much harder to solve
than perfect-information games; for example, 2-player partial-information
turn-based games are 2EXPTIME-complete for reachability objectives [Rei79], and several
problems related to partial-information turn-based games with more than two players become
undecidable [Rei79]. The results in [CH05] present a close connection between a sub-class
of partial-information turn-based games and perfect-information concurrent games. The
algorithmic analysis of partial-information turn-based games with ω-regular objectives has
been studied in [CDHR06]. The complexity of partial-information Markov decision processes
has been studied in [PT87].
Infinite-state games. There are several extensions of games played on finite state space
to games played on infinite state spaces. The most notable are pushdown games and
timed games. In pushdown games the state of the game encodes an unbounded
amount of information about the pushdown store (or stack); such games have been studied
in [Wal96]; see also [Wal04] for a survey. Pushdown games with stochastic transitions have
been studied in [EY05, EY06]. Timed games are played on finite state graphs,
but in continuous time with discrete transitions. The modeling of time by clocks makes these
games infinite-state, and such games are studied in [MPS95, dAFH+03].
Logic and games. The connection between logical quantifiers and games is deep and well-
established. Game theory also provides a useful framework to study properties of sets. The
results of Martin [Mar75, Mar98], establishing Borel determinacy for 2-player and concurrent
games, illuminate several key properties of sets. The close connection between logics on
trees and 2-player games is well exposed in [Tho97]. The µ-calculus is a logic of fixed-
points and is expressive enough to capture all ω-regular objectives [Koz83]. Emerson and
Jutla [EJ91] established the equivalence of µ-calculus model checking and solving 2-player
parity games. Quantitative µ-calculus has been proposed in [dAM01] to solve concurrent
games with parity objectives, and in [MM02] to solve 2½-player games with parity objectives.
The model checking algorithm for the alternating temporal logic ATL requires game solving
procedures as sub-routines [AHK02].
Relationships between games. The relationship between games is an intriguing area of
research. The notions of abstraction of games [HMMR00, HJM03, CHJM05], refinement
relations between games [AHKV98], and distances between games [dAHM03] have been
explored in the literature.
Chapter 2
Definitions
In this chapter we present the definitions of several classes of game graphs,
strategies, and objectives, and the notions of values and equilibria. We start with the definition of
game graphs.
2.1 Game Graphs
We first define turn-based game graphs, and then the more general class of con-
current game graphs. We start with some preliminary notation. For a finite set A, a
probability distribution on A is a function δ: A → [0, 1] such that ∑_{a∈A} δ(a) = 1. We write
Supp(δ) = {a ∈ A | δ(a) > 0} for the support set of δ. We denote the set of probability
distributions on A by Dist(A).
2.1.1 Turn-based probabilistic game graphs
We consider several classes of turn-based games, namely, two-player turn-based
probabilistic games (2½-player games), two-player turn-based deterministic games (2-player
games), and Markov decision processes (1½-player games).
Turn-based probabilistic game graphs. A turn-based probabilistic game graph (or
2½-player game graph) G = ((S,E), (S1, S2, SP), δ) consists of a directed graph (S,E), a
partition of the vertex set S into three subsets S1, S2, SP ⊆ S, and a probabilistic transition
function δ: SP → Dist(S). The vertices in S are called states. The state space S is finite.
The states in S1 are player-1 states; the states in S2 are player-2 states; and the states in
SP are probabilistic states. For all states s ∈ S, we define E(s) = {t ∈ S | (s, t) ∈ E} to
be the set of possible successor states. We require that E(s) ≠ ∅ for every nonprobabilistic
state s ∈ S1 ∪ S2, and that E(s) = Supp(δ(s)) for every probabilistic state s ∈ SP. At
player-1 states s ∈ S1, player 1 chooses a successor state from E(s); at player-2 states
s ∈ S2, player 2 chooses a successor state from E(s); and at probabilistic states s ∈ SP , a
successor state is chosen according to the probability distribution δ(s).
The turn-based deterministic game graphs (or 2-player game graphs) are the spe-
cial case of the 2½-player game graphs with SP = ∅. The Markov decision processes (MDPs
for short; or 1½-player game graphs) are the special case of the 2½-player game graphs with
either S1 = ∅ or S2 = ∅. We refer to the MDPs with S2 = ∅ as player-1 MDPs, and to the
MDPs with S1 = ∅ as player-2 MDPs. A game graph that is both deterministic and an
MDP is called a transition system (or 1-player game graph): a player-1 transition system
has only player-1 states; a player-2 transition system has only player-2 states.
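As a concrete illustration of the conditions in this definition, the well-formedness of a 2½-player game graph can be checked mechanically. The following is a sketch only; the encoding and the name TurnBasedGameGraph are our own, not from the thesis.

```python
from dataclasses import dataclass

# Illustrative sketch: a 2-1/2-player game graph G = ((S, E), (S1, S2, SP), delta)
# together with the well-formedness conditions from the definition.

@dataclass
class TurnBasedGameGraph:
    edges: dict   # E as a map: state -> set of successor states
    s1: set       # player-1 states
    s2: set       # player-2 states
    sp: set       # probabilistic states
    delta: dict   # probabilistic transition function: s in SP -> {t: delta(s)(t)}

    def check(self):
        # S1, S2, SP must partition the state space S.
        assert self.s1.isdisjoint(self.s2) and self.s1.isdisjoint(self.sp) \
            and self.s2.isdisjoint(self.sp)
        for s in self.s1 | self.s2 | self.sp:
            succ = self.edges.get(s, set())
            if s in self.sp:
                dist = self.delta[s]
                # delta(s) must be a probability distribution on S ...
                assert abs(sum(dist.values()) - 1.0) < 1e-9
                # ... and E(s) = Supp(delta(s)) at probabilistic states.
                assert succ == {t for t, p in dist.items() if p > 0}
            else:
                # E(s) must be nonempty at player states.
                assert succ
        return True
```

For example, a graph with a player-1 state a, a probabilistic state b that flips a fair coin between a and c, and a player-2 sink c passes the check.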
2.1.2 Concurrent game graphs
Concurrent game graphs. A concurrent game graph (or a concurrent game structure)
G = (S,A,Γ1,Γ2, δ) consists of the following components:
• A finite state space S.
• A finite set A of moves or actions.
• Two move assignments Γ1, Γ2: S → 2^A \ {∅}. For i ∈ {1, 2}, the player-i move assignment
Γi associates with every state s ∈ S a nonempty set Γi(s) ⊆ A of moves available to
player i at state s.
• A probabilistic transition function δ: S × A × A → Dist(S). At every state s ∈ S,
player 1 chooses a move a1 ∈ Γ1(s), and simultaneously and independently player 2
chooses a move a2 ∈ Γ2(s). A successor state is then chosen according to the proba-
bility distribution δ(s, a1, a2).
For all states s ∈ S and all moves a1 ∈ Γ1(s) and a2 ∈ Γ2(s), we define Succ(s, a1, a2) =
Supp(δ(s, a1, a2)) to be the set of possible successor states of s when the moves a1 and a2
are chosen. For a concurrent game graph, we define the set of edges as E = {(s, t) ∈ S×S |
(∃a1 ∈ Γ1(s))(∃a2 ∈ Γ2(s))(t ∈ Succ(s, a1, a2))}, and as with turn-based game graphs, we
write E(s) = {t | (s, t) ∈ E} for the set of possible successors of a state s ∈ S.
We distinguish the following special classes of concurrent game graphs. The con-
current game graph G is deterministic if |Succ(s, a1, a2)| = 1 for all states s ∈ S and all
moves a1 ∈ Γ1(s) and a2 ∈ Γ2(s). A state s ∈ S is a turn-based state if there exists a player
i ∈ {1, 2} such that |Γi(s)| = 1; that is, player i has no choice of moves at s. If |Γ2(s)| = 1,
then s is a player-1 turn-based state; and if |Γ1(s)| = 1, then s is a player-2 turn-based
state. The concurrent game graph G is turn-based if every state in S is a turn-based state.
Note that the turn-based concurrent game graphs are equivalent to the turn-based proba-
bilistic game graphs: to obtain a 2½-player game graph from a turn-based concurrent game
graph G, for every player-i turn-based state s of G, where i ∈ {1, 2}, introduce |Γi(s)| many
probabilistic successor states of s. Moreover, the concurrent game graphs that are both
turn-based and deterministic are equivalent to the 2-player game graphs.
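A minimal sketch (our own encoding, not the thesis's notation) of one round of a concurrent game: both players pick moves simultaneously and independently, and the successor is drawn from δ(s, a1, a2). It also checks the turn-based condition just defined.

```python
import random

# Sketch only: gamma1/gamma2 map a state to its nonempty set of available moves,
# and delta maps (s, a1, a2) to a distribution {t: probability}.

def successors(delta, s, a1, a2):
    """Succ(s, a1, a2) = Supp(delta(s, a1, a2))."""
    return {t for t, p in delta[(s, a1, a2)].items() if p > 0}

def is_turn_based(states, gamma1, gamma2):
    """Turn-based: at every state, some player has exactly one available move."""
    return all(len(gamma1[s]) == 1 or len(gamma2[s]) == 1 for s in states)

def play_round(delta, gamma1, gamma2, s, rng=random):
    # Simultaneous and independent choices (here both players randomize uniformly).
    a1 = rng.choice(sorted(gamma1[s]))
    a2 = rng.choice(sorted(gamma2[s]))
    dist = delta[(s, a1, a2)]
    targets = sorted(dist)
    return rng.choices(targets, weights=[dist[t] for t in targets])[0]
```

A state where both players have two moves is not turn-based, while a state where either move set is a singleton is.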
To measure the complexity of algorithms and problems, we need to define the size
of game graphs. We do this for the case that all transition probabilities can be specified as ra-
tional numbers. Then the size of a concurrent game graph G is equal to the size of the prob-
abilistic transition function δ, that is,

|G| = ∑_{s∈S} ∑_{a1∈Γ1(s)} ∑_{a2∈Γ2(s)} ∑_{t∈S} |δ(s, a1, a2)(t)|,

where |δ(s, a1, a2)(t)| denotes the space required to specify a rational probability value.
2.2 Strategies
When choosing their moves, the players follow recipes that are called strategies.
We define strategies both for 2½-player game graphs and for concurrent game graphs. On a
concurrent game graph, the players choose moves from a set A of moves, while on a 2½-player
game graph, they choose successor states from a set S of states. Hence, for 2½-player game
graphs, we define the set of moves as A = S. For 2½-player game graphs, a player-1 strategy
prescribes the moves that player 1 chooses at the player-1 states S1, and a player-2 strategy
prescribes the moves that player 2 chooses at the player-2 states S2. For concurrent game
graphs, both players choose moves at every state, and hence for concurrent game graphs,
we define the sets of player-1 states and player-2 states as S1 = S2 = S.
Consider a game graph G. A player-1 strategy on G is a function σ: S∗ · S1 →
Dist(A) that assigns to every nonempty finite sequence ~s ∈ S∗ · S1 of states ending in a
player-1 state, a probability distribution σ(~s) over the moves A. By following the strategy σ,
whenever the history of a game played on G is ~s, then player 1 chooses the next move
according to the probability distribution σ(~s). A strategy must prescribe only available
moves. Hence, for all state sequences ~s1 ∈ S∗ and all states s ∈ S1, if σ(~s1 · s)(a) > 0, then
the following condition must hold: a ∈ E(s) for 2½-player game graphs G, and a ∈ Γ1(s)
for concurrent game graphs G. Symmetrically, a player-2 strategy on G is a function π:
S∗ · S2 → Dist(A) such that if π(~s1 · s)(a) > 0, then a ∈ E(s) for 2½-player game graphs G,
and a ∈ Γ2(s) for concurrent game graphs G. We write Σ for the set of player-1 strategies,
and Π for the player-2 strategies on G. Note that |Π| = 1 if G is a player-1 MDP, and
|Σ| = 1 if G is a player-2 MDP.
2.2.1 Types of strategies
We classify strategies according to their use of randomization and memory.
Use of randomization. Strategies that do not use randomization are called pure. A
player-1 strategy σ is pure (or deterministic) if for all state sequences ~s ∈ S∗ · S1, there
exists a move a ∈ A such that σ(~s)(a) = 1. The pure strategies for player 2 are defined
analogously. We denote by ΣP the set of pure player-1 strategies, and by ΠP the set of pure
player-2 strategies. A strategy that is not necessarily pure is sometimes called randomized.
Use of memory. Strategies in general require memory to remember the history of a
game. The following alternative definition of strategies makes this explicit. Let M be a set
called memory. A player-1 strategy σ = (σu, σm) can be specified as a pair of functions: a
memory-update function σu: S×M → M, which given the current state of the game and the
memory, updates the memory with information about the current state; and a next-move
function σm: S1 ×M → Dist(A), which given the current state and the memory, prescribes
the next move of the player. The player-1 strategy σ is finite-memory if the memory M is
a finite set; and the strategy σ is memoryless (or positional) if the memory M is a singleton,
i.e., |M | = 1. A finite-memory strategy remembers only a finite amount of information
about the infinitely many different possible histories of the game; a memoryless strategy is
independent of the history of the game and depends only on the current state of the game.
Note that a memoryless player-1 strategy can be represented as a function σ: S1 → Dist(A).
A memoryless strategy σ is uniform memoryless if it is a uniform
distribution over its support, i.e., for all states s we have σ(s)(a) = 0 if a ∉ Supp(σ(s))
and σ(s)(a) = 1/|Supp(σ(s))| if a ∈ Supp(σ(s)). We denote by ΣF the set of finite-memory
player-1 strategies, and by ΣM and ΣUM the sets of memoryless and uniform memoryless player-
1 strategies. The finite-memory player-2 strategies ΠF , the memoryless player-2 strategies
ΠM and uniform memoryless player-2 strategies ΠUM are defined analogously.
A pure finite-memory strategy is a pure strategy that is finite-memory; we write ΣPF =
ΣP ∩ ΣF for the pure finite-memory player-1 strategies, and ΠPF for the corresponding
player-2 strategies. A pure memoryless strategy is a pure strategy that is memoryless. The
pure memoryless strategies use neither randomization nor memory; they are the simplest
strategies we consider. Note that a pure memoryless player-1 strategy can be represented
as a function σ: S1 → A. We write ΣPM = ΣP ∩ ΣM for the pure memoryless player-1
strategies, and ΠPM for the corresponding class of simple player-2 strategies.
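The pair (σu, σm) view of a strategy can be made concrete as follows. This is a sketch in our own encoding, with a hypothetical one-bit memory that records whether a state 'b' has been seen.

```python
# Sketch only: a finite-memory strategy given by a memory-update function
# sigma_u: S x M -> M and a next-move function sigma_m: S x M -> Dist(A),
# unrolled over a finite history of states.

def next_move_distribution(history, m0, sigma_u, sigma_m):
    m = m0
    for s in history[:-1]:          # update the memory along the past states
        m = sigma_u(s, m)
    return sigma_m(history[-1], m)  # the move depends only on (current state, memory)

# Example with memory M = {False, True}: remember whether 'b' was ever visited.
def sigma_u(s, m):
    return m or s == 'b'

def sigma_m(s, m):
    return {'right': 1.0} if m else {'left': 1.0}
```

A memoryless strategy corresponds to |M| = 1, i.e., a next-move function that ignores m entirely; the example strategy above is pure, since every prescribed distribution is concentrated on one move.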
2.2.2 Probability space and outcomes of strategies
A path of the game graph G is an infinite sequence ω = 〈s0, s1, s2, . . .〉 of states in
S such that (sk, sk+1) ∈ E for all k ≥ 0. We denote the set of paths of G by Ω. Once a
starting state s ∈ S and strategies σ ∈ Σ and π ∈ Π for the two players are fixed, the result
of the game is a random walk in G, denoted ω_s^{σ,π}.
Probability space of strategies. Given a finite sequence x = 〈s0, s1, . . . , sk〉 of states,
the cone for x is the set Cone(x) = {〈s′0, s′1, . . .〉 | (∀0 ≤ i ≤ k)(si = s′i)} of paths with prefix x.
Let U be the set of cones of all finite paths of G. The set U is the set of basic open sets
in Sω. Let F be the Borel σ-field generated by U, i.e., F is the smallest set that is closed
under complementation, countable union, and countable intersection, with Ω ∈ F and U ⊆ F. Then
(Ω,F) is a σ-algebra. Given strategies σ and π for player 1 and player 2, respectively, and
a state s, we define a function µ_s^{σ,π}: U → [0, 1] as follows:

• Cones of length 1: µ_s^{σ,π}(Cone(s′)) = 1 if s = s′, and 0 otherwise.

• Cones of length greater than 1: given a finite sequence ωk+1 = 〈s0, s1, . . . , sk, sk+1〉,
let ωk = 〈s0, s1, . . . , sk〉 and

µ_s^{σ,π}(Cone(ωk+1)) = µ_s^{σ,π}(Cone(ωk)) · ∑_{a1∈Γ1(sk), a2∈Γ2(sk)} δ(sk, a1, a2)(sk+1) · σ(ωk)(a1) · π(ωk)(a2).
The function µ_s^{σ,π} is a measure, and there is a unique extension of µ_s^{σ,π} to a probability
measure on F (by the Carathéodory extension theorem [Bil95]). We denote this probability
measure on F, induced by the strategies σ and π and the starting state s, by Pr_s^{σ,π}. Then
(Ω, F, Pr_s^{σ,π}) is a probability space. An event Φ is a measurable set of paths, i.e., Φ ∈ F.
Given an event Φ, Pr_s^{σ,π}(Φ) denotes the probability that the random walk ω_s^{σ,π} is in Φ. For
a measurable function f: Ω → R, we denote by E_s^{σ,π}[f] the expectation of the function
f under the probability distribution Pr_s^{σ,π}(·). For i ≥ 0, we denote by Xi: Ω → S the
random variable denoting the i-th state along a path, and by Y1,i and Y2,i the random
variables denoting the actions played in the i-th round of the play by player 1 and player 2,
respectively.
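The inductive definition of the cone measure translates directly into code. The following sketch (our own encoding, for concurrent game graphs, with strategies given as functions from histories to distributions over moves) computes the probability of a finite cone.

```python
# Sketch only: compute mu_s^{sigma,pi}(Cone(path)) by the inductive definition.
# delta maps (s, a1, a2) to {t: probability}; sigma and pi map a history
# (tuple of states) to a distribution {move: probability}.

def cone_probability(path, start, delta, sigma, pi):
    if path[0] != start:
        return 0.0                      # cones of length 1: 1 if s = s', else 0
    prob = 1.0
    for k in range(len(path) - 1):
        hist = tuple(path[:k + 1])
        step = 0.0
        # sum over all pairs of moves the players may randomize over
        for a1, p1 in sigma(hist).items():
            for a2, p2 in pi(hist).items():
                step += p1 * p2 * delta[(path[k], a1, a2)].get(path[k + 1], 0.0)
        prob *= step
    return prob
```

For instance, with a single fair-coin transition from 's' to an absorbing 't', the cone of 〈s, t, t〉 has probability 1/2.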
Outcomes of strategies. Consider two strategies σ ∈ Σ and π ∈ Π on a game graph G,
and let ω = 〈s0, s1, s2, . . .〉 be a path of G. The path ω is (σ, π)-possible for a 2½-player
game graph G if for every k ≥ 0 the following two conditions hold: if sk ∈ S1, then
σ(s0s1 . . . sk)(sk+1) > 0; and if sk ∈ S2, then π(s0s1 . . . sk)(sk+1) > 0. The path ω is (σ, π)-
possible for a concurrent game graph G if for every k ≥ 0, there exist moves a1 ∈ Γ1(sk) and
a2 ∈ Γ2(sk) for the two players such that σ(s0s1 . . . sk)(a1) > 0 and π(s0s1 . . . sk)(a2) > 0
and sk+1 ∈ Succ(sk, a1, a2). Given a state s ∈ S and two strategies σ ∈ Σ and π ∈ Π, we
denote by Outcome(s, σ, π) ⊆ Ω the set of (σ, π)-possible paths whose first state is s. Note
that Outcome(s, σ, π) is a probability-1 event, i.e., Pr_s^{σ,π}(Outcome(s, σ, π)) = 1.
Given a game graph G and a player-1 strategy σ ∈ Σ, we write Gσ for the game
played on G under the constraint that player 1 follows the strategy σ. Analogously, given G
and a player-2 strategy π ∈ Π, we write Gπ for the game played on G under the constraint
that player 2 follows the strategy π. Observe that for a 2½-player game graph G or a
concurrent game graph G and a memoryless player-1 strategy σ ∈ Σ, the result Gσ is a
player-2 MDP. Similarly, for a player-2 MDP G and a memoryless player-2 strategy π ∈ Π,
the result Gπ is a Markov chain. Hence, if G is a 2½-player game graph or a concurrent
game graph and the two players follow memoryless strategies σ and π, then the result
Gσ,π = (Gσ)π is a Markov chain. Also the following observation will be used later. Given a
game graph G and a strategy in Σ∪Π with finite memory M, the strategy can be interpreted
as a memoryless strategy in the synchronous product G×M of the game graph G with the
memory M. Hence the above observation (on memoryless strategies) also extends to finite-
memory strategies, i.e., if player 1 plays a finite-memory strategy σ, then Gσ is a player-2
MDP, and if both players follow finite-memory strategies, then we have a Markov chain.
2.3 Objectives
Consider a game graph G. Player-1 and player-2 objectives for G are measurable
sets Φ1, Φ2 ⊆ Ω of winning paths for the two players: player i, for i ∈ {1, 2}, wins the game
played on the graph G with the objective Φi iff the infinite path in Ω that results from
playing the game, lies inside the set Φi. In the case of zero-sum games, the objectives of
the two players are strictly competitive, that is, Φ2 = Ω \ Φ1. A general class of objectives
are the Borel objectives. A Borel objective Φ ⊆ Ω is a Borel set in the Cantor topology
on the set Sω of infinite state sequences (note that Ω ⊆ Sω). An important subclass of
the Borel objectives are the ω-regular objectives, which lie in the first 2½ levels of the Borel
hierarchy (i.e., in the intersection of Σ^0_3 and Π^0_3). The ω-regular objectives are of special
interest for the verification and synthesis of reactive systems [MP92]. In particular, the
following specifications of winning conditions for the players define ω-regular objectives,
and subclasses thereof [Tho97].
Reachability and safety objectives. A reachability specification for the game graph
G is a set T ⊆ S of states, called target states. The reachability specification T requires
that some state in T be visited. Thus, the reachability specification T defines the set
Reach(T) = {〈s0, s1, s2, . . .〉 ∈ Ω | (∃k ≥ 0)(sk ∈ T)} of winning paths; this set is called
a reachability objective. A safety specification for G is likewise a set U ⊆ S of states;
they are called safe states. The safety specification U requires that only states in U be
visited. Formally, the safety objective defined by U is the set Safe(U) = {〈s0, s1, . . .〉 ∈ Ω |
(∀k ≥ 0)(sk ∈ U)} of winning paths. Note that reachability and safety are dual objectives:
Safe(U) = Ω \ Reach(S \ U).
Büchi and coBüchi objectives. A Büchi specification for G is a set B ⊆ S of states,
which are called Büchi states. The Büchi specification B requires that some state in B be
visited infinitely often. For a path ω = 〈s0, s1, s2, . . .〉, we write Inf(ω) = {s ∈ S | sk =
s for infinitely many k ≥ 0} for the set of states that occur infinitely often in ω. Thus, the
Büchi objective defined by B is the set Büchi(B) = {ω ∈ Ω | Inf(ω) ∩ B ≠ ∅} of winning
paths. The dual of a Büchi specification is a coBüchi specification C ⊆ S, which specifies a
set of so-called coBüchi states. The coBüchi specification C requires that the states outside
C be visited only finitely often. Formally, the coBüchi objective defined by C is the set
coBüchi(C) = {ω ∈ Ω | Inf(ω) ⊆ C} of winning paths. Note that coBüchi(C) = Ω \ Büchi(S \ C).
It is also worth noting that reachability and safety objectives can be turned into both Büchi
and coBüchi objectives by slightly modifying the game graph (for example, if every target state
s ∈ T is made a sink state, then we have Reach(T) = Büchi(T)).
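For ultimately periodic ("lasso") paths, these four objectives are easy to evaluate, since Inf(ω) is exactly the set of states on the cycle. A sketch in our own encoding:

```python
# Sketch only: an ultimately periodic path prefix . cycle^omega is encoded as
# (prefix, cycle) with a nonempty cycle. Then Inf(omega) = set(cycle), and the
# four objectives become finite checks.

def reach(prefix, cycle, t):        # Reach(T): some state in T is visited
    return not (set(prefix) | set(cycle)).isdisjoint(t)

def safe(prefix, cycle, u):         # Safe(U): only states in U are visited
    return (set(prefix) | set(cycle)) <= u

def buchi(prefix, cycle, b):        # Buchi(B): Inf(omega) meets B
    return not set(cycle).isdisjoint(b)

def cobuchi(prefix, cycle, c):      # coBuchi(C): Inf(omega) contained in C
    return set(cycle) <= c
```

On such paths the duality Safe(U) = Ω \ Reach(S \ U) can be checked directly: safe(prefix, cycle, u) holds exactly when reach(prefix, cycle, S - u) fails.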
Rabin and Streett objectives. We use colors to define objectives independent of game
graphs. For a set C of colors, we write [[·]]: C → 2^S for a function that maps each color to a set
of states. Inversely, given a set U ⊆ S of states, we write [U] = {c ∈ C | [[c]] ∩ U ≠ ∅} for the
set of colors that occur in U. Note that a state can have multiple colors. A Rabin objective
is specified as a set P = {(e1, f1), . . . , (ed, fd)} of pairs of colors ei, fi ∈ C. Intuitively, the
Rabin condition P requires that for some 1 ≤ i ≤ d, all states of color ei be visited finitely
often and some state of color fi be visited infinitely often. Let [[P]] = {(E1, F1), . . . , (Ed, Fd)}
be the corresponding set of so-called Rabin pairs, where Ei = [[ei]] and Fi = [[fi]] for all
1 ≤ i ≤ d. Formally, the set of winning plays is Rabin(P) = {ω ∈ Ω | ∃ 1 ≤ i ≤
d. (Inf(ω) ∩ Ei = ∅ ∧ Inf(ω) ∩ Fi ≠ ∅)}. Without loss of generality, we require that
⋃_{i∈{1,2,...,d}} (Ei ∪ Fi) = S. The parity (or Rabin-chain) objectives are the special case
of Rabin objectives such that E1 ⊂ F1 ⊂ E2 ⊂ F2 ⊂ · · · ⊂ Ed ⊂ Fd. A Streett objective is
again specified as a set P = {(e1, f1), . . . , (ed, fd)} of pairs of colors. The Streett condition
P requires that for each 1 ≤ i ≤ d, if some state of color fi is visited infinitely often,
then some state of color ei be visited infinitely often. Formally, the set of winning plays is
Streett(P) = {ω ∈ Ω | ∀ 1 ≤ i ≤ d. (Inf(ω) ∩ Ei ≠ ∅ ∨ Inf(ω) ∩ Fi = ∅)}, for the set
[[P]] = {(E1, F1), . . . , (Ed, Fd)} of so-called Streett pairs. Note that the Rabin and Streett
objectives are dual; i.e., the complement of a Rabin objective is a Streett objective, and
vice versa.
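For a given set Inf(ω), the Rabin/Streett duality is a simple propositional fact that can be checked computationally. A sketch in our own encoding (Inf(ω) as a set of states, [[P]] as a list of (Ei, Fi) set pairs):

```python
# Sketch only: evaluate Rabin and Streett conditions on a set inf = Inf(omega),
# given pairs [(E1, F1), ..., (Ed, Fd)] of state sets.

def rabin(inf, pairs):
    # exists i: Inf ∩ Ei = ∅ and Inf ∩ Fi ≠ ∅
    return any(inf.isdisjoint(e) and not inf.isdisjoint(f) for e, f in pairs)

def streett(inf, pairs):
    # for all i: Inf ∩ Ei ≠ ∅ or Inf ∩ Fi = ∅
    return all(not inf.isdisjoint(e) or inf.isdisjoint(f) for e, f in pairs)
```

Here streett(inf, pairs) == not rabin(inf, pairs) for every inf, which is exactly the duality stated above: negating the existential Rabin condition yields the universal Streett condition with the same pairs.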
Parity objectives. A parity specification for G consists of a nonnegative integer d and
a function p: S → {0, 1, 2, . . . , 2d}, which assigns to every state of G an integer between
0 and 2d. For a state s ∈ S, the value p(s) is called the priority of s. We assume
without loss of generality that p−1(j) ≠ ∅ for all 0 < j ≤ 2d; this implies that a parity
specification is completely specified by the priority function p (and d does not need to be
specified explicitly). The positive integer 2d + 1 is referred to as the number of priori-
ties of p. The parity specification p requires that the minimum priority of all states that
are visited infinitely often is even. Formally, the parity objective defined by p is the set
Parity(p) = {ω ∈ Ω | min{p(s) | s ∈ Inf(ω)} is even} of winning paths. Note that for
a parity objective Parity(p), the complementary objective Ω \ Parity(p) is again a parity
objective: Ω \ Parity(p) = Parity(p + 1), where the priority function p + 1 is defined by
(p + 1)(s) = p(s) + 1 for all states s ∈ S (if p−1(0) = ∅, then use p − 1 instead of p + 1). This
self-duality of parity objectives is often convenient when solving games. It is also worth
noting that the Büchi objectives are parity objectives with two priorities (let p−1(0) = B and
p−1(1) = S \ B), and the coBüchi objectives are parity objectives with three priorities (let
p−1(0) = ∅, p−1(1) = S \ C, and p−1(2) = C).
Parity objectives are also called Rabin-chain objectives, as they are a special case
of Rabin objectives [Tho97]: if the sets of the Rabin pairs P = {(E1, F1), . . . , (Ed, Fd)} form
a chain E1 ⊊ F1 ⊊ E2 ⊊ F2 ⊊ · · · ⊊ Ed ⊊ Fd, then Rabin(P) = Parity(p) for the priority
function p: S → {0, 1, . . . , 2d} that for all 1 ≤ j ≤ d assigns to each state in Ej \ Fj−1
the priority 2j − 1, and to each state in Fj \ Ej the priority 2j, where F0 = ∅. Conversely,
given a priority function p: S → {0, 1, . . . , 2d}, we can construct a chain E1 ⊊ F1 ⊊ · · · ⊊
Ed+1 ⊊ Fd+1 of Rabin sets such that Parity(p) = Rabin({(E1, F1), . . . , (Ed+1, Fd+1)}) as
follows: let E1 = ∅ and F1 = p−1(0), and for all 2 ≤ j ≤ d + 1, let Ej = Fj−1 ∪ p−1(2j − 3)
and Fj = Ej ∪ p−1(2j − 2). Hence, the parity objectives are a subclass of the Rabin
objectives that is closed under complementation. It follows that every parity objective is
both a Rabin objective and a Streett objective. The parity objectives are of special interest,
because every ω-regular objective can be turned into a parity objective by modifying the
game graph (take the synchronous product of the game graph with a deterministic parity
automaton that accepts the ω-regular objective) [Mos84].
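The converse construction above can be checked mechanically. In the sketch below (our own encoding: a priority function as a dict, Rabin pairs as pairs of state sets), we build the chain and verify Parity(p) = Rabin(...) on every candidate Inf set of a small hand-made example.

```python
from itertools import combinations

# Sketch only: from a priority function p: S -> {0, ..., 2d}, build the chain
# E1 = {}, F1 = p^-1(0), and for 2 <= j <= d+1:
#   Ej = F_{j-1} ∪ p^-1(2j - 3),  Fj = Ej ∪ p^-1(2j - 2).

def chain_from_priorities(p, d):
    inv = lambda k: {s for s, pr in p.items() if pr == k}
    e, f = set(), inv(0)
    pairs = [(e, f)]
    for j in range(2, d + 2):
        e = f | inv(2 * j - 3)
        f = e | inv(2 * j - 2)
        pairs.append((e, f))
    return pairs

def parity(inf, p):
    # minimum priority visited infinitely often is even
    return min(p[s] for s in inf) % 2 == 0

def rabin(inf, pairs):
    # exists i: Inf ∩ Ei = ∅ and Inf ∩ Fi ≠ ∅
    return any(inf.isdisjoint(e) and not inf.isdisjoint(f) for e, f in pairs)

# Check Parity(p) = Rabin(chain) on every nonempty Inf set of a 3-state example.
p = {'x': 0, 'y': 1, 'z': 2}
pairs = chain_from_priorities(p, d=1)
for r in range(1, 4):
    for combo in combinations('xyz', r):
        inf = set(combo)
        assert parity(inf, p) == rabin(inf, pairs)
```

With priorities x:0, y:1, z:2, the construction yields the chain (∅, {x}), ({x, y}, {x, y, z}).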
Muller and upward-closed objectives. The most general way of defining ω-regular
objectives is by Muller specifications. A Muller specification for the game graph G is a set
M ⊆ 2^S of sets of states. The sets in M are called Muller sets. The Muller specification
M requires that the set of states that are visited infinitely often is one of the Muller sets.
Formally, the Muller specification M defines the Muller objective Muller(M) = {ω ∈ Ω |
Inf(ω) ∈ M}. Note that Rabin and Streett objectives are special cases of Muller objectives.
The upward-closed objectives form a subclass of the Muller objectives, with the restriction
that the set M is upward-closed. Formally, a set UC ⊆ 2^S is upward-closed if the following
condition holds: if U ∈ UC and U ⊆ Z, then Z ∈ UC. Given an upward-closed set UC ⊆ 2^S,
the upward-closed objective is defined as the set UpClo(UC) = {ω ∈ Ω | Inf(ω) ∈ UC} of
winning plays.
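The upward-closure condition can be checked by brute force for small state spaces. A sketch in our own encoding, with subsets represented as frozensets:

```python
from itertools import combinations

# Sketch only: check that a family uc ⊆ 2^S is upward-closed, i.e. U ∈ uc and
# U ⊆ Z ⊆ S imply Z ∈ uc. Brute force over all supersets; fine for small S.

def is_upward_closed(uc, universe):
    for u in uc:
        rest = sorted(universe - u)
        for r in range(len(rest) + 1):
            for extra in combinations(rest, r):
                if u | frozenset(extra) not in uc:
                    return False
    return True
```

For S = {1, 2}, the family {{1}, {1, 2}} is upward-closed, while {{1}} alone is not, since its superset {1, 2} is missing.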
2.4 Game Values
Given a state s and an objective Ψ1 for player 1, the maximal probability with
which player 1 can ensure that Ψ1 holds from s is the value of the game at s for player 1.
Formally, given a game graph G with objectives Ψ1 for player 1 and Ψ2 for player 2, we
define the value functions Val1 and Val2 for the players 1 and 2, respectively, as follows:
Val^G_1(Ψ1)(s) = sup_{σ∈Σ} inf_{π∈Π} Pr_s^{σ,π}(Ψ1);

Val^G_2(Ψ2)(s) = sup_{π∈Π} inf_{σ∈Σ} Pr_s^{σ,π}(Ψ2).
If the game graph G is clear from the context, then we will drop the superscript G. Given a
game graph G, a strategy σ for player 1, and an objective Ψ1, we use the following notation:

Val^σ_1(Ψ1)(s) = inf_{π∈Π} Pr_s^{σ,π}(Ψ1).

Given a game graph G, a strategy σ for player 1 is optimal from state s for objective Ψ1 if

Val1(Ψ1)(s) = Val^σ_1(Ψ1)(s) = inf_{π∈Π} Pr_s^{σ,π}(Ψ1).

Given a game graph G, a strategy σ for player 1 is ε-optimal, for ε ≥ 0, from state s for
objective Ψ1 if

Val1(Ψ1)(s) − ε ≤ inf_{π∈Π} Pr_s^{σ,π}(Ψ1).
Note that an optimal strategy is ε-optimal with ε = 0. The optimal and ε-optimal strategies
for player 2 are defined analogously. Computing values, optimal and ε-optimal strategies is
referred to as the quantitative analysis of games.
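For the one-player special case (a Markov decision process), the reachability value sup_σ inf_π Pr^{σ,π}_s(Reach(T)) collapses to a supremum over player-1 strategies and can be approximated by standard value iteration. The sketch below is only an illustration of quantitative analysis on an invented toy MDP (states, action names and probabilities are assumptions, not from the text).

```python
def mdp_reach_values(states, actions, prob, target, iters=200):
    """Value iteration for Val(Reach(target)) in an MDP:
    v(s) <- max over actions a of sum_t prob[s][a][t] * v(t),
    with v fixed at 1 on the target set."""
    v = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iters):
        v = {s: 1.0 if s in target else
                max(sum(p * v[t] for t, p in prob[s][a].items())
                    for a in actions[s])
             for s in states}
    return v

# toy MDP (hypothetical numbers): 'safe' reaches goal w.p. 0.6, else sink;
# 'risky' reaches goal w.p. 1/3, sinks w.p. 1/3, or stays at s w.p. 1/3
states = ["s", "goal", "sink"]
actions = {"s": ["safe", "risky"], "goal": ["stay"], "sink": ["stay"]}
prob = {
    "s": {"safe": {"goal": 0.6, "sink": 0.4},
          "risky": {"goal": 1/3, "sink": 1/3, "s": 1/3}},
    "goal": {"stay": {"goal": 1.0}},
    "sink": {"stay": {"sink": 1.0}},
}
vals = mdp_reach_values(states, actions, prob, {"goal"})
print(round(vals["s"], 3))  # → 0.6: 'safe' beats repeating 'risky' (fixpoint 0.5)
```

With two adversarial players the inner infimum matters and value iteration must alternate max and min over the two players' states; the one-player sketch is only the simplest instance.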
Sure, almost-sure, positive and limit-sure winning strategies. Given a game graph
G with an objective Ψ1 for player 1, a strategy σ is a sure winning strategy for player 1
from a state s if for every strategy π of player 2 we have Outcome(s, σ, π) ⊆ Ψ1. A strategy
σ is an almost-sure winning strategy for player 1 from a state s for the objective Ψ1 if
for every strategy π of player 2 we have Pr^{σ,π}_s(Ψ1) = 1. A strategy σ is positive winning
for player 1 from the state s for the objective Φ if for every player-2 strategy π we have
Pr^{σ,π}_s(Φ) > 0. A family of strategies ΣC is limit-sure winning for player 1 from a state s for
the objective Ψ1 if we have sup_{σ∈ΣC} inf_{π∈Π} Pr^{σ,π}_s(Ψ1) = 1. Sure winning, almost-sure
winning, positive winning and limit-sure winning strategies for player 2 are defined
analogously. Given a game graph G and an objective Ψ1 for player 1, the sure winning set
Sure^G_1(Ψ1) for player 1 is the set of states from which player 1 has a sure winning strategy.
Similarly, the almost-sure winning set Almost^G_1(Ψ1) for player 1 is the set of states from
which player 1 has an almost-sure winning strategy, the positive winning set Positive^G_1(Ψ1)
for player 1 is the set of states from which player 1 has a positive winning strategy,
and the limit-sure winning set Limit^G_1(Ψ1) for player 1 is the set of states from which
player 1 has limit-sure winning strategies. The sure winning set Sure^G_2(Ψ2), the almost-sure
winning set Almost^G_2(Ψ2), the positive winning set Positive^G_2(Ψ2) and the limit-sure
winning set Limit^G_2(Ψ2) with objective Ψ2 for player 2 are defined analogously. Again, if the
game graph G is clear from the context, we will drop G from the superscript. It follows from
the definitions that for all 2½-player and concurrent game graphs and all objectives Ψ1 and
Ψ2, we have Sure1(Ψ1) ⊆ Almost1(Ψ1) ⊆ Limit1(Ψ1) ⊆ Positive1(Ψ1) and Sure2(Ψ2) ⊆
Almost2(Ψ2) ⊆ Limit2(Ψ2) ⊆ Positive2(Ψ2). A game is sure winning (resp. almost-sure
winning and limit-sure winning) for player i, for i ∈ {1, 2}, if every state is sure winning
(resp. almost-sure winning and limit-sure winning) for player i. Computing sure winning,
almost-sure winning, positive winning and limit-sure winning sets and strategies is referred
to as the qualitative analysis of games.
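For sure winning with a reachability objective in a turn-based deterministic game, the sure winning set can be computed by the classical attractor (controllable-predecessor) iteration: a player-1 state is winning if some successor is, a player-2 state only if all successors are. The four-state game below is an invented example, a sketch of qualitative analysis rather than a construction from the text.

```python
def sure_reach_attractor(states1, states2, edges, target):
    """Sure-winning set of player 1 for Reach(target) in a turn-based
    deterministic game, via the attractor fixpoint.
    states1/states2: player-1 / player-2 states; edges: successor lists."""
    win = set(target)
    changed = True
    while changed:
        changed = False
        for s in states1 | states2:
            if s in win:
                continue
            succ = edges[s]
            # player 1 needs one winning successor, player 2 forces all
            if (s in states1 and any(t in win for t in succ)) or \
               (s in states2 and all(t in win for t in succ)):
                win.add(s)
                changed = True
    return win

states1 = {"a", "c"}; states2 = {"b"}
edges = {"a": ["b", "goal"], "b": ["a", "c"], "c": ["c"], "goal": ["goal"]}
print(sorted(sure_reach_attractor(states1, states2, edges, {"goal"})))  # → ['a', 'goal']
```

State b is not sure winning because player 2 can always move to the sink c; this is exactly the all-successors requirement at adversarial states.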
Sufficiency of a family of strategies. Let C ∈ {P, M, F, PM, PF} and consider the
family ΣC of special strategies for player 1. We say that the family ΣC suffices with respect
to an objective Ψ1 on a class G of game graphs for
• sure winning if for every game graph G ∈ G, for every s ∈ Sure1(Ψ1) there is
a player-1 strategy σ ∈ ΣC such that for every player-2 strategy π ∈ Π we have
Outcome(s, σ, π) ⊆ Ψ1;
• almost-sure winning if for every game graph G ∈ G, for every state s ∈ Almost1(Ψ1)
there is a player-1 strategy σ ∈ ΣC such that for every player-2 strategy π ∈ Π we
have Pr^{σ,π}_s(Ψ1) = 1;
• positive winning if for every game graph G ∈ G, for every state s ∈ Positive1(Ψ1)
there is a player-1 strategy σ ∈ ΣC such that for every player-2 strategy π ∈ Π we
have Pr^{σ,π}_s(Ψ1) > 0;
• limit-sure winning if for every game graph G ∈ G, for every state s ∈ Limit1(Ψ1) we
have sup_{σ∈ΣC} inf_{π∈Π} Pr^{σ,π}_s(Ψ1) = 1;
• optimality if for every game graph G ∈ G, for every state s ∈ S there is a player-1
strategy σ ∈ ΣC such that Val1(Ψ1)(s) = inf_{π∈Π} Pr^{σ,π}_s(Ψ1);
• ε-optimality if for every game graph G ∈ G, for every state s ∈ S there is a player-1
strategy σ ∈ ΣC such that Val1(Ψ1)(s) − ε ≤ inf_{π∈Π} Pr^{σ,π}_s(Ψ1).
The notion of sufficiency for the size of finite-memory strategies is obtained by referring to the
size of the memory M of the strategies. The notions of sufficiency of strategies for player 2
are defined analogously.
For sure winning, 1½-player and 2½-player games coincide with 2-player (turn-based
deterministic) games where the random player (who chooses the successor at the
probabilistic states) is interpreted as an adversary, i.e., as player 2. This is formalized by
the proposition below.
Proposition 1 If a family ΣC of strategies suffices for sure winning with respect to an
objective Φ on all 2-player game graphs, then the family ΣC suffices for sure winning with
respect to Φ also on all 1½-player and 2½-player game graphs.
The following proposition states that randomization is not necessary for sure winning.
Proposition 2 If a family ΣC of strategies suffices for sure winning with respect to a Borel
objective Φ on all concurrent game graphs, then the family ΣC ∩ ΣP of pure strategies suffices
for sure winning with respect to Φ on all concurrent game graphs.
2.5 Determinacy
The fundamental concept of rationality in zero-sum games is captured by the
notion of optimal and ε-optimal strategies. The key result that establishes the existence of
ε-optimal strategies, for all ε > 0, in zero-sum games is the determinacy result, which states
that the sum of the values of the players is 1 at all states, i.e., for all states s ∈ S, we have
Val1(Ψ1)(s) + Val2(Ψ2)(s) = 1. The determinacy result implies the following equality:

sup_{σ∈Σ} inf_{π∈Π} Pr^{σ,π}_s(Ψ1) = inf_{π∈Π} sup_{σ∈Σ} Pr^{σ,π}_s(Ψ1).

The determinacy result also guarantees the existence of ε-optimal strategies, for all ε > 0,
for both players. A deep result by Martin [Mar98] established that determinacy holds
for all concurrent games with Borel objectives (see Theorem 1). A more refined notion
of determinacy is sure determinacy, which states that for an objective Ψ1 we have
Sure1(Ψ1) = S \ Sure2(Ω \ Ψ1). Sure determinacy holds for turn-based deterministic
games with all Borel objectives [Mar75]; however, sure determinacy does not hold for
2½-player games and concurrent games.
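In the simplest concurrent setting, a one-shot game with two moves per player (a 2×2 matrix game), determinacy is von Neumann's minimax theorem, and the equality of sup inf and inf sup over randomized strategies can be checked numerically. The grid search below is a rough illustrative sketch (the grid resolution is an assumption, not an exact method), applied to the classical "matching pennies" game, whose value is 1/2.

```python
def value_lower(a, grid=10001):
    """sup over player-1 mixes x of the worst-case (min over columns) payoff."""
    best = -1.0
    for i in range(grid):
        x = i / (grid - 1)
        best = max(best, min(x * a[0][j] + (1 - x) * a[1][j] for j in (0, 1)))
    return best

def value_upper(a, grid=10001):
    """inf over player-2 mixes y of the best-case (max over rows) payoff."""
    worst = 2.0
    for i in range(grid):
        y = i / (grid - 1)
        worst = min(worst, max(y * a[r][0] + (1 - y) * a[r][1] for r in (0, 1)))
    return worst

a = [[1.0, 0.0], [0.0, 1.0]]   # matching pennies: entry = Pr(player 1 wins)
print(round(value_lower(a), 3), round(value_upper(a), 3))  # → 0.5 0.5
```

Restricting both players to pure (deterministic) moves gives sup inf = 0 and inf sup = 1 for this matrix, which is why randomization is essential in concurrent games even for such one-shot examples.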
Theorem 1 For all Borel objectives Ψ1 and the complementary objective Ψ2 = Ω \ Ψ1, the
following assertions hold.
1. ([Mar75]). For all 2-player game graphs, the sure winning sets Sure1(Ψ1) and
Sure2(Ψ2) form a partition of the state space, i.e., Sure1(Ψ1) = S \ Sure2(Ψ2), and
the family of pure strategies suffices for sure winning.
2. ([Mar98]). For all concurrent game structures and for all states s we have
Val1(Ψ1)(s) + Val2(Ψ2)(s) = 1.
Given a game graph G, let us denote by Σε(Ψ1) and Πε(Ψ2) the sets of ε-optimal
strategies for player 1 for objective Ψ1 and for player 2 for objective Ψ2, respectively. We
obtain the following corollary from Theorem 1.
Corollary 1 For all concurrent game structures, for all Borel objectives Ψ1 and the
complementary objective Ψ2, for all ε > 0, we have Σε(Ψ1) ≠ ∅ and Πε(Ψ2) ≠ ∅.
2.6 Complexity of Games
We now summarize the main complexity results related to 2-player, 2½-player and
concurrent games with parity, Rabin, Streett and Muller objectives. We first present the
result for 2-player games.
Theorem 2 (Complexity of 2-player games) The problem of deciding whether a state
s is a sure winning state, i.e., s ∈ Sure1(Ψ1) for an objective Ψ1, is NP-complete for
Rabin objectives and coNP-complete for Streett objectives [EJ88], and PSPACE-complete
for Muller objectives [HD05].
We now state the main complexity results known for concurrent game structures.
The basic results were proved for reachability and parity objectives (given by Theorem 3).
By an exponential reduction of Rabin, Streett and Muller objectives to parity
objectives [Tho97], we obtain Corollary 2.
Theorem 3 The following assertions hold.
1. ([dAHK98]). For all concurrent game structures G, for all T ⊆ S, for a state s ∈ S,
whether Val1(Reach(T ))(s) = 1 can be decided in PTIME.
2. ([EY06]). For all concurrent game structures G, for all T ⊆ S, for a state s ∈ S,
a rational α and a rational ε > 0, whether Val1(Reach(T ))(s) ≥ α can be decided
in PSPACE; and a rational interval [l, u] such that Val1(Reach(T ))(s) ∈ [l, u] and
u − l ≤ ε can be computed in PSPACE.
3. ([dAH00]). For all concurrent game structures G, for all priority functions p, for a
state s ∈ S, whether Val1(Parity(p))(s) = 1 can be decided in NP ∩ coNP.
4. ([dAM01]). For all concurrent game structures G, for all priority functions p, for a
state s ∈ S, a rational α and a rational ε > 0, whether Val1(Parity(p))(s) ≥ α can be
decided in 3EXPTIME; and a rational interval [l, u] such that Val1(Parity(p))(s) ∈
[l, u] and u − l ≤ ε can be computed in 3EXPTIME.
Corollary 2 For all concurrent game structures G, for all Rabin, Streett, and Muller
objectives Φ, for a state s ∈ S, a rational α and a rational ε > 0, whether Val1(Φ)(s) ≥ α
can be decided in 4EXPTIME; a rational interval [l, u] such that Val1(Φ)(s) ∈ [l, u] and
u − l ≤ ε can be computed in 4EXPTIME; and whether Val1(Φ)(s) = 1 can be decided in
2EXPTIME.
We now present the results for 2½-player game graphs. The results for 2½-player
game graphs are obtained as follows: the qualitative analysis for reachability and parity
objectives follows from the results on concurrent game structures; the result for quantitative
analysis for reachability objectives follows from the results of Condon [Con92]; and the result
for quantitative analysis for parity objectives follows from the results of concurrent games,
but with an exponential improvement. The results are presented in Theorem 4, and the
exponential reduction of Rabin, Streett and Muller objectives to parity objectives [Tho97]
yields Corollary 3.
Theorem 4 The following assertions hold.
1. ([dAHK98]). For all 2½-player game graphs G, for all T ⊆ S, for a state s ∈ S,
whether Val1(Reach(T))(s) = 1 can be decided in PTIME.
2. ([Con92]). For all 2½-player game graphs G, for all T ⊆ S, for a state s ∈ S,
and a rational α, whether Val1(Reach(T))(s) ≥ α can be decided in NP ∩ coNP, and
Val1(Reach(T))(s) can be computed in EXPTIME.
3. ([dAH00]). For all 2½-player game graphs G, for all priority functions p, for a state
s ∈ S, whether Val1(Parity(p))(s) = 1 can be decided in NP ∩ coNP.
4. ([dAM01]). For all 2½-player game structures G, for all priority functions p, for a
state s ∈ S, a rational α, and a rational ε > 0, whether Val1(Parity(p))(s) ≥ α can
be decided in 2EXPTIME; and a rational interval [l, u] such that Val1(Parity(p))(s) ∈
[l, u] and u − l ≤ ε can be computed in 2EXPTIME.
Corollary 3 For all 2½-player game graphs G, for all Rabin, Streett, and Muller objectives
Φ, for a state s ∈ S, a rational α and a rational ε > 0, whether Val1(Φ)(s) ≥ α can be
decided in 3EXPTIME; a rational interval [l, u] such that Val1(Φ)(s) ∈ [l, u] and u − l ≤ ε
can be computed in 3EXPTIME; and whether Val1(Φ)(s) = 1 can be decided in 2EXPTIME.
Chapter 3

Concurrent Games with Tail Objectives
In this chapter we consider concurrent games with tail objectives,¹ i.e., objectives
that are independent of all finite prefixes of traces, and show that the class of tail
objectives is strictly richer than the class of ω-regular objectives. We develop new proof techniques
to extend several properties of concurrent games with ω-regular objectives to concurrent
games with tail objectives. We prove the positive limit-one property for tail objectives,
which states that, for all concurrent games, if the optimum value for a player is positive for a tail
objective Φ at some state, then there is a state where the optimum value is 1 for Φ for the
player. We also show that the optimum values of zero-sum (strictly conflicting objectives)
games with tail objectives can be related to equilibrium values of nonzero-sum (not strictly
conflicting objectives) games with simpler reachability objectives. A consequence of our
analysis is a polynomial-time reduction of the quantitative analysis of tail objectives
to the qualitative analysis for the sub-class of one-player stochastic games (Markov decision
processes). The properties we prove for the general class of concurrent games with tail
¹A preliminary version of the results of this chapter appeared in [Cha06, Cha07a].
objectives will be used in the later chapters for both concurrent and turn-based games with
Muller objectives.
3.1 Tail Objectives
The class of tail objectives is defined as follows.
Tail objectives. Informally, the class of tail objectives is the sub-class of Borel objectives
that are independent of all finite prefixes. An objective Φ is a tail objective if the following
condition holds: a path ω ∈ Φ if and only if for all i ≥ 0, ωi ∈ Φ, where ωi denotes the
path ω with the prefix of length i deleted. Formally, let G_i = σ(X_i, X_{i+1}, . . .) be the
σ-field generated by the random variables X_i, X_{i+1}, . . .² The tail σ-field T is defined as
T = ⋂_{i≥0} G_i. An objective Φ is a tail objective if and only if Φ belongs to the tail σ-field
T, i.e., the tail objectives are indicator functions of events A ∈ T.
Observe that Muller and parity objectives are tail objectives. Büchi and coBüchi
objectives are special cases of parity objectives and hence tail objectives. Reachability
objectives are not necessarily tail objectives, but for a set T ⊆ S of states, if every state
s ∈ T is an absorbing state, then the objective Reach(T) is equivalent to Büchi(T) and hence
is a tail objective. It may be noted that since σ-fields are closed under complementation,
the class of tail objectives is closed under complementation. We give an example to show
that the class of tail objectives is richer than the class of ω-regular objectives.³
Example 1 Let r be a reward function that maps every state s to a real-valued reward
r(s), i.e., r : S → R. Given a reward function r, we define a function LimAvg_r : Ω → R
as follows: for a path ω = 〈s1, s2, s3, . . .〉 we have

LimAvg_r(ω) = lim inf_{n→∞} (1/n) Σ_{i=1}^{n} r(s_i),

i.e., LimAvg_r(ω) is the long-run average of the rewards appearing in ω. For a constant
²We use σ for strategies and σ (boldface) for sigma-fields.
³Our example shows that there are Π^0_3-hard objectives that are tail objectives. It is possible that
tail objectives can express objectives in even higher levels of the Borel hierarchy than Π^0_3, which would
make our results stronger.
c ∈ R consider the objective Φc defined as follows: Φc = {ω ∈ Ω | LimAvg_r(ω) ≥ c}.
Intuitively, Φc accepts the set of paths such that the "long-run" average of the rewards in
the path is at least the constant c. The "long-run" average condition is hard for the third
level of the Borel hierarchy (see Subsection 3.1.1 for the Π^0_3-completeness proof) and cannot be
expressed as an ω-regular objective. It may be noted that the "long-run" average of a path
is independent of all finite prefixes of the path. Formally, the objectives Φc are tail
objectives. Since the objectives Φc are Π^0_3-hard, it follows that tail objectives lie in higher levels
of the Borel hierarchy than ω-regular objectives.
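The prefix-independence of LimAvg_r can be seen numerically: deleting a finite prefix leaves the running average unchanged in the limit. A small sketch (the reward sequence is invented for illustration):

```python
def avg_prefix(rewards, n):
    """Average of the first n rewards; LimAvg is the lim inf as n grows."""
    return sum(rewards[:n]) / n

# periodic reward sequence 1, 0, 1, 0, ...: long-run average 1/2
seq = [1, 0] * 50000
full = avg_prefix(seq, len(seq))
# deleting a finite prefix does not change the long-run average (tail property)
chopped = avg_prefix(seq[7:], len(seq) - 7)
print(round(full, 3), round(chopped, 3))  # → 0.5 0.5
```

Any finite prefix perturbs the average of the first n rewards by at most a term of order 1/n, which vanishes in the lim inf; that is exactly why Φc is a tail objective.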
Notation. For ε > 0, an objective Φ for player 1, and the complementary objective Ω \ Φ for
player 2, we denote by Σε(Φ) and Πε(Ω \ Φ) the sets of ε-optimal strategies for player 1 and
player 2, respectively. Note that the quantitative determinacy of concurrent games is
equivalent to the existence of ε-optimal strategies for objective Φ for player 1 and Ω \ Φ for
player 2, for all ε > 0, at all states s ∈ S, i.e., for all ε > 0, Σε(Φ) ≠ ∅ and
Πε(Ω \ Φ) ≠ ∅ (Corollary 1). We refer to the analysis of
computing the limit-sure winning states (the set of states s such that Val1(Φ)(s) = 1) and
ε-limit-sure winning strategies (ε-optimal strategies for the limit-sure winning states) as the
qualitative analysis of objective Φ. We refer to the analysis of computing the values and
the ε-optimal strategies as the quantitative analysis of objective Φ.
3.1.1 Completeness of limit-average objectives
Borel hierarchy. For a (possibly infinite) alphabet A, let A^ω and A^* denote
the sets of infinite and finite words over A, respectively. The finite Borel hierarchy
(Σ^0_1, Π^0_1), (Σ^0_2, Π^0_2), (Σ^0_3, Π^0_3), . . . is defined as follows:
• Σ^0_1 = {W · A^ω | W ⊆ A^*} is the set of open sets;
• for all n ≥ 1, Π^0_n = {A^ω \ L | L ∈ Σ^0_n} consists of the complements of sets in Σ^0_n;
• for all n ≥ 1, Σ^0_{n+1} = {⋃_{i∈N} L_i | ∀i ∈ N. L_i ∈ Π^0_n} is the set obtained by
countable unions of sets in Π^0_n.
Definition 1 (Wadge game) Let A and B be two (possibly infinite) alphabets. Let X ⊆
A^ω and Y ⊆ B^ω. The Wadge game G_W(X, Y) is a two-player game between player 1
and player 2, played as follows. Player 1 first chooses a letter a0 ∈ A and then player 2 chooses a
(possibly empty) finite word b0 ∈ B^*, then player 1 chooses a letter a1 ∈ A and then player 2
chooses a word b1 ∈ B^*, and so on. The play consists of writing a word wX = a0a1 . . . by
player 1 and wY = b0b1 . . . by player 2. Player 2 wins if and only if both wY is infinite and
wX ∈ X iff wY ∈ Y.
Definition 2 (Wadge reduction) Given alphabets A and B, a set X ⊆ A^ω is Wadge
reducible to a set Y ⊆ B^ω, denoted X ≤_W Y, if and only if there exists a continuous
function f : A^ω → B^ω such that X = f^{−1}(Y). If X ≤_W Y and Y ≤_W X, then X and Y
are Wadge equivalent, and we denote this by X ≡_W Y.
The notions of strategies and winners in Wadge games are defined similarly to those for
games on graphs. Wadge games and Wadge reductions are related by the
following result.
Proposition 3 ([Wad84]) Player 2 has a winning strategy in the Wadge game G_W(X, Y)
iff X ≤_W Y.
Wadge equivalence preserves the Borel hierarchy and yields the natural notion of
completeness.
Proposition 4 If X ≡_W Y, then X and Y belong to the same level of the Borel hierarchy.
Definition 3 A set Y ∈ Σ^0_n (resp. Y ∈ Π^0_n) is Σ^0_n-complete (resp. Π^0_n-complete) if and
only if X ≤_W Y for all X ∈ Σ^0_n (resp. X ∈ Π^0_n).
Our goal is to show that the lim inf objectives (defined in Example 1) are Π^0_3-hard.
We first present a few notations.
Notations. Let A be an alphabet and B = {b0, b1}. For a word w ∈ A^* or w ∈ B^* we
denote by len(w) the length of w. For an infinite word w, or a finite word w with len(w) ≥ k,
we denote by (w ↾ k) the prefix of length k of w. For a word w ∈ B^ω, or w ∈ B^* with
len(w) ≥ k, we denote

avg(w ↾ k) = (number of b0's in (w ↾ k)) / k,

i.e., the fraction of b0's in (w ↾ k). For a finite word w we write avg(w) = avg(w ↾ len(w)). Let

Y = {w ∈ B^ω | lim inf_{k→∞} avg(w ↾ k) = 1}
  = ⋂_{i≥0} ⋃_{j≥0} ⋂_{k≥j} {w ∈ B^ω | avg(w ↾ k) ≥ 1 − 1/(i + 1)}.
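Membership in Y can be explored on finite prefixes. The sketch below (the word construction is invented for illustration) takes the word that is b1 exactly at the powers of two and b0 everywhere else; the b1's are so sparse that the fraction of b0's in prefixes tends to 1, so the word belongs to Y.

```python
def avg(word, k):
    """Fraction of b0's among the first k letters of word."""
    return word[:k].count("b0") / k

n = 1 << 16
# b1 exactly at positions that are powers of two; b0 everywhere else
w = ["b1" if (i & (i - 1)) == 0 else "b0" for i in range(1, n + 1)]
print(round(avg(w, n), 4))  # → 0.9997: the prefix averages tend to 1, so w ∈ Y
```

A prefix of length k contains only about log2(k) many b1's, so avg(w ↾ k) ≥ 1 − (log2(k) + 1)/k, which tends to 1.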
Hardness of Y. We will show that Y is Π^0_3-hard. To prove the result we consider an
arbitrary X ∈ Π^0_3 and show that X ≤_W Y. A set X ⊆ A^ω in Π^0_3 is obtained as a
countable intersection of countable unions of closed sets, i.e.,

X = ⋂_{i≥0} ⋃_{j≥0} (A^j · (F_{ij})^ω),

where F_{ij} ⊆ A, and A^j denotes the set of words of length j in A^*. We show such an X is
Wadge reducible to Y by showing that player 2 has a winning strategy in G_W(X, Y). In
the reduction we will use the following notation: given a word w ∈ A^*, let

sat(w) = {i | ∃j ≥ 0. w ∈ A^j · (F_{ij})^*};
d(w) = max{l | ∀l′ ≤ l. l′ ∈ sat(w)} + 1.

For example, if sat(w) = {0, 1, 2, 4, 6, 7}, then d(w) = max{0, 1, 2} + 1 = 3. The play between
player 1 and player 2 proceeds as follows:

Player 1: wX = a1 a2 a3 . . . ;  ∀i ≥ 1. ai ∈ A
Player 2: wY = wY(1) wY(2) wY(3) . . . ;  ∀i ≥ 1. wY(i) ∈ B^+
A winning strategy for player 2 is as follows: let the current prefix of wX of length k be
(wX ↾ k) = a1a2 . . . ak and the current prefix of wY be wY(1)wY(2) . . . wY(k − 1); then the
word wY(k) is generated satisfying the following conditions:

1. There exists ℓ ≤ len(wY(k)) such that

avg(wY(1)wY(2) . . . wY(k − 1)(wY(k) ↾ ℓ)) ≥ 1 − 1/d(wX ↾ k);

for all ℓ1 ≤ ℓ,

avg(wY(1) . . . wY(k − 1)(wY(k) ↾ ℓ)) ≥ avg(wY(1) . . . wY(k − 1)(wY(k) ↾ ℓ1));

and for all ℓ2 such that ℓ ≤ ℓ2 ≤ len(wY(k)) we have

avg(wY(1)wY(2) . . . wY(k − 1)(wY(k) ↾ ℓ2)) ≥ 1 − 1/d(wX ↾ k).

2. 1 − 1/d(wX ↾ k) ≤ avg(wY(1)wY(2) . . . wY(k)) ≤ 1 − 1/(d(wX ↾ k) + 1).

Intuitively, player 2 initially plays a sequence of b0's to ensure that the average of b0's crosses
1 − 1/d(wX ↾ k), and then plays a sequence of b0's and b1's to ensure that the average of b0's in
wY(1)wY(2) . . . wY(k) lies in the interval

[1 − 1/d(wX ↾ k), 1 − 1/(d(wX ↾ k) + 1)],

and the average never falls below 1 − 1/d(wX ↾ k) while generating wY(k) once it has crossed
1 − 1/d(wX ↾ k). Clearly, player 2 has such a strategy. Given a word wX ∈ A^ω, the corresponding
word wY generated is an infinite word. Hence we need to prove that wX ∈ X if and only if
wY ∈ Y. We prove the implications in both directions.
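One way to realize the average-band bookkeeping in player 2's strategy is the following sketch (an illustrative approximation of conditions 1 and 2, not a verbatim transcription of them; bits with 1 standing for b0): each round first pushes the running b0-average up to 1 − 1/d with b0's and then trims it back toward 1 − 1/(d + 1) with b1's.

```python
def extend_block(word, d):
    """One round of the band-keeping idea for d >= 2: extend `word`
    (a list of bits, 1 = b0) so that its b0-average ends up near
    the interval [1 - 1/d, 1 - 1/(d + 1)]."""
    lo, hi = 1 - 1 / d, 1 - 1 / (d + 1)
    def a(w):
        return sum(w) / len(w) if w else 0.0
    block = []
    while a(word + block) < lo:   # raise the average with b0's
        block.append(1)
    while a(word + block) > hi:   # trim it back with b1's
        block.append(0)
    if not block:                 # each wY(k) must be nonempty
        block.append(1 if a(word) < hi else 0)
    return word + block

w = []
for _ in range(200):
    w = extend_block(w, 4)        # constant d = 4: target band [0.75, 0.8]
print(round(sum(w) / len(w), 2))
```

With a constant d the average settles into (roughly) the band [1 − 1/d, 1 − 1/(d + 1)]; in the actual reduction, d = d(wX ↾ k) varies with the prefix read from player 1, so the band rises exactly when wX satisfies more of the constraints F_{ij}.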
Claim 1. (wX ∈ X ⇒ wY ∈ Y) Let wX ∈ X; we show that wY ∈ Y. Given wX ∈ X,
we have ∀i ≥ 0. ∃j ≥ 0. wX ∈ A^j · (F_{ij})^ω. Given i ≥ 0, let

j(i) = min{j ≥ 0 | wX ∈ A^j · (F_{ij})^ω};   ĵ(i) = max{j(i′) | i′ ≤ i}.

Given i ≥ 0, for j = j(i), for all k ≥ j we have (wX ↾ k) ∈ A^j · (F_{ij})^*. Consider the
sequence (wX ↾ ĵ(i)), (wX ↾ ĵ(i) + 1), . . .: for all k ≥ ĵ(i) we have {i′ | i′ ≤ i} ⊆ sat(wX ↾ k).
Hence in the corresponding sequence of the word wY it is ensured that for all ℓ ≥
len(wY(1)wY(2) . . . wY(ĵ(i))) we have avg(wY ↾ ℓ) ≥ 1 − 1/(i + 1). Hence
lim inf_{n→∞} avg(wY ↾ n) ≥ 1 − 1/(i + 1). Since this holds for all i ≥ 0, let i → ∞ to obtain
that lim inf_{n→∞} avg(wY ↾ n) = 1 (the equality follows as the average can never be more
than 1). Hence wY ∈ Y.
Claim 2. (wY ∈ Y ⇒ wX ∈ X) Let wY ∈ Y; we show wX ∈ X. Fix i ≥ 0. Since
lim inf_{n→∞} avg(wY ↾ n) = 1, it follows that from some point on the average never falls below
1 − 1/(i + 1). Then there exists j such that for all l ≥ j we have d(wX ↾ l) ≥ i + 1, and hence
{i′ | i′ ≤ i} ⊆ sat(wX ↾ l). Hence for all l ≥ j we have (wX ↾ l) ∈ A^j · (F_{ij})^*, and thus we
obtain that wX ∈ A^j · (F_{ij})^ω, i.e., ∃j ≥ 0 such that wX ∈ A^j · (F_{ij})^ω. Since this holds for
all i ≥ 0, it follows that wX ∈ X.
From Claim 1 and Claim 2 it follows that Y is Π^0_3-hard, and as an easy consequence
the class of objectives Φc defined in Example 1 is Π^0_3-hard. Hence tail objectives
contain Π^0_3-hard objectives, and since tail objectives are closed under complementation, it
also follows that tail objectives contain Σ^0_3-hard objectives.
Π^0_3-completeness. To prove Π^0_3-completeness for long-run average objectives, it now
suffices to show that long-run average objectives can be expressed in Π^0_3. To achieve this we
need to show, for a reward function r and for all reals β, how to express the following sets
in Π^0_3:

(1) {ω ∈ Ω | LimAvg_r(ω) ≤ β};   (2) {ω ∈ Ω | LimAvg_r(ω) ≥ β}.

We now show how to express the above sets in Π^0_3. We prove the two cases below.
1. The expression for (1) is as follows:

{ω ∈ Ω | LimAvg_r(ω) ≤ β} = ⋂_{m≥1} ⋃_{n≥m} {ω | ω = 〈s1, s2, s3, . . .〉, (1/n) Σ_{i=1}^{n} r(s_i) ≤ β}.

It is easy to argue that for a fixed n the set {ω | ω = 〈s1, s2, s3, . . .〉, (1/n) Σ_{i=1}^{n} r(s_i) ≤ β}
is an open set (a Σ^0_1 set): the set of paths can be expressed as a union of cones. Hence
it follows that the set of paths specified by (1) can be expressed in Π^0_2.
2. The expression for (2) is as follows:

{ω ∈ Ω | LimAvg_r(ω) ≥ β} = ⋂_{m≥1} ⋃_{n≥1} ⋂_{k≥n} {ω | ω = 〈s1, s2, s3, . . .〉, (1/k) Σ_{i=1}^{k} r(s_i) ≥ β − 1/m}.

To prove that the above expression is in Π^0_3, we show that for fixed m and k,

{ω | ω = 〈s1, s2, s3, . . .〉, (1/k) Σ_{i=1}^{k} r(s_i) ≥ β − 1/m}   (∗)

is a closed set (i.e., in Π^0_1). Observe that once we prove this, it follows that the set

⋂_{k≥n} {ω | ω = 〈s1, s2, s3, . . .〉, (1/k) Σ_{i=1}^{k} r(s_i) ≥ β − 1/m}

is also a closed set, and hence (2) can be expressed in Π^0_3. To prove the desired claim
we show that the complement of (∗) is open, i.e., for fixed k and m we argue that the
set

{ω | ω = 〈s1, s2, s3, . . .〉, (1/k) Σ_{i=1}^{k} r(s_i) < β − 1/m}

is an open set. Observe that the above set can be described by a union of cones of
length k, and since cones are basic open sets the desired result follows.
3.2 Positive Limit-one Property
The positive limit-one property for concurrent games, for a class C of objectives,
states that for all objectives Φ ∈ C and for all concurrent games G, if there is a state s such
that the value for player 1 is positive at s for objective Φ, then there is a state s′ where
the value for player 1 is 1 for objective Φ. The property means that if a player can win with
positive value from some state, then from some state she can win with value 1. The positive
limit-one property was proved for parity objectives in [dAH00] and has been one of the
key properties used in the algorithmic analysis of concurrent games with parity objectives
(see Chapter 8). In this section we prove the positive limit-one property for concurrent
games with tail objectives, and thereby extend the positive limit-one property from parity
objectives to a richer class of objectives that subsumes several canonical ω-regular objectives.
Our proof uses a result from measure theory and certain strategy constructions, whereas
the proof for the sub-class of parity objectives [dAH00] followed from complementation
arguments of quantitative µ-calculus formulas. We first give an example showing that the positive
limit-one property does not hold for all objectives, even for simpler classes of games.

Figure 3.1: A simple Markov chain (states s0 and s1; from every state, the next state is s0 or s1 with probability ½ each).

Example 2 Consider the game shown in Fig 3.1, where at every state s we have |Γ1(s)| =
|Γ2(s)| = 1 (i.e., the set of moves is a singleton at all states). From all states the next state
is s0 or s1 with equal probability. Consider the objective ©(s1), which specifies that the next
state is s1; i.e., a play ω starting from state s is winning if the first state of the play is s and
the second state (i.e., the next state after s) in the play is s1. Given the objective Φ = ©(s1)
for player 1, we have Val1(Φ)(s0) = Val1(Φ)(s1) = 1/2. Hence, though the value is positive at
s0, there is no state with value 1 for player 1.
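The failure in Example 2 can be checked by direct computation: since both players have singleton move sets, the value of the next-state objective is just a one-step transition probability (state names as in the figure).

```python
# transition function of the chain in Fig 3.1 (from either state, the next
# state is s0 or s1 with probability 1/2 each)
P = {"s0": {"s0": 0.5, "s1": 0.5}, "s1": {"s0": 0.5, "s1": 0.5}}

def value_next(state, target):
    """Value of the objective "next state is `target`": both players have
    singleton move sets, so the value is the one-step transition probability."""
    return P[state][target]

vals = {s: value_next(s, "s1") for s in P}
print(vals)  # → {'s0': 0.5, 's1': 0.5}: every value is 1/2, no state has value 1
```

The objective ©(s1) is decided entirely by the first transition, so it is maximally prefix-dependent; this is exactly what the tail restriction rules out.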
Notation. In the setting of concurrent games, the natural filtration sequence (F_n) for the
stochastic process under any pair of strategies is defined as F_n = σ(X1, X2, . . . , Xn),
i.e., the σ-field generated by the random variables X1, X2, . . . , Xn.
Conditional expectations. Given a σ-algebra H, the conditional expectation E[f | H] of
a measurable function f is a random variable Z that satisfies the following properties: (a) Z
is H-measurable, and (b) for all A ∈ H we have E[f·1_A] = E[Z·1_A], where 1_A is the indicator
of the event A (see [Dur95] for details). Another key property of conditional expectation is
the following: E[E[f | H]] = E[f] (again see [Dur95] for details).
Almost-sure convergence. Given a random variable X and a sequence (X_n)_{n≥0} of
random variables, we write X_n → X almost-surely if Pr({ω | lim_{n→∞} X_n(ω) = X(ω)}) = 1,
i.e., with probability 1 the sequence converges to X.
Lemma 1 (Levy’s 0-1 law) Suppose Hn ↑ H∞, i.e.,Hn is a sequence of increasing σ-
fields and H∞ = σ(∪nHn). For all events A ∈ H∞ we have
E[1A | Hn] = Pr(A | Hn) → 1A almost-surely, (i.e., with probability 1),
where 1A is the indicator function of event A.
The proof of the lemma is available in the book of Durrett (page 262—263) [Dur95]. An
immediate consequence of Lemma 1 in the setting of concurrent games is the following
lemma.
Lemma 2 (0-1 law in concurrent games) For all concurrent game structures G, for
all events A ∈ F∞ = σ(∪nFn), for all strategies (σ, π) ∈ Σ × Π, for all states s ∈ S, we
have
Prσ,πs (A | Fn) → 1A almost-surely.
Intuitively, the lemma means that the probability Prσ,πs (A | Fn) converges almost-surely
(i.e., with probability 1) to 0 or 1 (since indicator functions take values in the range 0, 1).
Note that the tail σ-field T is a subset of F∞, i.e., T ⊆ F∞, and hence the result of
Lemma 2 holds for all A ∈ T .
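Lemma 2 can be illustrated on a simple absorbing Markov chain. For a fair random walk on {0, . . . , N} with absorbing endpoints, Reach(N) is a tail event (N is absorbing), and the conditional probability of the event given the first n steps is h(X_n) = X_n/N; along any sampled path this sequence converges to the indicator of the event, i.e., to 0 or 1. A simulation sketch (the chain, its size and the seed are invented for illustration):

```python
import random

def ruin_demo(n_states=10, seed=7, max_steps=10**6):
    """Fair walk on {0,...,n_states}, absorbing at both ends. Returns the
    sequence h(X_n) = X_n / n_states of conditional probabilities of the
    tail event Reach(n_states), computed along one sampled path."""
    random.seed(seed)
    x = n_states // 2
    trace = [x / n_states]
    for _ in range(max_steps):        # safety cap; absorption happens a.s.
        if x in (0, n_states):
            break
        x += random.choice((-1, 1))
        trace.append(x / n_states)
    return trace

trace = ruin_demo()
print(trace[0], trace[-1])            # starts at 0.5; ends at 0.0 or 1.0
```

The formula h(x) = x/N for the absorption probability of the fair walk is the classical gambler's-ruin computation; the point of the demo is only that the sequence of conditional probabilities is eventually constant at 0 or 1, as Lemma 2 predicts.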
Objectives as indicator functions. Objectives Φ are identified with indicator functions
Φ : Ω → {0, 1}, defined as follows: Φ(ω) = 1 if ω ∈ Φ, and Φ(ω) = 0 otherwise.
Notation. Given strategies σ and π for player 1 and player 2, a tail objective Φ, and a
state s, for β > 0, let

H^{1,β}_n(σ, π, Φ) = {〈s1, s2, . . . , sn, sn+1, . . .〉 | Pr^{σ,π}_s(Φ | 〈s1, s2, . . . , sn〉) ≥ 1 − β}
                  = {ω | Pr^{σ,π}_s(Φ | F_n)(ω) ≥ 1 − β}

denote the set of paths ω such that the probability of satisfying Φ, given the strategies σ
and π and the prefix of length n of ω, is at least 1 − β. Similarly, let

H^{0,β}_n(σ, π, Φ) = {〈s1, s2, . . . , sn, sn+1, . . .〉 | Pr^{σ,π}_s(Φ | 〈s1, s2, . . . , sn〉) ≤ β}
                  = {ω | Pr^{σ,π}_s(Φ | F_n)(ω) ≤ β}

denote the set of paths ω such that the probability of satisfying Φ, given the strategies σ
and π and the prefix of length n of ω, is at most β. We often refer to prefixes of paths in
H^{1,β}_n as histories in H^{1,β}_n, and analogously for H^{0,β}_n.
Proposition 5 For all concurrent game structures G, for all strategies σ and π for player 1
and player 2, respectively, for all tail objectives Φ, for all states s ∈ S, and for all β > 0 and
ε > 0, there exists n such that Pr^{σ,π}_s(H^{1,β}_n(σ, π, Φ) ∪ H^{0,β}_n(σ, π, Φ)) ≥ 1 − ε.
Proof. Let f_n = Pr^{σ,π}_s(Φ | F_n). By Lemma 2, we have f_n → Φ almost-surely as n → ∞.
Since almost-sure convergence implies convergence in probability, we have

∀β > 0. lim_{n→∞} Pr^{σ,π}_s({ω | |f_n(ω) − Φ(ω)| ≥ β}) = 0
⇒ ∀β > 0. lim_{n→∞} Pr^{σ,π}_s({ω | |f_n(ω) − Φ(ω)| ≤ β}) = 1.

Since Φ is an indicator function, we have

∀β > 0. lim_{n→∞} Pr^{σ,π}_s({ω | f_n(ω) ≥ 1 − β or f_n(ω) ≤ β}) = 1
⇒ ∀β > 0. lim_{n→∞} Pr^{σ,π}_s(H^{1,β}_n(σ, π, Φ) ∪ H^{0,β}_n(σ, π, Φ)) = 1.

Hence we have

∀β > 0. ∀ε > 0. ∃n0. ∀n ≥ n0. Pr^{σ,π}_s(H^{1,β}_n(σ, π, Φ) ∪ H^{0,β}_n(σ, π, Φ)) ≥ 1 − ε.

The result follows.
Lemma 3 (Always-positive implies probability 1) Let α > 0 be a real constant.
For all objectives Φ, for all strategies σ and π, and for all states s, if

f_n = Pr^{σ,π}_s(Φ | F_n) > α for all n, i.e., f_n(ω) > α almost-surely for all n,

then Pr^{σ,π}_s(Φ) = 1.
Proof. We show that for all ε > 0 we have Pr^{σ,π}_s(Φ) ≥ 1 − 2ε. Since ε > 0 is arbitrary, the
result follows. Given ε > 0 and α > 0, we choose β such that 0 < β < α and 0 < β < ε. By
Proposition 5 there exists n0 such that for all n > n0 we have

Pr^{σ,π}_s({ω | f_n(ω) ≥ 1 − β or f_n(ω) ≤ β}) ≥ 1 − ε.

Since f_n(ω) ≥ α > β almost-surely for all n, we have Pr^{σ,π}_s({ω | f_n(ω) ≥ 1 − β}) ≥ 1 − ε,
i.e., we have Pr^{σ,π}_s(Φ | F_n) ≥ 1 − β with probability at least 1 − ε. Hence we have

Pr^{σ,π}_s(Φ) = E^{σ,π}_s[Φ] = E^{σ,π}_s[E^{σ,π}_s[Φ | F_n]] ≥ (1 − β) · (1 − ε) ≥ 1 − 2ε.

Observe that we have used the property of conditional expectation to infer that E^{σ,π}_s[Φ] =
E^{σ,π}_s[E^{σ,π}_s[Φ | F_n]]. The desired result follows.
Theorem 5 (Positive limit-one property) For all concurrent game structures G, for
all tail objectives Φ, if there exists a state s ∈ S such that Val1(Φ)(s) > 0, then there exists
a state s′ ∈ S such that Val1(Φ)(s′) = 1.
Figure 3.2: An illustration of the idea of the proof of Theorem 5: with probability at least 1 − ε/4 a history lands in H^{1,β}_n (conditional probability ≥ 1 − β) or in H^{0,β}_n (conditional probability ≤ β), and on the latter histories player 2 modifies its strategy.
The basic idea of the proof. We prove the desired result by contradiction. We assume
towards contradiction that from some state s we have Val1(Φ)(s) = α > 0 and for all
states s1 we have Val1(Φ)(s1) ≤ η < 1. We fix ε-optimal strategies σ and π for player 1
and player 2, for sufficiently small ε > 0. By Proposition 5, for all 0 < β < 1, there exists
n such that Pr^{σ,π}_s(H^{1,β}_n ∪ H^{0,β}_n) ≥ 1 − ε/4. The strategy π is modified to a strategy π̃ as
follows: on histories in H^{0,β}_n, the strategy π̃ ignores the history of length n and switches to
an ε/4-optimal strategy, and otherwise it plays as π. By a suitable choice of β (depending on ε)
we show that player 2 can ensure that the probability of satisfying Φ from s given σ is less
than α − ε. This contradicts the facts that σ is an ε-optimal strategy and Val1(Φ)(s) = α. The idea
is illustrated in Fig 3.2. We now prove the result formally.
Proof. (of Theorem 5.) Assume towards contradiction that there exists a state
s such that Val1(Φ)(s) > 0, but for all states s′ we have Val1(Φ)(s′) < 1. Let
α = 1 − Val1(Φ)(s) = Val2(Ω \ Φ)(s). Since 0 < Val1(Φ)(s) < 1, we have 0 < α < 1. Since
Val2(Ω \ Φ)(s′) = 1 − Val1(Φ)(s′) and for all states s′ we have Val1(Φ)(s′) < 1, it follows
that Val2(Ω \ Φ)(s′) > 0 for all states s′. Fix η such that 0 < η = min_{s′∈S} Val2(Ω \ Φ)(s′). Also
observe that since Val2(Ω \ Φ)(s) = α < 1, we have η < 1. Let c be a constant such that c > 0
and α · (1 + c) = γ < 1 (such a constant exists as α < 1). Also let c1 > 1 be a constant
such that c1 · γ < 1 (such a constant exists since γ < 1); hence we have 1 − c1 · γ > 0 and
1 − 1/c1 > 0. Fix ε > 0 and β > 0 such that

0 < 2ε < min{η/4, 2c · α, (η/4) · (1 − c1 · γ)};   β < min{ε, 1/2, 1 − 1/c1}.   (3.1)
Fix ε-optimal strategies σε for player 1 and πε for player 2. Let H^{1,β}_n = H^{1,β}_n(σε, πε, Φ̄) and H^{0,β}_n = H^{0,β}_n(σε, πε, Φ̄). Consider n such that Pr^{σε,πε}_s(H^{1,β}_n ∪ H^{0,β}_n) ≥ 1 − ε/4 (such an n exists by Proposition 5). Also observe that since β < 1/2 we have H^{1,β}_n ∩ H^{0,β}_n = ∅. Let

  val = Pr^{σε,πε}_s(Φ̄ | H^{1,β}_n) · Pr^{σε,πε}_s(H^{1,β}_n) + Pr^{σε,πε}_s(Φ̄ | H^{0,β}_n) · Pr^{σε,πε}_s(H^{0,β}_n).

We have

  val ≤ Pr^{σε,πε}_s(Φ̄) ≤ val + ε/4.   (3.2)

The first inequality follows since H^{1,β}_n ∩ H^{0,β}_n = ∅, and the second inequality follows since Pr^{σε,πε}_s(H^{1,β}_n ∪ H^{0,β}_n) ≥ 1 − ε/4. Since σε and πε are ε-optimal strategies we have α − ε ≤ Pr^{σε,πε}_s(Φ̄) ≤ α + ε. Along with (3.2) this yields

  α − ε − ε/4 ≤ val ≤ α + ε.   (3.3)

Observe that Pr^{σε,πε}_s(Φ̄ | H^{1,β}_n) ≥ 1 − β and Pr^{σε,πε}_s(Φ̄ | H^{0,β}_n) ≤ β. Let q = Pr^{σε,πε}_s(H^{1,β}_n). Since Pr^{σε,πε}_s(Φ̄ | H^{1,β}_n) ≥ 1 − β, ignoring the term Pr^{σε,πε}_s(Φ̄ | H^{0,β}_n) · Pr^{σε,πε}_s(H^{0,β}_n) in val and using the second inequality of (3.3), we obtain (1 − β) · q ≤ α + ε. Since ε < c · α, β < 1 − 1/c1, and γ = α · (1 + c), we have

  q ≤ (α + ε)/(1 − β) < α · (1 + c)/(1 − (1 − 1/c1)) = c1 · γ.   (3.4)
We construct a strategy π̂ε as follows: the strategy π̂ε follows the strategy πε for the first n − 1 stages; if a history in H^{1,β}_n is generated it continues to follow πε, and otherwise it ignores the history and switches to an ε-optimal strategy. Formally, for a history 〈s1, s2, . . . , sk〉 we have

  π̂ε(〈s1, s2, . . . , sk〉) =
      πε(〈s1, s2, . . . , sk〉)    if k < n, or Pr^{σε,πε}_s(Φ̄ | 〈s1, s2, . . . , sn〉) ≥ 1 − β;
      π̃ε(〈sn, . . . , sk〉)       if k ≥ n and Pr^{σε,πε}_s(Φ̄ | 〈s1, s2, . . . , sn〉) < 1 − β,

where π̃ε is an ε-optimal strategy.

Since π̂ε and πε coincide for the first n − 1 stages, we have Pr^{σε,π̂ε}_s(H^{1,β}_n) = Pr^{σε,πε}_s(H^{1,β}_n) and Pr^{σε,π̂ε}_s(H^{0,β}_n) = Pr^{σε,πε}_s(H^{0,β}_n). Moreover, since Φ̄ is a tail objective that is independent of the prefix of length n, η ≤ min_{s′∈S} Val2(Φ̄)(s′), and π̃ε is an ε-optimal strategy, we have Pr^{σε,π̂ε}_s(Φ̄ | H^{0,β}_n) ≥ η − ε. Also observe that

  Pr^{σε,π̂ε}_s(Φ̄ | H^{0,β}_n) ≥ η − ε = Pr^{σε,πε}_s(Φ̄ | H^{0,β}_n) + (η − ε − Pr^{σε,πε}_s(Φ̄ | H^{0,β}_n))
                                ≥ Pr^{σε,πε}_s(Φ̄ | H^{0,β}_n) + (η − ε − β),   (3.5)
since Pr^{σε,πε}_s(Φ̄ | H^{0,β}_n) ≤ β. Hence we have the following inequality:

  Pr^{σε,π̂ε}_s(Φ̄) ≥ Pr^{σε,π̂ε}_s(Φ̄ | H^{1,β}_n) · Pr^{σε,π̂ε}_s(H^{1,β}_n) + Pr^{σε,π̂ε}_s(Φ̄ | H^{0,β}_n) · Pr^{σε,π̂ε}_s(H^{0,β}_n)
    = Pr^{σε,πε}_s(Φ̄ | H^{1,β}_n) · Pr^{σε,πε}_s(H^{1,β}_n) + Pr^{σε,π̂ε}_s(Φ̄ | H^{0,β}_n) · Pr^{σε,π̂ε}_s(H^{0,β}_n)
    ≥ Pr^{σε,πε}_s(Φ̄ | H^{1,β}_n) · Pr^{σε,πε}_s(H^{1,β}_n) + Pr^{σε,πε}_s(Φ̄ | H^{0,β}_n) · Pr^{σε,πε}_s(H^{0,β}_n)
        + (η − ε − β) · (1 − q − ε/4)       (since Pr^{σε,πε}_s(H^{0,β}_n) ≥ 1 − q − ε/4)
    = val + (η − ε − β) · (1 − q − ε/4)
    ≥ α − ε − ε/4 + (η − ε − β) · (1 − q − ε/4)   (recall the first inequality of (3.3))
    > α − ε − ε/4 + (η − 2ε) · (1 − q − ε/4)      (since β < ε by (3.1))
    > α − ε − ε/4 + (η/2) · (1 − q − ε/4)         (since 2ε < η/2 by (3.1))
    > α − ε − ε/4 + (η/2) · (1 − c1 · γ) − (η/2) · (ε/4)   (since q < c1 · γ by (3.4))
    > α − ε − ε/4 + 4ε − ε/8   (since 2ε < (η/4) · (1 − c1 · γ) by (3.1), and η ≤ 1)
    > α + ε.

The first equality follows since on histories in H^{1,β}_n the strategies π̂ε and πε coincide; the second inequality uses (3.5). Hence we have Pr^{σε,π̂ε}_s(Φ̄) > α + ε, and thus Pr^{σε,π̂ε}_s(Φ) < 1 − α − ε. This contradicts the fact that Val1(Φ)(s) = 1 − α and σε is an ε-optimal strategy. The desired result follows.
Notation. We use the following notation for the rest of the chapter:

  W^1_1 = {s | Val1(Φ)(s) = 1};     W^1_2 = {s | Val2(Φ̄)(s) = 1};
  W^{>0}_1 = {s | Val1(Φ)(s) > 0};   W^{>0}_2 = {s | Val2(Φ̄)(s) > 0}.

By determinacy of concurrent games with tail objectives, we have W^1_1 = S ∖ W^{>0}_2 and W^1_2 = S ∖ W^{>0}_1. We have the following finer characterization of these sets.
Corollary 4 For all concurrent game structures G with tail objectives Φ for player 1, the following assertions hold:

1. (a) if W^{>0}_1 ≠ ∅, then W^1_1 ≠ ∅; and (b) if W^{>0}_2 ≠ ∅, then W^1_2 ≠ ∅.
2. (a) if W^{>0}_1 = S, then W^1_1 = S; and (b) if W^{>0}_2 = S, then W^1_2 = S.

Proof. The first result is a direct consequence of Theorem 5. The second result is derived as follows: if W^{>0}_1 = S, then by determinacy we have W^1_2 = ∅. If W^1_2 = ∅, it follows from part 1 that W^{>0}_2 = ∅, and hence W^1_1 = S. The result of part 2 shows that if a player has a positive optimum value at every state, then the optimum value is 1 at all states.
Extension to countable state spaces. We first present an example showing that Corollary 4 (and hence also Theorem 5) does not extend directly to countable state spaces. Then we present the appropriate extension of Theorem 5 to countable state spaces.
Example 3 Consider a Markov chain defined on a countable state space S as follows: S = SN ∪ {t}, where SN = {si | i = 0, 1, 2, . . .}. The transition probabilities are specified as follows: the state t is an absorbing state; and from state si the next state is s_{i+1} with probability (1/2)^{1/2^i}, and the next state is t with the remaining probability. Consider the tail objective Φ = Buchi({t}). For a state si we have Val2(Φ̄)(si) = (1/2)^{Σ_{j≥i} 1/2^j} = (1/2)^{1/2^{i−1}} < 1. That is, we have Val1(Φ)(s) > 0 for all s ∈ S. Hence W^{>0}_1 = S, but W^1_1 ≠ S.
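The closed form in Example 3 can be checked numerically. The following is a small sketch (the helper name `survive_prob` is ours, not from the text) comparing the truncated product ∏_{j≥i} (1/2)^{1/2^j} with the closed form (1/2)^{1/2^{i−1}}:

```python
# Numerical check of Example 3: from s_i the play stays in S_N forever
# (never reaching the absorbing state t) with probability
# prod_{j >= i} (1/2)^(1/2^j) = (1/2)^(1/2^(i-1)), which is < 1 but
# bounded away from 0, so Val1(Phi)(s_i) > 0 everywhere yet no s_i has value 1.

def survive_prob(i, horizon=60):
    """Truncated product over j = i .. i+horizon-1; converges very fast."""
    p = 1.0
    for j in range(i, i + horizon):
        p *= 0.5 ** (0.5 ** j)
    return p

for i in range(1, 6):
    closed_form = 0.5 ** (0.5 ** (i - 1))  # (1/2)^(1/2^(i-1))
    assert abs(survive_prob(i) - closed_form) < 1e-12
    assert 0.0 < 1.0 - closed_form < 1.0   # 0 < Val1(Phi)(s_i) < 1
```

For i = 1 the product evaluates to (1/2)^{1/2^0} = 1/2, matching the exponent sum Σ_{j≥1} 1/2^j = 1.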
We now present the appropriate extension of Theorem 5 to countable state spaces.
Theorem 6 For all concurrent game structures G with countable state space, for all tail objectives Φ, if there exists a state s ∈ S such that Val1(Φ)(s) > 0, then sup_{s′∈S} Val1(Φ)(s′) = 1.
Proof. The key difference from the proof of Theorem 5 lies in fixing the constants. Assume towards contradiction that there exists a state s such that Val1(Φ)(s) > 0, but sup_{s′∈S} Val1(Φ)(s′) < 1. Let α = 1 − Val1(Φ)(s) = Val2(Φ̄)(s). Since 0 < Val1(Φ)(s) < 1, we have 0 < α < 1. Let η = inf_{s′∈S} Val2(Φ̄)(s′); since sup_{s′∈S} Val1(Φ)(s′) < 1 and Val2(Φ̄)(s′) = 1 − Val1(Φ)(s′) for all s′ ∈ S, we have 0 < η. Also observe that since Val2(Φ̄)(s) = α < 1, we have η < 1. Once the constant η is fixed, we can essentially follow the proof of Theorem 5 to obtain the desired result.
3.3 Zero-sum Tail Games to Nonzero-sum Reachability Games

In this section we relate the values of zero-sum games with tail objectives to the Nash equilibrium values of nonzero-sum games with reachability objectives. The result shows that the values of a zero-sum game with complex objectives can be related to equilibrium values of a nonzero-sum game with simpler objectives. We also show that for MDPs the value function for a tail objective Φ can be computed by computing the maximal probability of reaching the set of states with value 1. As an immediate consequence of this analysis, we obtain a polynomial-time reduction of the quantitative analysis of MDPs with tail objectives to the qualitative analysis. We first prove a limit-reachability property of ε-optimal strategies: for tail objectives, if the players play ε-optimal strategies, for small ε > 0, then the game reaches W^1_1 ∪ W^1_2 with high probability.
Theorem 7 (Limit-reachability) For all concurrent game structures G, for all tail objectives Φ for player 1, and for all ε′ > 0, there exists ε > 0 such that for all states s ∈ S and all ε-optimal strategies σε and πε, we have

  Pr^{σε,πε}_s(Reach(W^1_1 ∪ W^1_2)) ≥ 1 − ε′.
Proof. By determinacy it follows that W^1_1 ∪ W^1_2 = S ∖ (W^{>0}_1 ∩ W^{>0}_2). For a state s ∈ W^1_1 ∪ W^1_2 the result holds trivially. Consider a state s ∈ W^{>0}_1 ∩ W^{>0}_2 and let α = Val2(Φ̄)(s). Observe that 0 < α < 1. Let η1 = min_{s∈W^{>0}_2} Val2(Φ̄)(s) and η2 = max_{s∈W^{>0}_1} Val2(Φ̄)(s), let η = min{η1, 1 − η2}, and note that 0 < η < 1. Given ε′ > 0, fix ε such that 0 < 2ε < min{η/2, η · ε′/12}. Fix any ε-optimal strategies σε and πε for player 1 and player 2, respectively. Fix β such that 0 < β < ε and β < 1/2. Let H^{1,β}_n = H^{1,β}_n(σε, πε, Φ̄) and H^{0,β}_n = H^{0,β}_n(σε, πε, Φ̄). Consider n such that Pr^{σε,πε}_s(H^{1,β}_n ∪ H^{0,β}_n) ≥ 1 − ε/4 (such an n exists by Proposition 5); also, as β < 1/2, we have H^{1,β}_n ∩ H^{0,β}_n = ∅. Let us denote by

  val = Pr^{σε,πε}_s(Φ̄ | H^{1,β}_n) · Pr^{σε,πε}_s(H^{1,β}_n) + Pr^{σε,πε}_s(Φ̄ | H^{0,β}_n) · Pr^{σε,πε}_s(H^{0,β}_n).

Similar to inequality (3.2) of Theorem 5 we obtain

  val ≤ Pr^{σε,πε}_s(Φ̄) ≤ val + ε/4.

Since σε and πε are ε-optimal strategies, similar to inequality (3.3) of Theorem 5 we obtain α − ε − ε/4 ≤ val ≤ α + ε.
For W ⊆ S, let Reach_n(W) = {〈s1, s2, s3, . . .〉 | ∃k ≤ n. sk ∈ W} denote the set of paths that reach W within n steps, and let ¬Reach_n(W) = Ω ∖ Reach_n(W) denote its complement. Consider a strategy σ̂ε defined as follows: for histories in H^{1,β}_n ∩ ¬Reach_n(W^1_2), σ̂ε ignores the history after stage n and follows an ε-optimal strategy σ̃ε; for all other histories it follows σε. Let z1 = Pr^{σε,πε}_s(H^{1,β}_n ∩ ¬Reach_n(W^1_2)). Since η2 = max_{s∈W^{>0}_1} Val2(Φ̄)(s), player 1 switches to an ε-optimal strategy after histories of length n in H^{1,β}_n ∩ ¬Reach_n(W^1_2), and Φ̄ is a tail objective, it follows that for all ω = 〈s1, s2, . . . , sn, s_{n+1}, . . .〉 ∈ H^{1,β}_n ∩ ¬Reach_n(W^1_2) we have Pr^{σ̂ε,πε}_s(Φ̄ | 〈s1, s2, . . . , sn〉) ≤ η2 + ε, whereas Pr^{σε,πε}_s(Φ̄ | 〈s1, s2, . . . , sn〉) ≥ 1 − β. Hence we have

  val2 = Pr^{σ̂ε,πε}_s(Φ̄) ≤ Pr^{σε,πε}_s(Φ̄) − z1 · (1 − β − η2 − ε) ≤ val + ε/4 − z1 · (1 − β − η2 − ε),
since with probability z1 the decrease is at least 1 − β − η2 − ε. Since πε is an ε-optimal strategy we have val2 ≥ α − ε; and since val ≤ α + ε, we obtain the following:

  z1 · (1 − η2 − β − ε) ≤ 2ε + ε/4 < 3ε
    ⟹ z1 < 3ε/(η − β − ε)              (since η ≤ 1 − η2)
    ⟹ z1 < 3ε/(η − 2ε) < 6ε/η < ε′/4   (since β < ε, ε < η/4, and ε < η · ε′/24).
Consider a strategy π̂ε defined as follows: for histories in H^{0,β}_n ∩ ¬Reach_n(W^1_1), π̂ε ignores the history after stage n and follows an ε-optimal strategy π̃ε; for all other histories it follows πε. Let z2 = Pr^{σε,πε}_s(H^{0,β}_n ∩ ¬Reach_n(W^1_1)). Since η1 = min_{s∈W^{>0}_2} Val2(Φ̄)(s), player 2 switches to an ε-optimal strategy after histories of length n in H^{0,β}_n ∩ ¬Reach_n(W^1_1), and Φ̄ is a tail objective, it follows that for all ω = 〈s1, s2, . . . , sn, s_{n+1}, . . .〉 ∈ H^{0,β}_n ∩ ¬Reach_n(W^1_1) we have Pr^{σε,π̂ε}_s(Φ̄ | 〈s1, s2, . . . , sn〉) ≥ η1 − ε, whereas Pr^{σε,πε}_s(Φ̄ | 〈s1, s2, . . . , sn〉) ≤ β. Hence we have

  val1 = Pr^{σε,π̂ε}_s(Φ̄) ≥ Pr^{σε,πε}_s(Φ̄) + z2 · (η1 − ε − β) ≥ val + z2 · (η1 − ε − β),

since with probability z2 the increase is at least η1 − ε − β. Since σε is an ε-optimal strategy we have val1 ≤ α + ε; and since val ≥ α − ε − ε/4, we obtain the following:

  z2 · (η1 − β − ε) ≤ 2ε + ε/4 < 3ε
    ⟹ z2 < 3ε/(η − β − ε)   (since η ≤ η1)
    ⟹ z2 < ε′/4             (as in the bound for z1).
Hence z1 + z2 ≤ ε′/2, and then we have

  Pr^{σε,πε}_s(Reach(W^1_1 ∪ W^1_2))
    ≥ Pr^{σε,πε}_s(Reach_n(W^1_1 ∪ W^1_2) ∩ (H^{1,β}_n ∪ H^{0,β}_n))
    = Pr^{σε,πε}_s(Reach_n(W^1_1 ∪ W^1_2) ∩ H^{1,β}_n) + Pr^{σε,πε}_s(Reach_n(W^1_1 ∪ W^1_2) ∩ H^{0,β}_n)
    ≥ Pr^{σε,πε}_s(Reach_n(W^1_2) ∩ H^{1,β}_n) + Pr^{σε,πε}_s(Reach_n(W^1_1) ∩ H^{0,β}_n)
    ≥ Pr^{σε,πε}_s(H^{1,β}_n) + Pr^{σε,πε}_s(H^{0,β}_n) − (z1 + z2)
    ≥ 1 − ε/4 − ε′/2 ≥ 1 − ε′   (since ε ≤ ε′).

The result follows.
Theorem 7 proves the limit-reachability property for tail objectives under ε-optimal strategies, for small ε. We now present an example to show that Theorem 7 does not hold for all objectives, or for tail objectives with arbitrary strategies.
Example 4 Observe that in the game shown in Example 2 the objective was not a tail objective, and we had W^1_1 ∪ W^1_2 = ∅. Hence Theorem 7 need not hold for all objectives. Also consider the game shown in Fig 3.3. In this game s1 and s2 are absorbing states. At s0 the available moves for the players are as follows: Γ1(s0) = {a} and Γ2(s0) = {1, 2}. The transition function is as follows: if player 2 plays move 2, then the next state is s1 or s2 with equal probability, and if player 2 plays move 1, then the next state is s0. The objective of player 1 is Φ = Buchi({s0, s1}), i.e., to visit s0 or s1 infinitely often. We have W^1_1 = {s1} and W^1_2 = {s2}. Given a strategy π that always chooses move 1, the set W^1_1 ∪ W^1_2 of states is reached with probability 0; however, π is not an optimal or ε-optimal strategy for player 2 (for ε < 1/2). This shows that Theorem 7 need not hold if ε-optimal strategies are not considered. In the game shown, for an optimal strategy of player 2 (e.g., the strategy that chooses move 2) the play reaches W^1_1 ∪ W^1_2 with probability 1.
[Figure: states s0, s1, s2; from s0 the joint move a,1 loops at s0, and the joint move a,2 leads to s1 or s2, each with probability 1/2.]

Figure 3.3: A game with Buchi objective.
The following example further illustrates Theorem 7.
Example 5 (Concurrent Buchi game) Consider the concurrent game shown in Fig 3.4. The available moves for the players at states s0 and s3 are {0, 1} and {0, 1, q}, respectively. At all other states the available moves for both players are singletons. The transitions are shown as labeled edges in the figure. The objective of player 1 is to visit s4 or s7 infinitely often, i.e., Buchi({s4, s7}). Observe that since at state s3 each player can choose move q, it follows that the value for the players at state s3 (and hence at states s0, s1, s2, s4, s5 and s6) is 1/2. The value for player 1 is 1 at state s7 and 0 at state s8. Consider the strategy σ for player 1 as follows: (a) at state s0 it plays 0 and 1, each with probability 1/2, and remembers the move played as the move b; (b) at state s3, player 1 remembers the move c played by player 2 (since player 1 knows whether the state s1 or s2 was visited, it can infer the move played by player 2 at s0); (c) at state s3 player 1 plays move b as long as player 2 plays move c, and otherwise player 1 plays the move q. Informally, player 1 plays its move uniformly at random at s0, discloses it to player 2, and remembers the move of player 2. As long as player 2 follows her move, player 1 follows her move chosen in the first round; if player 2 deviates, then player 1 quits the game by playing q. A strategy π for player 2 can be defined similarly. Given the strategies σ and π, the play ω^{σ,π}_{s0} reaches s7 and s8 with probability 0. Moreover, ω^{σ,π}_{s0} satisfies Buchi({s4, s8}) with probability 1/2. However, observe that the strategy σ is not an optimal strategy. Given the strategy σ, consider the strategy π̂ as follows: the strategy π̂ chooses 0 and 1 with probability 1/2 at s0, and at s3, if the chosen move c at s0 matches the move b of player 1, then player 2 plays q (i.e., quits the game) and otherwise it follows π. Given the strategy π̂, if player 1 follows σ, then Buchi({s4, s7}) is satisfied only with probability 1/4. In the game shown, if both players follow any pair of optimal strategies, then the game reaches s7 and s8 with probability 1.

[Figure: states s0–s8 with transitions labeled by joint moves (e.g., 00,11; 01,10; q0,q1; 1q,0q).]

Figure 3.4: A concurrent Buchi game.
Lemma 4 is immediate from Theorem 7.

Lemma 4 For all concurrent game structures G, for all tail objectives Φ for player 1 and Φ̄ for player 2, and for all states s ∈ S, we have

  lim_{ε→0} sup_{σ∈Σε(Φ), π∈Πε(Φ̄)} Pr^{σ,π}_s(Reach(W^1_1 ∪ W^1_2)) = 1;
  lim_{ε→0} sup_{σ∈Σε(Φ), π∈Πε(Φ̄)} Pr^{σ,π}_s(Reach(W^1_1)) = Val1(Φ)(s);
  lim_{ε→0} sup_{σ∈Σε(Φ), π∈Πε(Φ̄)} Pr^{σ,π}_s(Reach(W^1_2)) = Val2(Φ̄)(s).
Consider a nonzero-sum reachability game GR in which the states in W^1_1 ∪ W^1_2 are transformed into absorbing states and the objectives of both players are reachability objectives: the objective for player 1 is Reach(W^1_1) and the objective for player 2 is Reach(W^1_2). Note that the game GR is not zero-sum in the following sense: there are infinite paths ω such that ω ∉ Reach(W^1_1) and ω ∉ Reach(W^1_2), and each player gets payoff 0 for such a path ω. We define ε-Nash equilibria of the game GR and relate certain special ε-Nash equilibria of GR to the values of G.
Definition 4 (ε-Nash equilibrium in GR) A strategy profile (σ∗, π∗) ∈ Σ × Π is an ε-Nash equilibrium at state s if the following two conditions hold:

  Pr^{σ∗,π∗}_s(Reach(W^1_1)) ≥ sup_{σ∈Σ} Pr^{σ,π∗}_s(Reach(W^1_1)) − ε;
  Pr^{σ∗,π∗}_s(Reach(W^1_2)) ≥ sup_{π∈Π} Pr^{σ∗,π}_s(Reach(W^1_2)) − ε.
Theorem 8 (Nash equilibrium of the reachability game GR) The following assertion holds for the game GR:

• For all ε > 0, there is an ε-Nash equilibrium (σ∗_ε, π∗_ε) ∈ Σε(Φ) × Πε(Φ̄) such that for all states s we have

  lim_{ε→0} Pr^{σ∗_ε,π∗_ε}_s(Reach(W^1_1)) = Val1(Φ)(s);
  lim_{ε→0} Pr^{σ∗_ε,π∗_ε}_s(Reach(W^1_2)) = Val2(Φ̄)(s).

Proof. It follows from Lemma 4.
Note that in the case of MDPs the strategy of player 2 is trivial, i.e., player 2 has only one strategy. Hence in the context of MDPs we drop the strategy π of player 2. A specialization of Theorem 8 to MDPs yields Theorem 9.

Theorem 9 For all MDPs GM, for all tail objectives Φ, and for all states s ∈ S, we have

  Val1(Φ)(s) = sup_{σ∈Σ} Pr^σ_s(Reach(W^1_1)) = Val1(Reach(W^1_1))(s).
Since the values in MDPs with reachability objectives can be computed in polynomial time (by linear programming) [Con92, FV97], our result yields a polynomial-time reduction of the quantitative analysis of tail objectives in MDPs to the qualitative analysis.
3.4 Construction of ε-optimal Strategies for Muller Objectives

In this section we show that for Muller objectives, witnesses of ε-optimal strategies can be constructed as witnesses of certain limit-sure winning strategies that respect certain local conditions. A key notion that plays an important role in the construction of ε-optimal strategies is that of local optimality. Informally, a selector function ξ is a memoryless strategy, and a selector function ξ is locally optimal if it is optimal in the one-step matrix game where each state s is assigned the reward value Val1(Φ)(s). A locally optimal strategy is a strategy that consists of locally optimal selectors. A locally ε-optimal strategy is a strategy that has a total deviation from locally optimal selectors of at most ε. We note that local ε-optimality and ε-optimality are very different notions. Local ε-optimality consists in the approximation of locally optimal selectors; a locally ε-optimal strategy provides no guarantee of yielding a probability of winning the game close to the optimal one.
Definition 5 (Selectors) A selector ξ for player i ∈ {1, 2} is a function ξ : S → Dist(A) such that for all s ∈ S and a ∈ A, if ξ(s)(a) > 0, then a ∈ Γi(s). We denote by Λi the set of all selectors for player i ∈ {1, 2}. Observe that selectors coincide with memoryless strategies.
Definition 6 (Locally ε-optimal selectors and strategies) A selector ξ is locally optimal for objective Φ if for all s ∈ S and all a2 ∈ Γ2(s) we have

  E^{ξ(s),a2}_s[Val1(Φ)(X1)] ≥ Val1(Φ)(s).

We denote by Λℓ(Φ) the set of locally optimal selectors for objective Φ. A strategy σ is locally optimal for objective Φ if for every history 〈s0, s1, . . . , sk〉 we have σ(〈s0, s1, . . . , sk〉) ∈ Λℓ(Φ), i.e., player 1 plays a locally optimal selector at every round of the play. We denote by Σℓ(Φ) the set of locally optimal strategies for objective Φ. A strategy σε is locally ε-optimal for objective Φ if for every strategy π ∈ Π, for all k ≥ 1, and for all states s we have

  Val1(Φ)(s) − E^{σε,π}_s[Val1(Φ)(Xk)] ≤ ε.

Observe that a strategy that at each round i chooses a locally optimal selector with probability at least 1 − εi, with Σ^∞_{i=0} εi ≤ ε, is a locally ε-optimal strategy. We denote by Σℓ_ε(Φ) the set of locally ε-optimal strategies for objective Φ.
We first show that for all tail objectives and all ε > 0 there exist strategies that are both ε-optimal and locally ε-optimal.

Lemma 5 For all tail objectives Φ and all ε > 0:

1. Σ_{ε/2}(Φ) ⊆ Σℓ_ε(Φ);
2. Σε(Φ) ∩ Σℓ_ε(Φ) ≠ ∅.
Proof. For ε > 0, fix an ε/2-optimal strategy σ for player 1. By definition σ is an ε-optimal strategy as well. We argue that σ ∈ Σℓ_ε(Φ). Assume towards contradiction that σ ∉ Σℓ_ε(Φ), i.e., there exist a player-2 strategy π, a state s, and k such that

  Val1(Φ)(s) − E^{σ,π}_s[Val1(Φ)(Xk)] > ε.

Fix a strategy π∗ = (π + π̃) for player 2 as follows: play π for k steps, then switch to an ε/4-optimal strategy π̃. Formally, for a history 〈s1, s2, . . . , sn〉 we have

  π∗(〈s1, s2, . . . , sn〉) =
      π(〈s1, s2, . . . , sn〉)             if n ≤ k;
      π̃(〈s_{k+1}, s_{k+2}, . . . , sn〉)   if n > k,

where π̃ is an ε/4-optimal strategy.

Since Φ is a tail objective we have Pr^{σ,π∗}_s(Φ) = Σ_{t∈S} Pr^{σ,π̃}_t(Φ) · Pr^{σ,π∗}_s(Xk = t). Hence we obtain the following inequality:

  Pr^{σ,π∗}_s(Φ) = Σ_{t∈S} Pr^{σ,π̃}_t(Φ) · Pr^{σ,π∗}_s(Xk = t)
               = Σ_{t∈S} Pr^{σ,π̃}_t(Φ) · Pr^{σ,π}_s(Xk = t)
               ≤ Σ_{t∈S} (Val1(Φ)(t) + ε/4) · Pr^{σ,π}_s(Xk = t)   (since π̃ is an ε/4-optimal strategy)
               = E^{σ,π}_s[Val1(Φ)(Xk)] + ε/4.

Hence we have

  Pr^{σ,π∗}_s(Φ) < (Val1(Φ)(s) − ε) + ε/4 = Val1(Φ)(s) − 3ε/4 < Val1(Φ)(s) − ε/2.

Since by assumption σ is an ε/2-optimal strategy, this is a contradiction. This establishes the desired result.
Definition 7 (Perennial ε-optimal strategies) A strategy σ is a perennial ε-optimal strategy for objective Φ if it is ε-optimal for all states s, and for all histories 〈s1, s2, . . . , sk〉 and all strategies π ∈ Π for player 2 we have Pr^{σ,π}_s(Φ | 〈s1, s2, . . . , sk〉) ≥ Val1(Φ)(sk) − ε, i.e., for every history 〈s1, s2, . . . , sk〉, the probability to satisfy Φ given the history is within ε of the value at sk. We denote by Σ^PL_ε(Φ) the set of perennial ε-optimal strategies for player 1, for objective Φ. The set of perennial ε-optimal strategies for player 2 is defined similarly, and we denote it by Π^PL_ε(Φ̄).
Existence of perennial ε-optimal strategies. The results of [dAM01] prove the existence of perennial ε-optimal strategies in concurrent games with parity objectives, for all ε > 0. Since Muller objectives can be reduced to parity objectives, the following proposition follows.

Proposition 6 For all concurrent game structures, for all Muller objectives Φ, and for all ε > 0, Σ^PL_ε(Φ) ≠ ∅ and Π^PL_ε(Φ̄) ≠ ∅.
Lemma 6 For all concurrent game structures G, for all Muller objectives Φ for player 1 and Φ̄ for player 2, and for all states s ∈ S, we have

  inf_{σ∈Σ^PL_ε(Φ)} sup_{π∈Π} Pr^{σ,π}_s(Φ̄ ∩ Safe(W^{>0}_1 ∩ W^{>0}_2)) = 0;
  inf_{σ∈Σε(Φ)} sup_{π∈Π} Pr^{σ,π}_s(Φ̄ ∩ Safe(W^{>0}_1 ∩ W^{>0}_2)) = 0;
  inf_{π∈Π^PL_ε(Φ̄)} sup_{σ∈Σ} Pr^{σ,π}_s(Φ ∩ Safe(W^{>0}_1 ∩ W^{>0}_2)) = 0;
  inf_{π∈Πε(Φ̄)} sup_{σ∈Σ} Pr^{σ,π}_s(Φ ∩ Safe(W^{>0}_1 ∩ W^{>0}_2)) = 0.
Proof. We show that

  inf_{σ∈Σ^PL_ε(Φ)} sup_{π∈Π} Pr^{σ,π}_s(Φ̄ ∩ Safe(W^{>0}_1 ∩ W^{>0}_2)) = 0.

Since for all ε > 0 we have Σ^PL_ε(Φ) ⊆ Σε(Φ), this suffices to prove the first two claims. The proof of the last two claims is symmetric. We prove the first claim as follows. Let W^{>0} = W^{>0}_1 ∩ W^{>0}_2. Let η = min_{s∈W^{>0}} Val1(Φ)(s), and observe that 0 < η < 1. Fix 0 < 2ε < η, and fix a perennial ε-optimal strategy σ ∈ Σ^PL_ε(Φ). Consider a strategy π ∈ Π for player 2. Since σ ∈ Σ^PL_ε(Φ), for all k ≥ 1 and all histories 〈s1, s2, . . . , sk〉 such that si ∈ W^{>0} for all i ≤ k, we have Pr^{σ,π}_s(Φ | 〈s1, s2, . . . , sk〉) ≥ η − ε > η/2. For a history 〈s1, s2, . . . , sk〉 such that si ∉ W^{>0} for some i ≤ k, we have Pr^{σ,π}_s(Reach(W^1_1 ∪ W^1_2) | 〈s1, s2, . . . , sk〉) = 1. Hence it follows that for all n we have Pr^{σ,π}_s(Φ ∪ Reach(W^1_1 ∪ W^1_2) | Fn) > η/2. Since η/2 > 0, by Lemma 3 we have Pr^{σ,π}_s(Φ ∪ Reach(W^1_1 ∪ W^1_2)) = 1, i.e., Pr^{σ,π}_s(Φ̄ ∩ Safe(W^{>0})) = 0. The desired result follows.
Theorem 10 Given a concurrent game structure G with a tail objective Φ for player 1, let σε ∈ Σℓ_ε(Φ) be a locally ε-optimal strategy that is also ε-optimal from W^1_1 (i.e., for all s ∈ W^1_1 and all π we have Pr^{σε,π}_s(Φ) ≥ 1 − ε). If for all strategies π for player 2 we have Pr^{σε,π}_s(Φ̄ ∩ Safe(W^{>0}_1 ∩ W^{>0}_2)) ≤ ε, then σε is a 3ε-optimal strategy.
Proof. Let η1 = max_{s∈W^{>0}_2} Val1(Φ)(s). Without loss of generality we assume that the states in W^1_2 are converted to absorbing states, and player 2 wins if the play reaches W^1_2. Consider an arbitrary strategy π for player 2, consider a state s ∈ W^{>0}_1 ∩ W^{>0}_2, and let α = Val1(Φ)(s). By local ε-optimality of σε, for all k ≥ 1 we have α − ε ≤ E^{σε,π}_s[Val1(Φ)(Xk)]. Since Val1(Φ)(s) ≤ 1 for all s ∈ S, we have E^{σε,π}_s[Val1(Φ)(Xk)] ≤ Pr^{σε,π}_s(Xk ∈ W^{>0}_1). Hence for all k ≥ 1 we obtain

  α − ε ≤ E^{σε,π}_s[Val1(Φ)(Xk)] ≤ Pr^{σε,π}_s(Xk ∈ W^{>0}_1);

since the states in W^1_2 are absorbing, letting k → ∞ yields

  α − ε ≤ Pr^{σε,π}_s(Safe(W^{>0}_1)) = 1 − Pr^{σε,π}_s(Reach(W^1_2)).

Hence we have Pr^{σε,π}_s(Reach(W^1_2)) ≤ 1 − α + ε, and thus Pr^{σε,π}_s(Φ̄ ∩ Reach(W^1_2)) ≤ 1 − α + ε. Since σε is ε-optimal from W^1_1, we have Pr^{σε,π}_s(Φ̄ ∩ Reach(W^1_1)) ≤ ε. The above inequalities, along with the assumption of the theorem, yield the following:

  Pr^{σε,π}_s(Φ̄) ≤ Pr^{σε,π}_s(Φ̄ ∩ Safe(W^{>0}_1 ∩ W^{>0}_2)) + Pr^{σε,π}_s(Φ̄ ∩ Reach(W^1_2)) + Pr^{σε,π}_s(Φ̄ ∩ Reach(W^1_1))
              ≤ ε + (1 − α + ε) + ε = 1 − α + 3ε.

Thus Pr^{σε,π}_s(Φ) ≥ α − 3ε. Since the above inequality holds for all π, we obtain that σε is a 3ε-optimal strategy.
Lemma 6 shows that ε-optimal strategies for player 1 are limit-sure winning against the objective Φ̄ ∩ Safe(W^{>0}_1 ∩ W^{>0}_2) for player 2, for Muller objectives Φ. Theorem 10 shows that if a strategy is ε-limit-sure winning for player 1 against the objective Φ̄ ∩ Safe(W^{>0}_1 ∩ W^{>0}_2) for player 2, then local ε-optimality guarantees 3ε-optimality. This characterizes ε-optimal strategies as locally ε-optimal and ε-limit-sure winning strategies.
3.5 Conclusion

In this chapter we studied concurrent games with tail objectives. We proved the positive limit-one property and related the values of zero-sum tail games to Nash equilibria of nonzero-sum reachability games. We also presented a construction of ε-optimal strategies for Muller objectives. The computation of the sets W^1_1, W^{>0}_1 and the corresponding sets for player 2, for concurrent games and their subclasses with tail objectives, remains open. The more general problem of computing the value functions also remains open. We believe that algorithms for computing W^1_1, W^{>0}_1 and the properties we proved in this chapter could lead to algorithms for computing value functions. The exact characterization of tail objectives in the Borel hierarchy also remains open.
Chapter 4
Stochastic Muller Games
In this chapter we study 2½-player games with Muller objectives.¹ We present an optimal memory bound for pure (deterministic) almost-sure and optimal strategies in 2½-player graph games with Muller conditions. In fact, we generalize the elegant analysis of [DJW97] to present an upper bound for optimal strategies in 2½-player graph games with Muller conditions that matches the lower bound for sure winning in 2-player games. We present the result for almost-sure strategies in Section 4.3, and then generalize it to optimal strategies in Section 4.4. The results developed also help us precisely characterize the complexity of several classes of 2½-player Muller games. We show that the quantitative analysis of 2½-player games with Muller objectives is PSPACE-complete. We also show that for two special classes of Muller objectives (namely, union-closed and upward-closed objectives) the problem is coNP-complete. We also study memory bounds for randomized strategies. In the case of randomized strategies we improve the upper bound for almost-sure and optimal strategies as compared to pure strategies (Section 4.5). The problem of matching upper and lower bounds for almost-sure and optimal randomized strategies remains open. We start with some basic results on MDPs in the following section.

¹Preliminary versions of the results of this chapter appeared in [CdAH04] and [Cha07b, Cha07c].
4.1 Markov decision processes
We consider player-1 MDPs, and hence only strategies for player 1. Let G = ((S,E), (S1, S2, SP), δ) with S2 = ∅ be a 1½-player game graph. We present some known basic results on MDPs (1½-player game graphs) together with some new results. In the sequel of this section, since S2 = ∅, we drop S2 from game graphs.
4.1.1 MDPs with reachability objectives
We first consider MDPs with reachability objectives. The following theorem states that the value function for MDPs with reachability objectives can be computed in polynomial time by linear programming. It also follows that pure memoryless optimal strategies exist for MDPs with reachability objectives.
Theorem 11 ([FV97]) Given a player-1 MDP G = ((S,E), (S1, SP), δ) and T ⊆ S, consider the following linear program. For every state s ∈ S there is a variable xs; the objective function and the constraints are as follows:

  min Σ_{s∈S} xs   subject to
    xs ≥ xt                        for all s ∈ S1 and t ∈ E(s);
    xs = Σ_{t∈E(s)} xt · δ(s)(t)    for all s ∈ SP;
    xs = 1                         for all s ∈ T.

For all s ∈ S we have xs = Val1(Reach(T))(s).
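As a concrete illustration, the following sketch builds a toy player-1 MDP (our own example and names, not from the text) and computes the fixpoint that the linear program above pins down, using simple value iteration; a standard LP solver applied to the constraints of Theorem 11 should return the same values.

```python
# Toy player-1 MDP: s0 is a player-1 state choosing between a probabilistic
# state s1 and a dead end; s1 moves to the target t w.p. 0.5, else to dead.
# The LP of Theorem 11 has optimum x = {s0: 0.5, s1: 0.5, t: 1, dead: 0};
# value iteration converges to the same fixpoint.

player1_edges = {"s0": ["s1", "dead"], "dead": ["dead"]}
prob_edges = {"s1": {"t": 0.5, "dead": 0.5}}
target = {"t"}

def reach_values(n_iters=200):
    x = {s: 0.0 for s in ["s0", "s1", "t", "dead"]}
    for s in target:
        x[s] = 1.0                                     # x_s = 1 for s in T
    for _ in range(n_iters):
        for s, succs in player1_edges.items():
            x[s] = max(x[t] for t in succs)            # x_s >= x_t, tight at optimum
        for s, dist in prob_edges.items():
            x[s] = sum(p * x[t] for t, p in dist.items())  # x_s = sum delta * x_t
    return x

x = reach_values()
assert abs(x["s0"] - 0.5) < 1e-9 and abs(x["s1"] - 0.5) < 1e-9
assert x["t"] == 1.0 and x["dead"] == 0.0
```

The pure memoryless optimal strategy (Theorem 12) is read off by choosing, at s0, the successor attaining the maximum, here s1.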
Theorem 12 ([FV97]) The family Σ^PM of pure memoryless strategies suffices for optimality for reachability objectives on all 1½-player games (MDPs).
Almost-sure winning reachability property. Given an MDP G = ((S,E), (S1, SP), δ) and T ⊆ S, let U = Almost1(Reach(T)) be the set of almost-sure winning states. For all states s ∈ S ∖ U, the probability of reaching T, for all strategies σ, is less than 1. For all states s ∈ U ∩ SP, there cannot be an edge to S ∖ U; otherwise from s the game reaches S ∖ U with positive probability, and from S ∖ U the set T is reached with probability less than 1, which would contradict that s ∈ U = Almost1(Reach(T)). Hence for all states s ∈ U ∩ SP we have E(s) ⊆ U. Moreover, for every state s ∈ U there is a path from s to a state in T. Hence we have the following characterization:

1. for all s ∈ U ∖ T there exists a state t ∈ U ∩ E(s) such that the distance (BFS distance) to T from t in the graph of G is smaller than the distance to T from s; we refer to choosing such a successor as shortening the distance to T; and
2. for all s ∈ (U ∖ T) ∩ SP we have E(s) ⊆ U.

Moreover, a pure memoryless strategy σ that at every state s ∈ (U ∖ T) ∩ S1 chooses a successor to shorten the distance to T ensures that T is reached with probability 1.
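The distance-shortening construction can be sketched directly. In this sketch the graph encoding and names are ours, and the set U = Almost1(Reach(T)) is assumed to be already computed:

```python
from collections import deque

# Sketch of the distance-shortening strategy: given the almost-sure winning
# set U = Almost1(Reach(T)), compute BFS distances to T inside U and let each
# player-1 state in U \ T pick a successor in U that is strictly closer to T.

def shortening_strategy(edges, U, T):
    dist = {t: 0 for t in T}
    queue = deque(T)
    while queue:                          # backward BFS from T within U
        v = queue.popleft()
        for u in U:
            if v in edges[u] and u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    # pure memoryless choice: a successor inside U with minimal distance
    return {s: min((t for t in edges[s] if t in dist), key=lambda t: dist[t])
            for s in U if s not in T}

# Toy graph: s0 -> {s1, s0}, s1 -> {t}, t absorbing; U = {s0, s1, t}, T = {t}.
edges = {"s0": ["s1", "s0"], "s1": ["t"], "t": ["t"]}
sigma = shortening_strategy(edges, U={"s0", "s1", "t"}, T={"t"})
assert sigma == {"s0": "s1", "s1": "t"}
```

By item 1 of the characterization, every state in U ∖ T has at least one successor in U with a strictly smaller distance, so the `min` above is well defined.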
4.1.2 MDPs with Muller objectives
We now show that in MDPs with Muller objectives uniform randomized memoryless (uniform over the support) optimal strategies exist. We develop some facts on end components [CY90, dA97] that will be useful tools for the analysis of MDPs.
Definition 8 (End component) A set U ⊆ S of states is an end component if U is δ-closed and the subgame graph G ↾ U is strongly connected.
We denote by E ⊆ 2^S the set of all end components of G. The next lemma states that, under any strategy (memoryless or not), with probability 1 the set of states visited infinitely often along a play is an end component. This lemma allows us to derive conclusions on the (infinite) set of plays in an MDP by analyzing the (finite) set of end components of the MDP.

Lemma 7 ([CY90, dA97]) For all states s ∈ S and all strategies σ ∈ Σ, we have Pr^σ_s(Muller(E)) = 1.
For an end component U ∈ E, we denote by σU the randomized memoryless strategy that at each state s ∈ U ∩ S1 selects uniformly at random the states in E(s) ∩ U; we call this the uniform strategy for U. Under the uniform strategy for U, all states of U are visited infinitely often; this follows immediately from the fact that U, under σU, is a closed connected recurrent class of a Markov chain.

Lemma 8 For all end components U ∈ E and all states s ∈ U, we have Pr^{σU}_s(Muller({U})) = 1.
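Definition 8 and the uniform strategy admit a short sketch. The encoding of a player-1 MDP as successor lists is ours, and `is_end_component` checks a candidate set rather than enumerating maximal end components, which a real implementation would do:

```python
# Sketch: check whether U is an end component of a player-1 MDP, i.e.
# (i) U is delta-closed (probabilistic states keep all successors inside U,
#     player-1 states keep at least one), and (ii) the graph restricted to U
#     is strongly connected. The uniform strategy sigma_U then plays, at each
#     player-1 state of U, uniformly over its successors inside U.

def is_end_component(U, edges, prob_states):
    succ = {s: {t for t in edges[s] if t in U} for s in U}
    for s in U:
        if s in prob_states and set(edges[s]) - set(U):
            return False                 # leak from a probabilistic state
        if not succ[s]:
            return False                 # no successor inside U
    pred = {s: {t for t in U if s in succ[t]} for s in U}
    def covers(adj, src):                # all of U reachable from src in adj
        seen, stack = {src}, [src]
        while stack:
            for t in adj[stack.pop()] - seen:
                seen.add(t); stack.append(t)
        return seen == set(U)
    s0 = next(iter(U))
    return covers(succ, s0) and covers(pred, s0)   # strongly connected

def uniform_strategy(U, edges, prob_states):
    return {s: {t: 1.0 / len([u for u in edges[s] if u in U])
                for t in edges[s] if t in U}
            for s in U if s not in prob_states}

# Toy MDP: cycle a -> b -> a with b probabilistic (to a w.p. 1).
edges = {"a": ["b"], "b": ["a"]}
assert is_end_component({"a", "b"}, edges, prob_states={"b"})
assert uniform_strategy({"a", "b"}, edges, prob_states={"b"}) == {"a": {"b": 1.0}}
```

The strong-connectivity test uses the simple "forward and backward reachability from one state" criterion rather than a linear-time SCC algorithm, which suffices for a sketch.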
We will now prove that for MDPs with Muller objectives randomized memoryless
optimal strategies exist. We first state the result and then prove it.
Theorem 13 The family ΣM of randomized memoryless strategies suffices for optimality
with respect to Muller objectives on 1½-player game graphs.
Given a set M ⊆ 2^S of Muller sets, we denote by U = E ∩ M the set of end
components that are Muller sets. These are the winning end components. Let
Tend = ⋃_{U∈U} U be their union. From Lemma 7 and Theorem 9, it follows that the maximal
probability of satisfying the objective Muller(M) is equal to the maximal probability of
reaching the union of the winning end components.
Lemma 9 For all 1½-player games and for all Muller objectives Muller(M) with M ⊆ 2^S,
we have Val1(Muller(M)) = Val1(Reach(Tend)).
To construct a memoryless optimal strategy, we let U = {U1, . . . , Uk}, thus fixing
an arbitrary order on the winning end components, and we define the rank of a state
s ∈ Tend by r(s) = max{1 ≤ j ≤ k | s ∈ Uj}. We define a randomized memoryless strategy
σ as follows:
• In S \ Tend, the strategy σ coincides with a pure memoryless optimal strategy to
reach Tend.
• At each state s ∈ Tend ∩ S1, the strategy σ coincides with the strategy σ_{U_{r(s)}} (the
uniform strategy for U_{r(s)}); that is, it selects uniformly at random the states in
E(s) ∩ U_{r(s)} as successors.
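The rank-based part of this construction can be sketched as follows; the encoding and the name `rank_strategy` are hypothetical, and the optimal reachability strategy on S \ Tend is omitted:

```python
def rank_strategy(winning_ecs, succ, player1_states):
    """Sketch of the ranked construction: winning_ecs is the (arbitrarily)
    ordered list U_1, ..., U_k of winning end components; on their union a
    player-1 state s plays uniformly in E(s) ∩ U_{r(s)}, where
    r(s) = max{j | s in U_j}."""
    strategy = {}
    # Processing in order lets a later U_j overwrite earlier choices,
    # which realizes r(s) = max{j | s in U_j}.
    for U in winning_ecs:
        U = set(U)
        for s in U:
            if s in player1_states:
                choices = succ[s] & U
                strategy[s] = {t: 1 / len(choices) for t in choices}
    return strategy
```

A state lying in two winning end components thus plays inside the one of larger rank, which is what makes the rank nondecreasing along plays in the induced Markov chain.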
Once such a memoryless strategy σ is fixed, the MDP becomes a Markov chain MC_σ, with
transition probabilities δ_σ defined as follows:

δ_σ(s)(t) = σ(s)(t) if s ∈ S1 and t ∈ S; and δ_σ(s)(t) = δ(s)(t) if s ∈ SP and t ∈ S.
The following lemma characterizes the closed connected recurrent classes of this Markov
chain in the set Tend, stating that they are all winning end components.

Lemma 10 If C is a closed connected recurrent class of the Markov chain MC_σ, then
either C ∩ Tend = ∅ or C ∈ U.
Proof. Let

E_σ = {(s, t) ∈ Tend × Tend | δ_σ(s)(t) > 0}.

The closed connected recurrent classes of MC_σ are the terminal strongly connected
components of the graph (Tend, E_σ). The rank of the states along all paths in (Tend, E_σ) is
nondecreasing. Hence each terminal strongly connected component C of (Tend, E_σ) must
consist of states with the same rank, denoted r(C). Clearly, C ⊆ U_{r(C)}. To see that
C = U_{r(C)}, note that in C player 1 follows the strategy σ_{U_{r(C)}}, which causes the whole of
U_{r(C)} to be visited. Hence, as C is terminal, we have C = U_{r(C)}.
The optimality of the randomized memoryless strategy σ is a simple consequence
of Lemma 7. Hence we have the following lemma, which proves Theorem 13.

Lemma 11 For all states s ∈ S, we have Val1(Muller(M))(s) = Pr^σ_s(Muller(M)).
4.1.3 MDPs with Rabin and Streett objectives
In this section we present a polynomial-time algorithm for computing the values of MDPs
with Rabin and Streett objectives. It follows from the results of Subsection 4.1.2 that it
suffices to compute the winning end components, and then solve an MDP with a reachability
objective (which can be done in polynomial time by Theorem 11).
Winning end components. Consider a set P = {(E1, F1), . . . , (Ed, Fd)} of Streett (resp.
Rabin) pairs. An end component U is winning for the Streett objective if ∀i ∈ [1..d]. (U ∩
Ei ≠ ∅ ∨ U ∩ Fi = ∅); and for the Rabin objective if ∃i ∈ [1..d]. (U ∩ Ei = ∅ ∧ U ∩ Fi ≠ ∅).
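The two winning conditions on an end component are direct set tests; a minimal sketch, with pairs given as (E_i, F_i) tuples of sets (the function names are illustrative):

```python
def streett_winning(U, pairs):
    """An end component U is winning for the Streett pairs (E_i, F_i)
    iff for every i: U ∩ E_i ≠ ∅ or U ∩ F_i = ∅."""
    return all(U & E or not (U & F) for E, F in pairs)

def rabin_winning(U, pairs):
    """U is winning for the Rabin pairs iff for some i:
    U ∩ E_i = ∅ and U ∩ F_i ≠ ∅."""
    return any(not (U & E) and U & F for E, F in pairs)
```

As the dual conditions suggest, on any fixed end component the two predicates are complements of each other for the same set of pairs.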
Winning end components for Rabin objectives. In [dA97] it was shown that the
winning end components for MDPs with Rabin objectives and the set Almost1(Rabin(P)) can be
computed by d calls to a procedure that computes the almost-sure winning states of an MDP with
a Büchi objective. In [CJH03] we proved that the set of almost-sure winning states of MDPs
with Büchi objectives can be computed in O(m · √m) time, where m is the number of edges.
Hence we have the following result.
Theorem 14 For all MDPs G = ((S, E), (S1, SP), δ) with Rabin objective Rabin(P), where
P = {(E1, F1), (E2, F2), . . . , (Ed, Fd)}, the following assertions hold:
1. Almost1(Rabin(P)) can be computed in O(d · m · √m) time, where m = |E|; and
2. Val1(Rabin(P )) can be computed in polynomial time.
Winning end components for Streett objectives. We now present a polynomial-time
algorithm for computing the maximal probability of satisfying a Streett condition in an
MDP. The algorithm computes Tend; the computation of the values then reduces to
computing the values of an MDP with a reachability objective. To state
the algorithm, we say that an end component U ⊆ S is maximal in V ⊆ S if U ⊆ V and
there is no end component U′ with U ⊂ U′ ⊆ V. Given a set V ⊆ S, we denote by
MaxEC(V) the set consisting of all maximal end components U such that U ⊆ V. This set
can be computed in quadratic time (in O(n · m) time for graphs with n states and m edges)
with standard graph algorithms; see, e.g., [dA97]. The set Tend can be computed with the
following algorithm.
L := MaxEC(S); D := ∅
while L ≠ ∅ do
    choose U ∈ L and let L := L \ {U}
    if ∀i ∈ [1..d]. (U ∩ Ei ≠ ∅ ∨ U ∩ Fi = ∅)
    then D := D ∪ {U}
    else choose i ∈ [1..d] such that U ∩ Fi ≠ ∅, and let L := L ∪ MaxEC(U \ Fi)
    end if
end while
Return: Tend = ⋃_{U∈D} U.
It is easy to see that every state s ∈ S is considered as part of an end component in the
else-part of the above algorithm at most once for every 1 ≤ i ≤ d; hence, the algorithm
runs in O(d · n · m) time.
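The loop above can be sketched in Python. Since the text's O(n · m) MaxEC procedure is not given here, the sketch substitutes a naive exponential stand-in (subset enumeration, fine only for tiny examples); all names and the state encoding are hypothetical:

```python
from itertools import combinations

def reachable(s, U, succ):
    """Forward reachability from s inside U."""
    seen, stack = {s}, [s]
    while stack:
        x = stack.pop()
        for y in succ[x] & U:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def is_ec(U, succ, prob_states):
    """Delta-closed and strongly connected (Definition 8), sketch version."""
    for s in U:
        if s in prob_states and succ[s] - U:
            return False
        if not (succ[s] & U):
            return False
    return all(reachable(s, U, succ) == U for s in U)

def naive_max_ecs(V, succ, prob_states):
    """Naive exponential stand-in for MaxEC(V); only for illustration."""
    V = list(V)
    ecs = [set(c) for r in range(1, len(V) + 1)
           for c in combinations(V, r) if is_ec(set(c), succ, prob_states)]
    return [U for U in ecs if not any(U < W for W in ecs)]

def streett_Tend(S, succ, prob_states, pairs):
    """The loop from the text: collect winning end components for the
    Streett pairs (E_i, F_i) and return their union T_end."""
    work = list(naive_max_ecs(S, succ, prob_states))
    winning = []
    while work:
        U = work.pop()
        bad = [i for i, (E, F) in enumerate(pairs)
               if not (U & E) and (U & F)]
        if not bad:
            winning.append(U)       # all pairs satisfied: winning EC
        else:
            _, F = pairs[bad[0]]    # recurse on U \ F_i for a violated pair
            work.extend(naive_max_ecs(U - F, succ, prob_states))
    return set().union(*winning) if winning else set()
```

With the quadratic MaxEC of [dA97] plugged in instead of the naive enumeration, this is exactly the polynomial procedure of the text.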
Theorem 15 For all MDPs G = ((S, E), (S1, SP), δ) with Streett objective Streett(P),
where P = {(E1, F1), (E2, F2), . . . , (Ed, Fd)}, the following assertions hold:
1. Almost1(Streett(P )) can be computed in O(d ·n ·m) time, where n = |S| and m = |E|;
and
2. Val1(Streett(P )) can be computed in polynomial time.
4.2 2½-player Games with Muller objectives
We now present a slightly different notation for Muller objectives, consistent
with the notation of [DJW97], so that we can use the lower bound
results of [DJW97]. The results of this chapter (memory bounds and complexity results)
also hold for the definitions used in Chapter 2.
Objectives. An objective for a player consists of an ω-regular set of winning plays
Φ ⊆ Ω [Tho97]. In this chapter we study zero-sum games [FV97, RF91], where the
objectives of the two players are complementary; that is, if the objective of one player
is Φ, then the objective of the other player is Φ̄ = Ω \ Φ. We consider ω-regular objectives
specified as Muller objectives. For a play ω = 〈s0, s1, s2, . . .〉, let Inf(ω) be the set
{s ∈ S | s = sk for infinitely many k ≥ 0} of states that appear infinitely often in ω. We
use colors to define objectives as in [DJW97]. A 2½-player game (G, C, χ, F ⊆ P(C))
consists of a 2½-player game graph G, a finite set C of colors, a partial function χ : S ⇀ C
that assigns colors to some states, and a winning condition specified by a subset F of the
power set P(C) of colors. The winning condition F defines the set Φ ⊆ Ω of winning plays
as follows:

Muller(F) = {ω ∈ Ω | χ(Inf(ω)) ∈ F};

that is, Muller(F) is the set of plays ω such that the set of colors appearing infinitely often
in ω belongs to F.
Remarks. A winning condition F ⊆ P(C) has a split if there are sets C1, C2 ∈ F such
that C1 ∪ C2 ∉ F. A winning condition is a Rabin winning condition if it does not have a split,
and it is a Streett winning condition if P(C) \ F does not have a split. These notions coincide
with the Rabin and Streett winning conditions usually defined in the literature, i.e., as in
Chapter 2 (see [Niw97, DJW97] for details).
Determinacy. For sure winning, the 1½-player and 2½-player games coincide with 2-
player (deterministic) games where the random player (who chooses the successor at the
probabilistic states) is interpreted as an adversary, i.e., as player 2. Theorem 2 states
the classical determinacy result for 2-player games with Muller objectives. Theorem 16
(obtained as a special case of Theorem 1) states the classical determinacy result for 2½-
player game graphs with Muller objectives. It follows from Theorem 16 that for all Muller
objectives Φ and all ε > 0, there exists an ε-optimal strategy σε for player 1 such that for
all π and all s ∈ S we have Pr^{σε,π}_s(Φ) ≥ Val1(Φ)(s) − ε.
Theorem 16 (Quantitative determinacy [Mar98]) For all 2½-player game graphs, for
all Muller winning conditions F ⊆ P(C), and all states s, we have Val1(Muller(F))(s) +
Val2(Ω \ Muller(F))(s) = 1.
4.3 Optimal Memory Bound for Pure Qualitative Winning
Strategies
In this section we present optimal memory bounds for pure strategies with respect
to qualitative (almost-sure and positive) winning for 2½-player game graphs with Muller
winning conditions. The result is obtained by a generalization of the result of [DJW97]
and depends on the novel constructions of Zielonka [Zie98] for 2-player games. In [DJW97]
the authors use an insightful analysis of Zielonka's construction to present an upper bound
(and also a matching lower bound) on the memory of sure winning strategies in 2-player games
with Muller objectives. In this section we generalize the result of [DJW97] to show that the
same upper bound holds for qualitative winning strategies in 2½-player games with Muller
objectives. We now introduce some notation and the Zielonka tree of a Muller condition.
Notation. Let F ⊆ P(C) be a winning condition. For D ⊆ C we define (F ↾ D) ⊆ P(D)
as the set {D′ ∈ F | D′ ⊆ D}. For a Muller condition F ⊆ P(C) we denote by F̄ the
complementary condition, i.e., F̄ = P(C) \ F. Similarly, for an objective Φ we denote by Φ̄
the complementary objective, i.e., Φ̄ = Ω \ Φ.
Definition 9 (Zielonka tree of a winning condition [Zie98]) The Zielonka tree of a
winning condition F ⊆ P(C), denoted Z_{F,C}, is defined inductively as follows:

1. If C ∉ F, then Z_{F,C} = Z_{F̄,C}, where F̄ = P(C) \ F.

2. If C ∈ F, then the root of Z_{F,C} is labeled with C. Let C0, C1, . . . , Ck−1 be all the
maximal sets in {X ⊆ C | X ∉ F}. Then we attach to the root, as its subtrees, the
Zielonka trees of F ↾ Ci, i.e., Z_{F↾Ci,Ci}, for i = 0, 1, . . . , k − 1.

Hence the Zielonka tree is a tree with nodes labeled by sets of colors. A node of Z_{F,C} is a
0-level node if it is labeled with a set from F; otherwise it is a 1-level node. In the sequel
we write Z_F for Z_{F,C} when C is clear from the context.
Definition 10 (The number m_F of the Zielonka tree) Let F ⊆ P(C) be a winning
condition and let Z_{F0,C0}, Z_{F1,C1}, . . . , Z_{Fk−1,Ck−1} be the subtrees attached to the
root of the tree Z_{F,C}, where Fi = F ↾ Ci ⊆ P(Ci) for i = 0, 1, . . . , k − 1. We define the
number m_F inductively as follows:

m_F = 1 if Z_{F,C} does not have any subtrees;
m_F = max{m_{F0}, m_{F1}, . . . , m_{Fk−1}} if C ∉ F (1-level node);
m_F = ∑_{i=0}^{k−1} m_{Fi} if C ∈ F (0-level node).
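Definitions 9 and 10 translate into a short recursion. A minimal sketch, assuming F is encoded as a set of frozensets of colors (the naive subset enumeration is exponential in |C|, which is fine for small color sets; the name `m_number` is illustrative):

```python
from itertools import chain, combinations

def subsets(C):
    """All subsets of C as frozensets."""
    C = list(C)
    return (frozenset(x) for x in chain.from_iterable(
        combinations(C, r) for r in range(len(C) + 1)))

def m_number(F, C):
    """Sketch of Definition 10: the children of the root of Z_{F,C} are the
    maximal subsets of C on the other side of the condition (Definition 9),
    since for C not in F the tree equals the tree of the complement."""
    C = frozenset(C)
    in_F = C in F
    other = [X for X in subsets(C) if (X in F) != in_F]
    children = [X for X in other if not any(X < Y for Y in other)]
    if not children:
        return 1                      # leaf of the Zielonka tree
    sub = [m_number(F, X) for X in children]
    # 0-level node (C in F): sum over subtrees; 1-level node: maximum.
    return sum(sub) if in_F else max(sub)
```

For instance, with C = {a, b}, the condition "every winning set contains a" gives m_F = 1 (a Rabin-like condition, no split), while "only the full set wins" gives m_F = 2.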
We now state the results on the strategy complexity of Muller objectives and its
subclasses.
Theorem 17 (Finite memory and memoryless strategies) The following assertions
hold.
1. ([GH82]). The family of pure finite-memory strategies suffices for sure winning with
respect to Muller objectives on 2-player game graphs.
2. ([EJ88]). The family of pure memoryless strategies suffices for sure winning with
respect to Rabin objectives on 2-player game graphs.
Theorem 18 (Size of memory for sure winning strategies [DJW97]) For all 2-
player game graphs and all Muller winning conditions F, pure sure winning strategies
of size m_F suffice for the objective Muller(F). For every Muller winning
condition F, there exists a 2-player game graph such that every pure sure winning strategy
for the objective Muller(F) requires memory of size at least m_F.
Our goal is to show that for winning conditions F, pure finite-memory qualitative
winning strategies of size m_F exist in 2½-player games. This proves the upper bound. The
results of [DJW97] (Theorem 18) already establish the matching lower bound for 2-player
games. This establishes the optimal bound on the memory of qualitative winning strategies for
2½-player games. We start with the key notion of attractors, which will be crucial in our
proofs.
Definition 11 (Attractors) Given a 2½-player game graph G, a set U ⊆ S of states
such that G ↾ U is a subgame, and a set T ⊆ S, we define Attr1,P(T, U) as follows:
T0 = T ∩ U, and for j ≥ 0 we define Tj+1 from Tj as

Tj+1 = Tj ∪ {s ∈ (S1 ∪ SP) ∩ U | E(s) ∩ Tj ≠ ∅} ∪ {s ∈ S2 ∩ U | E(s) ∩ U ⊆ Tj};

and A = Attr1,P(T, U) = ⋃_{j≥0} Tj. We obtain Attr2,P(T, U) by exchanging the roles of
player 1 and player 2. A pure memoryless attractor strategy σ^A : (A \ T) ∩ S1 → S for
player 1 on A to T is as follows: for j > 0 and a state s ∈ (Tj \ Tj−1) ∩ S1, the strategy
chooses a successor σ^A(s) ∈ Tj−1 (which exists by definition).
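The fixpoint in Definition 11 can be sketched directly; the encoding is hypothetical as before, and we assume (as the definition does) that the game restricted to U is a subgame:

```python
def attractor(T, U, succ, s2_states):
    """Sketch of Attr_{1,P}(T, U) from Definition 11 as a least fixpoint:
    player-1 and random states enter when SOME successor is already
    attracted; player-2 states enter when ALL successors staying in U
    are already attracted."""
    U = set(U)
    A = set(T) & U
    changed = True
    while changed:
        changed = False
        for s in U - A:
            if s in s2_states:
                if succ[s] & U <= A:      # all of E(s) ∩ U attracted
                    A.add(s)
                    changed = True
            elif succ[s] & A:             # some successor attracted
                A.add(s)
                changed = True
    return A
```

Iterating until no state is added corresponds exactly to taking the union ⋃_{j≥0} Tj; the attractor strategy of the definition would record, for each added player-1 state, the successor that caused it to enter.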
Lemma 12 (Attractor properties) Let G be a 2½-player game graph and U ⊆ S be a
set of states such that G ↾ U is a subgame. For a set T ⊆ S of states, let Z = Attr1,P(T, U).
Then the following assertions hold.

1. G ↾ (U \ Z) is a subgame.

2. Let σ^Z be a pure memoryless attractor strategy for player 1. For all strategies π for
player 2 in the subgame G ↾ U and for all states s ∈ U we have

(a) if Pr^{σ^Z,π}_s(Reach(Z)) > 0, then Pr^{σ^Z,π}_s(Reach(T)) > 0; and

(b) if Pr^{σ^Z,π}_s(Buchi(Z)) > 0, then Pr^{σ^Z,π}_s(Buchi(T) | Buchi(Z)) = 1.
Proof. We prove the two parts.

1. Subgame property. For a state s ∈ U \ Z, if s ∈ S1 ∪ SP, then E(s) ∩ Z = ∅ (otherwise
s would have been in Z), i.e., E(s) ∩ U ⊆ U \ Z. For a state s ∈ S2 ∩ (U \ Z) we have
E(s) ∩ (U \ Z) ≠ ∅ (otherwise s would have been in Z). It follows that G ↾ (U \ Z) is
a subgame.
2. We now prove the two cases.

(a) Positive probability reachability. Let

δmin = min{δ(s)(t) | s ∈ SP, t ∈ S, δ(s)(t) > 0}.

Observe that δmin > 0. Let Z = ⋃_{j≥0} Tj, with the sets Tj as defined for attractors.
Consider a strategy σ^Z_{1,P} for both player 1 and the random player on Z as follows:
player 1 follows an attractor strategy σ^Z on Z to T, and for s ∈ (Tj \ Tj−1) ∩ SP
the random player chooses a successor t ∈ Tj−1. Such a successor exists by
definition, and observe that each such choice is made in the game with probability
at least δmin. The strategy σ^Z_{1,P} ensures that for all states s ∈ Z and for all
strategies π for player 2 in G ↾ U, the set T ∩ U is reached within |Z| steps.
Given that player 1 follows the attractor strategy σ^Z, the probability of the choices of
σ^Z_{1,P} is at least δmin^{|Z|}. It follows that a pure memoryless attractor strategy σ^Z
ensures that for all states s ∈ Z and for all strategies π for player 2 in G ↾ U we
have

Pr^{σ^Z,π}_s(Reach(T)) ≥ (δmin)^{|Z|} > 0.

The desired result follows.
(b) Almost-sure Büchi property. Given a pure memoryless attractor strategy σ^Z, if
the set Z is visited ℓ times, then by the previous part the set T is reached
at least once with probability at least 1 − (1 − δmin^{|Z|})^ℓ, which goes to 1 as ℓ → ∞.
Hence for all states s and strategies π in G ↾ U, given Pr^{σ^Z,π}_s(Buchi(Z)) > 0, we
have Pr^{σ^Z,π}_s(Reach(T) | Buchi(Z)) = 1. Since, given the event that Z is visited
infinitely often (i.e., Buchi(Z)), the set T is reached with probability 1 from all
states, it follows that the set T is visited infinitely often with probability 1.
Formally, for all states s and strategies π in G ↾ U, given Pr^{σ^Z,π}_s(Buchi(Z)) > 0,
we have Pr^{σ^Z,π}_s(Buchi(T) | Buchi(Z)) = 1.

The result of the lemma follows.
Lemma 12 shows that the complement of an attractor is a subgame, and that a pure
memoryless attractor strategy ensures the following: if the attractor of a set T is reached
with positive probability, then T is reached with positive probability; and given that the
attractor of T is visited infinitely often, T is visited infinitely often with probability 1.
We now present the main result of this section (the upper bound on memory for qualitative
winning strategies). A matching lower bound follows from the results of [DJW97] for
2-player games (see Theorem 20).
Theorem 19 (Qualitative forgetful determinacy) Let (G, C, χ, F) be a 2½-player
game with Muller winning condition F for player 1. Let Φ = Muller(F), and consider
the following sets:

W_1^{>0} = Positive1(Φ); W_1 = Almost1(Φ); W_2^{>0} = Positive2(Φ̄); W_2 = Almost2(Φ̄).

The following assertions hold.

1. We have (a) W_1^{>0} ∪ W_2 = S and W_1^{>0} ∩ W_2 = ∅; and (b) W_2^{>0} ∪ W_1 = S and
W_2^{>0} ∩ W_1 = ∅.

2. (a) Player 1 has a pure strategy σ with memory of size m_F such that for all states
s ∈ W_1^{>0} and for all strategies π for player 2 we have Pr^{σ,π}_s(Φ) > 0; and (b) player 2
has a pure strategy π with memory of size m_{F̄} such that for all states s ∈ W_2 and for
all strategies σ for player 1 we have Pr^{σ,π}_s(Φ̄) = 1.

3. (a) Player 1 has a pure strategy σ with memory of size m_F such that for all states
s ∈ W_1 and for all strategies π for player 2 we have Pr^{σ,π}_s(Φ) = 1; and (b) player 2
has a pure strategy π with memory of size m_{F̄} such that for all states s ∈ W_2^{>0} and
for all strategies σ for player 1 we have Pr^{σ,π}_s(Φ̄) > 0.
Proof. The first part of the result is a consequence of Theorem 16. We will concentrate
on the proof of part 2. The last part (part 3) follows from a symmetric
argument.

The proof goes by induction on the structure of the Zielonka tree Z_{F,C} of the
winning condition F. We assume that C ∉ F. The case C ∈ F can be proved by
a similar argument: if C ∈ F, then we consider a new color c ∉ C and the winning condition
F′ = F ⊆ P(C ∪ {c}), for which C ∪ {c} ∉ F′. Hence we assume, without loss of generality, that
C ∉ F, and let C0, C1, . . . , Ck−1 be the labels of the subtrees attached to the root C, i.e.,
C0, C1, . . . , Ck−1 are the maximal subsets of colors that appear in F. We define by induction
a non-decreasing sequence of sets (Uj)j≥0 as follows. Let U0 = ∅, and for j > 0 we define Uj
below:
Figure 4.1: The sets of the construction.
1. Aj = Attr1,P(Uj−1, S) and Xj = S \ Aj;

2. Dj = C \ C_{j mod k} and Yj = Xj \ Attr2,P(χ−1(Dj), Xj);

3. let Zj be the set of positive winning states for player 1 in (G ↾ Yj, C_{j mod k}, χ,
F ↾ C_{j mod k}) (i.e., Zj = Positive1(Muller(F ↾ C_{j mod k})) in G ↾ Yj); hence (Yj \ Zj) is
almost-sure winning for player 2 in the subgame; and

4. Uj = Aj ∪ Zj.
Fig. 4.1 depicts all these sets. The properties of attractors and of almost-sure winning states
ensure that certain edges between the sets are forbidden; this is shown in Fig. 4.2. We start
with a few observations about the construction.
1. Observation 1. For all s ∈ S2 ∩ Zj, we have E(s) ⊆ Zj ∪ Aj. This follows from the
following case analysis.

• Since Yj is the complement in Xj of the attractor set Attr2,P(χ−1(Dj), Xj), it follows that
for all states s ∈ S2 ∩ Yj we have E(s) ∩ Xj ⊆ Yj. It follows that E(s) ⊆ Yj ∪ Aj.
Figure 4.2: The sets of the construction with forbidden edges.
• Since player 2 can win almost-surely from the set Yj \ Zj, if a state s ∈ Yj ∩ S2
has an edge to Yj \ Zj, then s ∈ Yj \ Zj. Hence for s ∈ S2 ∩ Zj we have
E(s) ∩ (Yj \ Zj) = ∅.

2. Observation 2. For all s ∈ Xj ∩ (S1 ∪ SP) we have (a) E(s) ∩ Aj = ∅ (else s would
have been in Aj); and (b) if s ∈ Yj \ Zj, then E(s) ∩ Zj = ∅ (else s would have been
in Zj).

3. Observation 3. For all s ∈ Yj ∩ SP we have E(s) ⊆ Yj.
We will denote by Fi the winning condition F ↾ Ci, for i = 0, 1, . . . , k − 1, and by
F̄i the condition P(Ci) \ Fi. By the induction hypothesis on Fi = F ↾ C_{j mod k} (where
i = j mod k), player 1 has a pure positive winning strategy of size m_{Fi} from Zj, and player 2
has a pure almost-sure winning strategy of size m_{F̄i} from Yj \ Zj. Let W = ⋃_{j≥0} Uj.
We will show in Lemma 13 that player 1 has a pure positive winning strategy of size m_F from
W; and then in Lemma 14 we will show that player 2 has a pure almost-sure winning strategy
of size m_{F̄} from S \ W. This completes the proof. We now prove Lemmas 13 and 14.
Lemma 13 Player 1 has a pure positive winning strategy of size m_F from the set W.
Proof. By the induction hypothesis on Uj−1, player 1 has a pure positive winning strategy
σ^{U_{j−1}} of size m_F from Uj−1. From the set Aj = Attr1,P(Uj−1, S), player 1 has a pure
memoryless attractor strategy σ^{A_j} to bring the game to Uj−1 with positive probability
(Lemma 12, part 2(a)), and can then use σ^{U_{j−1}}; this ensures winning with positive
probability from the set Aj. Let σ^{Z_j} be the pure positive winning strategy for player 1
on Zj of size m_{Fi}, where i = j mod k. We now show that the combination of the strategies
σ^{U_{j−1}}, σ^{A_j} and σ^{Z_j} ensures winning with positive probability for player 1 from
Uj. If the play starts at a state s ∈ Zj, then player 1 follows σ^{Z_j}. If the play stays
in Yj forever, then the strategy σ^{Z_j} ensures that player 1 wins with positive probability.
By observation 1 of Theorem 19, for all states s ∈ Yj ∩ S2, we have E(s) ⊆ Yj ∪ Aj. Hence
if the play leaves Yj, then player 2 must choose an edge to Aj. In Aj player 1 can use the
attractor strategy σ^{A_j} followed by σ^{U_{j−1}} to ensure a positive probability win. Hence
if the play stays in Yj forever with probability 1, then σ^{Z_j} ensures a positive probability
win, and if the play reaches Aj with positive probability, then σ^{A_j} followed by σ^{U_{j−1}}
ensures a positive probability win.
We now formally present the strategy σ^{U_j} defined on Uj. Let σ^{Z_j} = (σ^{Z_j}_u, σ^{Z_j}_m)
be the strategy obtained from the induction hypothesis, defined on Zj (and arbitrary elsewhere),
of size m_{Fi}, where i = j mod k, which ensures winning with positive probability on Zj; here
σ^{Z_j}_u is the memory-update function and σ^{Z_j}_m is the next-move function of σ^{Z_j}. We
assume the memory M_{Fi} of σ^{Z_j} to be the set {1, 2, . . . , m_{Fi}}. The strategy
σ^{A_j} : (Aj \ Uj−1) ∩ S1 → Aj is a pure memoryless attractor strategy on Aj to Uj−1. The
strategy σ^{U_j} is as follows: the memory-update function is

σ^{U_j}_u(s, m) = σ^{U_{j−1}}_u(s, m) if s ∈ Uj−1; σ^{Z_j}_u(s, m) if s ∈ Zj and m ∈ M_{Fi}; and 1 otherwise;

and the next-move function is

σ^{U_j}_m(s, m) = σ^{U_{j−1}}_m(s, m) if s ∈ Uj−1 ∩ S1; σ^{Z_j}_m(s, m) if s ∈ Zj ∩ S1 and m ∈ M_{Fi};
σ^{Z_j}_m(s, 1) if s ∈ Zj ∩ S1 and m ∉ M_{Fi}; and σ^{A_j}(s) if s ∈ (Aj \ Uj−1) ∩ S1.

The strategy σ^{U_j} formally defines the strategy we described and proves the result.
Lemma 14 Player 2 has a pure almost-sure winning strategy of size m_{F̄} from the set S \ W.
Proof. Let ℓ ∈ N be such that ℓ mod k = 0 and W = Uℓ−1 = Uℓ = Uℓ+1 = · · · = Uℓ+k−1.
From the equality W = Uℓ−1 = Uℓ we have Attr1,P(W, S) = W. Let us denote W̄ = S \ W.
Hence G ↾ W̄ is a subgame (by Lemma 12), and also for all s ∈ W̄ ∩ (S1 ∪ SP) we have
E(s) ⊆ W̄. The equality Uℓ+i−1 = Uℓ+i implies that Zℓ+i = ∅; hence for all i = 0, 1, . . . ,
k − 1 we have Zℓ+i = ∅. By the induction hypothesis, for all i = 0, 1, . . . , k − 1, player 2
has a pure almost-sure winning strategy πi of size m_{F̄i} in the game (G ↾ Yℓ+i, Ci, χ, F ↾ Ci).

We now describe the construction of a pure almost-sure winning strategy π∗ for
player 2 on W̄. For Di = C \ Ci we denote by D̄i = χ−1(Di) the set of states with
colors in Di. If the play starts in a state in Yℓ+i, for i = 0, 1, . . . , k − 1, then player 2 uses
the almost-sure winning strategy πi. If the play leaves Yℓ+i, then the play must reach
W̄ \ Yℓ+i = Attr2,P(D̄i, W̄), since player 1 and the random states do not have edges to W.
In Attr2,P(D̄i, W̄), player 2 plays a pure memoryless attractor strategy to reach the set
D̄i with positive probability. If the set D̄i is reached, then a state in Y_{ℓ+(i+1) mod k} or in
Attr2,P(D̄_{(i+1) mod k}, W̄) is reached. If Y_{ℓ+(i+1) mod k} is reached, then π_{(i+1) mod k} is
followed, and otherwise the pure memoryless attractor strategy to reach the set D̄_{(i+1) mod k}
with positive probability is followed. Of course, the play may leave Y_{ℓ+(i+1) mod k} and reach
Y_{ℓ+(i+2) mod k}, and then we would repeat the reasoning, and so on. Let us analyze the various
cases to prove that π∗ is almost-sure winning for player 2.
1. If the play finally settles in some Yℓ+i, for i = 0, 1, . . . , k − 1, then from this moment on
player 2 follows πi and ensures that the objective Φ̄ is satisfied with probability 1.
Formally, for all states s ∈ W̄ and all strategies σ for player 1 we have
Pr^{σ,π∗}_s(Φ̄ | coBuchi(Yℓ+i)) = 1. This holds for all i = 0, 1, . . . , k − 1, and hence for all
states s ∈ W̄ and all strategies σ for player 1 we have
Pr^{σ,π∗}_s(Φ̄ | ⋃_{0≤i≤k−1} coBuchi(Yℓ+i)) = 1.

2. Otherwise, for all i = 0, 1, . . . , k − 1, the set W̄ \ Yℓ+i = Attr2,P(D̄i, W̄) is visited
infinitely often. By Lemma 12, given that Attr2,P(D̄i, W̄) is visited infinitely often, the
attractor strategy ensures that the set D̄i is visited infinitely often with probability 1.
Formally, for all states s ∈ W̄, all strategies σ for player 1, and all i = 0, 1, . . . , k − 1,
we have Pr^{σ,π∗}_s(Buchi(D̄i) | Buchi(W̄ \ Yℓ+i)) = 1; and also
Pr^{σ,π∗}_s(Buchi(D̄i) | ⋂_{0≤i≤k−1} Buchi(W̄ \ Yℓ+i)) = 1. It follows that for all states
s ∈ W̄ and all strategies σ for player 1 we have
Pr^{σ,π∗}_s(⋂_{0≤i≤k−1} Buchi(D̄i) | ⋂_{0≤i≤k−1} Buchi(W̄ \ Yℓ+i)) = 1.
Hence for every i the play visits states with colors not in Ci with probability 1. Hence the
set of colors visited infinitely often is not contained in any Ci. Since C0, C1, . . . , Ck−1 are
all the maximal subsets of colors in F, the set of colors visited infinitely often is not
in F with probability 1, and hence player 2 wins almost-surely.

Hence it follows that for all strategies σ and all states s ∈ (S \ W) we have Pr^{σ,π∗}_s(Φ̄) = 1.
To complete the proof we present a precise description of the strategy π∗ with memory of size
m_{F̄}. Let πi = (π^i_u, π^i_m) be an almost-sure winning strategy for player 2 for the subgame
on Yℓ+i with memory M_{F̄i}. By definition we have m_{F̄} = ∑_{i=0}^{k−1} m_{F̄i}. Let
M_{F̄} = ⋃_{i=0}^{k−1} (M_{F̄i} × {i}). This set is not exactly the set {1, 2, . . . , m_{F̄}},
but it has the same cardinality (which suffices for our purpose). We define the strategy π∗ as
follows:

π∗_u(s, (m, i)) = (π^i_u(s, m), i) if s ∈ Yℓ+i, and (1, (i + 1) mod k) otherwise;

π∗_m(s, (m, i)) = π^i_m(s, m) if s ∈ Yℓ+i; π^{L_i}(s) if s ∈ Li \ D̄i; and si if s ∈ D̄i, where si ∈ E(s) ∩ W̄;

where Li = Attr2,P(D̄i, W̄), π^{L_i} is a pure memoryless attractor strategy on Li to D̄i, and
si is a successor state of s in W̄ (such a state exists since W̄ induces a subgame). This
formally represents π∗, and the size of π∗ satisfies the required bound. Observe that the
disjoint sum of the sets M_{F̄i} was required since Yℓ, Yℓ+1, . . . , Yℓ+k−1 may not be disjoint
and the strategy π∗ needs to know which Yj the play is in.
It follows from the existence of pure finite-memory qualitative winning strategies that, for
Muller objectives, the sets of almost-sure and limit-sure winning states coincide for 2½-player
game graphs. This, along with Theorem 5 and Corollary 4, gives us the following corollary.

Corollary 5 For all 2½-player game graphs G and all Muller objectives Ψ1, we have
Almost1(Ψ1) = Limit1(Ψ1). Moreover, (a) if for all states s ∈ S we have Val1(Ψ1)(s) > 0,
then Almost1(Ψ1) = S; and (b) if Almost1(Ψ1) = ∅, then Almost2(Ω \ Ψ1) = S.
Lower bound. In [DJW97] the authors show a matching lower bound for sure winning
strategies in 2-player games (see Theorem 18). It may be noted that in 2-player games
any pure almost-sure winning or pure positive winning strategy is also a sure winning
strategy. This observation, along with the result of [DJW97], gives us the following result.

Theorem 20 (Lower bound [DJW97]) For all Muller winning conditions F ⊆ P(C),
there is a 2-player game (G, C, χ, F) (with a 2-player game graph G) such that every pure
almost-sure and positive winning strategy for player 1 requires memory of size at least m_F,
and every pure almost-sure and positive winning strategy for player 2 requires memory of
size at least m_{F̄}.
4.3.1 Complexity of qualitative analysis
We now present algorithms to compute the almost-sure and positive winning states
for Muller objectives Muller(F) in 2½-player games. We consider two cases: the case
C ∈ F and the case C ∉ F. We present the algorithm for the latter case (which recursively
calls the former case); once the algorithm for the latter case is obtained, we show how it
can be used iteratively to solve the former case.

Informal description of the algorithm. We present an algorithm to compute the
positive winning set for player 1 and the almost-sure winning set for player 2 for a Muller
objective Muller(F) for player 1 in 2½-player game graphs. We refer to the algorithm for the
case C ∉ F as MullerQualitativeWithoutC, and to the algorithm for the case
C ∈ F as MullerQualitativeWithC. The algorithm proceeds by iteratively removing
positive winning sets for player 1: at iteration j the game graph is
denoted Gj and its set of states Sj. The algorithm is described as Algorithm 1.
Correctness. If W1 and W2 are the outputs of Algorithm 1, then W1 = Positive1(Muller(F))
and W2 = Almost2(Muller(F̄)). The correctness follows from the correctness arguments
of Theorem 19. We now present an algorithm to compute the almost-sure winning
states Almost1(Muller(F)) for player 1 and the positive winning states Positive2(Muller(F̄))
for player 2 for Muller objectives Muller(F) with C ∉ F. Once we present this
algorithm, it is easy to exchange the roles of the players to obtain the algorithm
MullerQualitativeWithC. The algorithm to compute the almost-sure winning states for
Algorithm 1 MullerQualitativeWithoutC
Input: A 2½-player game graph G, a Muller objective Muller(F) for player 1,
with F ⊆ P(C) and C ∉ F.
Output: W1 and W2.
1. Let C0, C1, . . . , Ck−1 be the maximal sets that appear in F.
2. U0 = ∅; j = 0; G0 = G;
3. do
 3.1 Dj = C \ C_{j mod k};
 3.2 Yj = Sj \ Attr2,P(χ−1(Dj), Sj);
 3.3 (A^j_1, A^j_2) = MullerQualitativeWithC(Gj ↾ Yj, F ↾ C_{j mod k});
 3.4 if (A^j_1 ≠ ∅)
  3.4.1 Uj+1 = Uj ∪ Attr1,P(Uj ∪ A^j_1, Sj);
  3.4.2 Gj+1 = G ↾ (S \ Uj+1);
 3.5 j = j + 1;
while (j ≤ k ∨ ¬(j mod k = 0 ∧ j > k ∧ ∀i. j − k ≤ i ≤ j. A^i_1 = ∅));
4. return (W1, W2) = (Uj, S \ Uj).
player 1 for Muller objectives Muller(F) with C ∉ F proceeds as follows: the algorithm
iteratively uses MullerQualitativeWithoutC and runs for at most |S| iterations. At
iteration j the algorithm computes the almost-sure winning set A^j_2 for player 2 in the
current subgame Gj, and the set of states from which player 2 can reach A^j_2 with positive
probability. This set is removed from the game graph, and the algorithm iterates on the
smaller game graph. The algorithm is formally described as Algorithm 2.
Correctness. Let W1 and W2 be the output of Algorithm 2; we argue that
Algorithm 2 MullerQualitativeWithoutCIterative
Input: A 2½-player game graph G, a Muller objective Muller(F) for player 1,
with F ⊆ P(C) and C ∉ F.
Output: W1 and W2.
1. Let C0, C1, . . . , Ck−1 be the maximal sets that appear in F.
2. X0 = ∅; j = 0; G0 = G;
3. do
 3.1 (A^j_1, A^j_2) = MullerQualitativeWithoutC(Gj, F);
 3.2 if (A^j_2 ≠ ∅)
  3.2.1 Xj+1 = Xj ∪ Attr2,P(Xj ∪ A^j_2, S0);
  3.2.2 Gj+1 = G ↾ (S \ Xj+1);
 3.3 j = j + 1;
while (A^{j−1}_2 ≠ ∅);
4. return (W1, W2) = (S \ Xj, Xj).
W1 = Almost1(Muller(F)) and W2 = Positive2(Muller(F̄)). It is clear that W2 ⊆
Positive2(Muller(F̄)). We now argue that W1 = Almost1(Muller(F)), which completes the
correctness argument. When the algorithm terminates, let the game graph be Gj; we
have A^j_2 = ∅. Then in Gj, player 1 wins with positive probability from all states. It
follows from Theorem 5 and Corollary 5 that if a player wins with positive probability
from all states of a game with a Muller objective, then the player wins with value 1 from all
states, and wins almost-surely from all states of the 2½-player game graph. It follows that
W1 = Almost1(Muller(F)). The correctness follows.
Time and space complexity. We now argue that the space requirement for the algorithms
CHAPTER 4. STOCHASTIC MULLER GAMES 83
are polynomial. Let us denote the space recurrence of Algorithm 1 as S(n, c) for game graphs
with n states and Muller objectives Muller(F) with c colors (i.e., F ⊆ P(C) with |C| = c).
Then the recurrence satisfies S(n, c) = O(n) + S(n, c − 1), which gives S(n, c) = O(n · c). The recurrence
requires space for a recursive call with at least one fewer color (denoted by S(n, c − 1)), and
O(n) space for the computation of the loop of the algorithm. This gives a PSPACE upper
bound, and a matching lower bound (of PSPACE-hardness) for the special case of 2-player
game graphs is given in [HD05]. The result also improves the previous 2EXPTIME bound
for the qualitative analysis of 2½-player games with Muller objectives (Corollary 3).
Theorem 21 (Algorithm and complexity) The following assertions hold.
1. Given a 2½-player game graph G and a Muller winning condition F, Algorithm 1 and
Algorithm 2 compute an almost-sure winning strategy and the almost-sure winning
sets in O(((|S| + |E|) · d)^{h+1}) time and O(|S| · |C|) space, where d is the maximum
degree of a node and h is the height of the Zielonka tree Z_F.
2. Given a 2½-player game graph G, a Muller objective Φ, and a state s, it is
PSPACE-complete to decide whether s ∈ Almost1(Φ).
4.4 Optimal Memory Bound for Pure Optimal Strategies
In this section we extend the sufficiency results for families of strategies from
almost-sure winning to optimality with respect to all Muller objectives. In the following,
we fix a 2½-player game graph G. We first present a useful proposition and then some
definitions. Since Muller objectives are infinitary objectives (independent of finite prefixes)
the following proposition is immediate.
Proposition 7 (Optimality conditions) For all Muller objectives Φ, for every s ∈ S
the following conditions hold.
1. If s ∈ S1, then for all t ∈ E(s) we have Val1(Φ)(s) ≥ Val1(Φ)(t), and for some
t ∈ E(s) we have Val1(Φ)(s) = Val1(Φ)(t).
2. If s ∈ S2, then for all t ∈ E(s) we have Val1(Φ)(s) ≤ Val1(Φ)(t), and for some
t ∈ E(s) we have Val1(Φ)(s) = Val1(Φ)(t).
3. If s ∈ SP, then Val1(Φ)(s) = Σ_{t∈E(s)} Val1(Φ)(t) · δ(s)(t).
Similar conditions hold for the value function Val2(Ω \ Φ) of player 2.
Definition 12 (Value classes) Given a Muller objective Φ, for every real r ∈ [0, 1] the
value class with value r is VC(Φ, r) = {s ∈ S | Val1(Φ)(s) = r}, the set of states with
value r for player 1. For r ∈ [0, 1] we denote by VC(Φ, > r) = ⋃_{q>r} VC(Φ, q) the union of the value
classes greater than r, and by VC(Φ, < r) = ⋃_{q<r} VC(Φ, q) the union of the value classes smaller than r.
Definition 13 (Boundary probabilistic states) Given a set U of states, a state s ∈
U ∩ SP is a boundary probabilistic state for U if E(s) ∩ (S \ U) ≠ ∅, i.e., the probabilistic state
has an edge out of the set U. We denote by Bnd(U) the set of boundary probabilistic states
for U. For a value class VC(Φ, r) we denote by Bnd(Φ, r) the set of boundary probabilistic
states of the value class VC(Φ, r).
Observation. It follows from Proposition 7 that for a state s ∈ Bnd(Φ, r) we have E(s) ∩
VC(Φ, > r) ≠ ∅ and E(s) ∩ VC(Φ, < r) ≠ ∅, i.e., the boundary probabilistic states have
edges to both higher and lower value classes. It follows that for all Muller objectives Φ we have
Bnd(Φ, 1) = ∅ and Bnd(Φ, 0) = ∅.
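Definitions 12 and 13 can be illustrated with a small sketch that groups states into value classes from a given value vector and extracts the boundary probabilistic states; the toy graph and values below are assumptions made for the example, not data from the thesis.

```python
from collections import defaultdict

def value_classes(val):
    """val: state -> value for player 1. Returns value -> set of states."""
    classes = defaultdict(set)
    for s, r in val.items():
        classes[r].add(s)
    return dict(classes)

def boundary(U, succ, prob_states):
    """Bnd(U): probabilistic states in U with at least one edge leaving U."""
    return {s for s in U & prob_states if any(t not in U for t in succ[s])}

# toy game: 'w' has value 1, 'l' value 0, and the probabilistic state 'p'
# splits evenly between them, so 'p' and its predecessor 's' have value 1/2
succ = {'w': ['w'], 'l': ['l'], 'p': ['w', 'l'], 's': ['p']}
val = {'w': 1.0, 'l': 0.0, 'p': 0.5, 's': 0.5}
classes = value_classes(val)
bnd = boundary(classes[0.5], succ, {'p'})
```

Consistent with the observation above, the boundary state 'p' has edges into both the higher class VC(Φ, 1) and the lower class VC(Φ, 0).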
Reduction of a value class. Given a set U of states such that U is δ-live, let Bnd(U) be
the set of boundary probabilistic states for U. We denote by G_{Bnd(U)} the subgame G ↾ U where
every state in Bnd(U) is converted to an absorbing state (a state with only a self-loop). Since U
is δ-live, G_{Bnd(U)} is a subgame. Given a value class VC(Φ, r), let Bnd(Φ, r) be
the set of boundary probabilistic states in VC(Φ, r). We denote by G_{Bnd(Φ,r)} the subgame
where every boundary probabilistic state in Bnd(Φ, r) is converted to an absorbing state.
We denote by G_{Φ,r} = G_{Bnd(Φ,r)} ↾ VC(Φ, r): this is a subgame, since every value class is
δ-live by Proposition 7, and it is δ-closed as all states in Bnd(Φ, r) are converted to absorbing
states.
Lemma 15 (Almost-sure reduction) Let G be a 2½-player game graph and F ⊆ P(C)
be a Muller winning condition. Let Φ = Muller(F). For 0 < r < 1, the following assertions
hold.

1. Player 1 wins almost-surely for the objective Φ ∪ Reach(Bnd(Φ, r)) from all states in G_{Φ,r},
i.e., Almost1(Φ ∪ Reach(Bnd(Φ, r))) = VC(Φ, r) in the subgame G_{Φ,r}.

2. Player 2 wins almost-surely for the objective (Ω \ Φ) ∪ Reach(Bnd(Φ, r)) from all states in G_{Φ,r},
i.e., Almost2((Ω \ Φ) ∪ Reach(Bnd(Φ, r))) = VC(Φ, r) in the subgame G_{Φ,r}.
Proof. We prove the first part; the second part follows from symmetric arguments.
The result is obtained by an argument by contradiction. Let 0 < r < 1, and let

q = max{Val1(Φ)(t) | t ∈ E(s) \ VC(Φ, r), s ∈ VC(Φ, r) ∩ S1},

that is, q is the maximum value of a successor state t of a player-1 state s ∈ VC(Φ, r) such
that the successor t is not in VC(Φ, r). By Proposition 7 we must have q < r.
Hence if player 1 chooses to escape the value class VC(Φ, r), then player 1 gets to see
a state with value at most q < r. We consider the subgame G_{Φ,r}. Let U = VC(Φ, r)
and Z = Bnd(Φ, r). Assume towards contradiction that there exists a state s ∈ U such that
s ∉ Almost1(Φ ∪ Reach(Z)). Then we have s ∈ (U \ Z) and Val2((Ω \ Φ) ∩ Safe(U \ Z))(s) > 0.
It follows from Theorem 5 (and also Corollary 5) that for all Muller objectives Ψ, if
Val2(Ψ)(s) > 0 for some state s, then for some state s1 we have Val2(Ψ)(s1) = 1. Observe that in G_{Φ,r}
all states in Z are absorbing, and hence the objective (Ω \ Φ) ∩ Safe(U \ Z) is
equivalent to the objective (Ω \ Φ) ∩ coBuchi(U \ Z), which is a Muller objective. It follows that
there exists a state s1 ∈ (U \ Z) such that Val2((Ω \ Φ) ∩ Safe(U \ Z))(s1) = 1. Hence there exists
a strategy π̂ for player 2 in G_{Φ,r} such that for all strategies σ for player 1 in G_{Φ,r} we have

Pr_{s1}^{σ,π̂}((Ω \ Φ) ∩ Safe(U \ Z)) = 1.

We will now construct a strategy π∗ for player 2 as a combination
of the strategy π̂ and a strategy in the original game G. By Martin's determinacy result
(Theorem 16), for all ε > 0, there exists an ε-optimal strategy πε for player 2 in G such
that for all s ∈ S and for all strategies σ for player 1 we have

Pr_s^{σ,πε}(Ω \ Φ) ≥ Val2(Ω \ Φ)(s) − ε.

Let α = r − q > 0, let ε = α/2, and consider an ε-optimal strategy πε for player 2 in G.
The strategy π∗ in G is constructed as follows: for a history w that remains in U, player 2
follows π̂; and if the history reaches (S \ U), then player 2 follows the strategy πε. Formally,
for a history w = ⟨s1, s2, ..., sk⟩ we have

π∗(w) = π̂(w), if sj ∈ U for all 1 ≤ j ≤ k;
π∗(w) = πε(⟨sj, sj+1, ..., sk⟩), where j = min{i | si ∉ U}, otherwise.
We consider the case when the play starts at s1. The strategy π∗ ensures the following: as long as
the play stays in U, the strategy π̂ is followed, and given that the play stays in U, the
strategy π̂ ensures with probability 1 that Ω \ Φ is satisfied and Bnd(Φ, r) is not reached. If
the play escapes U (i.e., player 1 chooses to escape U), then it reaches a state with value
at most q for player 1. We consider an arbitrary strategy σ for player 1 and consider the
following cases.

1. If Pr_{s1}^{σ,π∗}(Safe(U)) = 1, then we have Pr_{s1}^{σ,π∗}((Ω \ Φ) ∩ Safe(U)) = Pr_{s1}^{σ,π̂}((Ω \ Φ) ∩ Safe(U)) = 1.
Hence we also have Pr_{s1}^{σ,π∗}(Ω \ Φ) = 1, i.e., we have Pr_{s1}^{σ,π∗}(Φ) = 0.

2. If Pr_{s1}^{σ,π∗}(Reach(S \ U)) = 1, then the play reaches a state with value for player 1 at
most q, and the strategy πε ensures that Pr_{s1}^{σ,π∗}(Φ) ≤ q + ε.

3. If Pr_{s1}^{σ,π∗}(Safe(U)) > 0 and Pr_{s1}^{σ,π∗}(Reach(S \ U)) > 0, then we condition on both these
events and obtain the following:

Pr_{s1}^{σ,π∗}(Φ) = Pr_{s1}^{σ,π∗}(Φ | Safe(U)) · Pr_{s1}^{σ,π∗}(Safe(U))
              + Pr_{s1}^{σ,π∗}(Φ | Reach(S \ U)) · Pr_{s1}^{σ,π∗}(Reach(S \ U))
            ≤ 0 + (q + ε) · Pr_{s1}^{σ,π∗}(Reach(S \ U))
            ≤ q + ε.

The above inequalities are obtained as follows: given the event Safe(U), the strategy
π∗ follows π̂ and ensures that Ω \ Φ is satisfied with probability 1 (i.e., Φ is satisfied with
probability 0); otherwise the play reaches states where the value for player 1 is at most q,
and then the analysis is as in the previous case.

Hence for all strategies σ we have

Pr_{s1}^{σ,π∗}(Φ) ≤ q + ε = q + α/2 = r − α/2.

Hence we must have Val1(Φ)(s1) ≤ r − α/2. Since α > 0 and s1 ∈ VC(Φ, r) (i.e., Val1(Φ)(s1) = r),
we have a contradiction. The desired result follows.
Lemma 16 (Almost-sure to optimality) Let G be a 2½-player game graph and F ⊆
P(C) be a Muller winning condition. Let Φ = Muller(F). Let σ be a strategy such that

• σ is an almost-sure winning strategy from the almost-sure winning states (Almost1(Φ)
in G); and

• σ is an almost-sure winning strategy for the objective Φ ∪ Reach(Bnd(Φ, r)) in the game
G_{Φ,r}, for all 0 < r < 1.

Then σ is an optimal strategy.
Proof. We prove the result for the case when σ is memoryless (or randomized memoryless);
in the case when σ is a finite-memory strategy with memory M, the arguments can be repeated on the
game G × M (the usual synchronous product of G and the memory M).

Consider the player-2 MDP Gσ with the objective Ω \ Muller(F) for player 2. In MDPs
with Muller objectives randomized memoryless optimal strategies exist (Theorem 13). We
fix a randomized memoryless optimal strategy π for player 2 in Gσ. Let W1 = Almost1(Φ)
and W2 = Almost2(Ω \ Φ). We consider the Markov chain Gσ,π and analyze its recurrent states.
Recurrent states in Gσ,π. Let U be a closed, connected recurrent set in Gσ,π (i.e., U is a
bottom strongly connected component in the graph of Gσ,π). Let q = max{r | VC(Φ, r) ∩ U ≠ ∅},
i.e., for all q′ > q we have VC(Φ, q′) ∩ U = ∅, or in other words VC(Φ, > q) ∩ U = ∅.
For a state s ∈ U ∩ VC(Φ, q) we have the following cases.

1. If s ∈ S1, then Supp(σ(s)) ⊆ VC(Φ, q). This is because in the game G_{Φ,q} the edges of
player 1 consist of edges within the value class VC(Φ, q).

2. If s ∈ SP and s ∈ Bnd(Φ, q), then U ∩ VC(Φ, q′) ≠ ∅ for some q′ > q:
this is because E(s) ∩ VC(Φ, > q) ≠ ∅ for s ∈ Bnd(Φ, q) and U is closed. This is
not possible, since by the choice of q we have U ∩ VC(Φ, > q) = ∅. Hence we have
s ∈ SP ∩ (U \ Bnd(Φ, q)), and E(s) ⊆ VC(Φ, q).

3. If s ∈ S2, then since U ∩ VC(Φ, > q) = ∅, it follows by Proposition 7 that Supp(π(s)) ⊆
VC(Φ, q).

Hence for all s ∈ U ∩ VC(Φ, q), all successors of s in Gσ,π are in VC(Φ, q), and
moreover U ∩ Bnd(Φ, q) = ∅, i.e., U is contained in a single value class and does not intersect
the boundary probabilistic states.
with the boundary probabilistic states. By the property of strategy σ, if U ∩ (S \W2) 6= ∅,
then for all s ∈ U we have Prσ,πs (Φ) = 1: this is because for all r > 0, the strategy σ is
almost-sure winning for objective Φ∪Reach(Bnd(Φ, r)) in GΦ,r. Since σ is a fixed strategy
CHAPTER 4. STOCHASTIC MULLER GAMES 89
and π is optimal against σ, it follows that if Val1(Φ)(s) < 1, then Prσ,πs (Φ) < 1. Hence
it follows that U ∩ (S \ (W1 ∪ W2)) = ∅. Hence the recurrent states of Gσ,π are contained
in W1 ∪ W2, i.e., we have Prσ,πs (Reach(W1 ∪ W2)) = 1. Since σ is an almost-sure winning
strategy in W1, we have Prσ,πs (Φ) = Prσ,π
s (Reach(W2)). Hence the strategy π maximizes
the probability to reach W2 in the MDP Gσ.
Analyzing reachability in Gσ. Since in Gσ player 2 maximizes the probability to reach
W2, we analyze the player-2 MDP Gσ with the objective Reach(W2) for player 2. For every
state s consider a real-valued variable xs = 1 − Val1(Φ)(s) = Val2(Ω \ Φ)(s). The following
constraints are satisfied:

xs = Σ_{t∈Supp(σ(s))} xt · σ(s)(t)   for s ∈ S1;
xs = Σ_{t∈E(s)} xt · δ(s)(t)         for s ∈ SP;
xs ≥ xt                              for s ∈ S2 and t ∈ E(s);
xs = 1                               for s ∈ W2.

The first equality follows since for all r ∈ [0, 1] and all s ∈ S1 ∩ VC(Φ, r) we have
Supp(σ(s)) ⊆ VC(Φ, r). The second equality and the inequality follow from Proposition
7. Since the values of an MDP with a reachability objective are characterized as the least
value vector satisfying the above constraints [FV97], it follows that for all s ∈ S and for all
strategies π1 ∈ Π we have

Pr_s^{σ,π1}(Reach(W2)) ≤ xs = Val2(Ω \ Φ)(s).
Hence we have Pr_s^{σ,π}(Ω \ Φ) ≤ Val2(Ω \ Φ)(s), i.e., Pr_s^{σ,π}(Φ) ≥ 1 − Val2(Ω \ Φ)(s) = Val1(Φ)(s). Thus
we obtain that σ is an optimal strategy.
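The constraint system used at the end of the proof can be sketched as a checker: given the MDP obtained by fixing σ, it verifies that a candidate vector x satisfies the reachability constraints for W2 (the least such vector is the vector of maximal reachability probabilities [FV97]). The encoding and the toy MDP below are assumptions made for the example.

```python
from fractions import Fraction as F

def satisfies_constraints(x, succ, kind, delta, W2):
    """kind[s]: 'fixed' for states with a fixed distribution (player-1 states
    under sigma, and probabilistic states), 'max' for player-2 states."""
    for s in succ:
        if s in W2:
            if x[s] != 1:
                return False
        elif kind[s] == 'fixed':
            # probabilistic equality constraint
            if x[s] != sum(x[t] * delta[s][t] for t in succ[s]):
                return False
        else:
            # player-2 state: x[s] must dominate every successor value
            if any(x[s] < x[t] for t in succ[s]):
                return False
    return True

# toy MDP: 'p' flips a fair coin between the target 'g' and the sink 'b';
# the player-2 state 'q' chooses between 'p' and 'b'
succ = {'g': ['g'], 'b': ['b'], 'p': ['g', 'b'], 'q': ['p', 'b']}
kind = {'g': 'fixed', 'b': 'fixed', 'p': 'fixed', 'q': 'max'}
delta = {'g': {'g': F(1)}, 'b': {'b': F(1)}, 'p': {'g': F(1, 2), 'b': F(1, 2)}}
x = {'g': F(1), 'b': F(0), 'p': F(1, 2), 'q': F(1, 2)}
```

The vector x above satisfies the constraints, while lowering x['p'] violates the probabilistic equality.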
Muller reduction for G_{Φ,r}. Given a Muller winning condition F and the objective
Φ = Muller(F), we consider the game G_{Φ,r} with the objective Φ ∪ Reach(Bnd(Φ, r)) for
player 1. We present a simple reduction to a game with objective Φ. The reduction is
achieved as follows: without loss of generality we assume F ≠ ∅; let F ∈ F with
F = {c_1^F, c_2^F, ..., c_f^F}. We construct a game graph G̅_{Φ,r} with objective Φ for player 1 as
follows: convert every state s^j ∈ Bnd(Φ, r) to a cycle U_j = s_1^j, s_2^j, ..., s_f^j with χ(s_i^j) = c_i^F,
i.e., once s^j is reached the cycle U_j is repeated, with χ(U_j) ∈ F. An almost-sure winning
strategy in G_{Φ,r} for the objective Φ ∪ Reach(Bnd(Φ, r)) is an almost-sure winning strategy
in G̅_{Φ,r} for the objective Φ, and vice versa. This reduction along with Lemma 15 and
Lemma 16 gives us Lemma 17. Observe that Lemma 15 ensures that strategies satisfying the
conditions of Lemma 16 exist. Lemma 17 along with Theorem 19 gives Theorem 22, which
generalizes Theorem 18.
Lemma 17 For all Muller winning conditions F, the following assertions hold.

1. If the family of pure finite-memory strategies of size ℓ_{PF} suffices for almost-sure winning
on 2½-player game graphs for Muller(F), then the family of pure finite-memory
strategies of size ℓ_{PF} suffices for optimality on 2½-player game graphs for Muller(F).

2. If the family of randomized finite-memory strategies of size ℓ_{RF} suffices for almost-sure
winning on 2½-player game graphs for Muller(F), then the family of randomized
finite-memory strategies of size ℓ_{RF} suffices for optimality on 2½-player game graphs
for Muller(F).
Theorem 22 For all Muller winning conditions F, the family of pure finite-memory strategies
of size mF suffices for optimality on 2½-player game graphs for Muller objectives
Muller(F).
4.4.1 Complexity of quantitative analysis
In this section we consider the complexity of quantitative analysis of 2½-player
games with Muller objectives. We first prove some properties of the values of 2½-player
games with Muller objectives. We start with a lemma.
Lemma 18 For all 2½-player game graphs, for all Muller objectives Φ, there exist optimal
strategies σ and π for player 1 and player 2, respectively, such that the following assertions hold:

1. for all r ∈ (0, 1), for all s ∈ VC(Φ, r) we have Pr_s^{σ,π}(Reach(Bnd(Φ, r))) = 1;

2. for all s ∈ S we have

Pr_s^{σ,π}(Reach(W1 ∪ W2)) = 1;
Pr_s^{σ,π}(Reach(W1)) = Val1(Φ)(s);  Pr_s^{σ,π}(Reach(W2)) = Val2(Ω \ Φ)(s);

where W1 = Almost1(Φ) and W2 = Almost2(Ω \ Φ).
Proof. Consider an optimal strategy σ that satisfies the conditions of Lemma 16, and a
strategy π that satisfies the analogous conditions for player 2. For all r ∈ (0, 1), in the game
G_{Φ,r} the strategy σ is almost-sure winning for the objective Φ ∪ Reach(Bnd(Φ, r)) and the
strategy π is almost-sure winning for the objective (Ω \ Φ) ∪ Reach(Bnd(Φ, r)). Thus we obtain
that for all r ∈ (0, 1), for all s ∈ VC(Φ, r), we have

Pr_s^{σ,π}(Φ ∪ Reach(Bnd(Φ, r))) = 1; and Pr_s^{σ,π}((Ω \ Φ) ∪ Reach(Bnd(Φ, r))) = 1.

It follows that for all r ∈ (0, 1), for all s ∈ VC(Φ, r), we have

Pr_s^{σ,π}(Reach(Bnd(Φ, r))) = 1.

From the above condition it easily follows that for all s ∈ S we have Pr_s^{σ,π}(Reach(W1 ∪ W2)) = 1.
Since σ and π are optimal strategies, all the requirements of the second condition
are fulfilled. Hence the strategies σ and π are witness strategies for the desired result.
Characterizing values for 2½-player Muller games. We now relate the values of 2½-player
game graphs with Muller objectives to the values of a Markov chain, on the same
state space, with a reachability objective. Once the relationship is established we obtain
a bound on the precision of the values. We use Lemma 18 to present two transformations to
Markov chains.
Markov chain transformation. Given a 2½-player game graph G =
((S, E), (S1, S2, SP), δ) with a Muller objective Φ, let W1 = Almost1(Φ) and
W2 = Almost2(Ω \ Φ) be the sets of almost-sure winning states of the players. Let σ
and π be optimal strategies for the players (obtained from Lemma 18) such that

1. for all r ∈ (0, 1), for all s ∈ VC(Φ, r) we have Pr_s^{σ,π}(Reach(Bnd(Φ, r))) = 1;

2. for all s ∈ S we have

Pr_s^{σ,π}(Reach(W1 ∪ W2)) = 1;
Pr_s^{σ,π}(Reach(W1)) = Val1(Φ)(s);  Pr_s^{σ,π}(Reach(W2)) = Val2(Ω \ Φ)(s).

We first consider a Markov chain that mimics the stochastic process under σ and π. The
Markov chain G̅ = (S, δ̅) = MC1(G, Φ) with the transition function δ̅ is defined as follows:

1. for s ∈ W1 ∪ W2 we have δ̅(s)(s) = 1;

2. for r ∈ (0, 1) and s ∈ VC(Φ, r) \ Bnd(Φ, r) we have δ̅(s)(t) = Pr_s^{σ,π}(Reach({t})) for
t ∈ Bnd(Φ, r) (since for all s ∈ VC(Φ, r) we have Pr_s^{σ,π}(Reach(Bnd(Φ, r))) = 1, the
transition function δ̅ at s is a probability distribution); and

3. for r ∈ (0, 1) and s ∈ Bnd(Φ, r) we have δ̅(s)(t) = δ(s)(t) for t ∈ S.

The Markov chain G̅ mimics the stochastic process under σ and π and yields the following
lemma.
Lemma 19 For all 2½-player game graphs G and all Muller objectives Φ, consider the
Markov chain G̅ = MC1(G, Φ). Then for all s ∈ S we have Val1(Φ)(s) = Pr_s(Reach(W1)) in G̅,
that is, the value for Φ in G is equal to the probability to reach W1 in the Markov chain G̅.
Second transformation. We now transform the Markov chain G̅ into another Markov
chain Ĝ. We start with the observation that for r ∈ (0, 1), for all states s, t ∈ Bnd(Φ, r),
in the Markov chain G̅ we have Pr_s(Reach(W1)) = Pr_t(Reach(W1)) = r. Moreover, for
r ∈ (0, 1), every state s ∈ Bnd(Φ, r) has edges to higher and lower value classes. Hence, for
a state s ∈ VC(Φ, r) \ Bnd(Φ, r), if we choose a state t_r ∈ Bnd(Φ, r) and set the transition
probability from s to t_r to 1, the probability to reach W1 does not change. This motivates
the following transformation: given a 2½-player game graph G = ((S, E), (S1, S2, SP), δ)
with a Muller objective Φ, let W1 = Almost1(Φ) and W2 = Almost2(Ω \ Φ) be the sets of
almost-sure winning states of the players. The Markov chain Ĝ = (S, δ̂) = MC2(G, Φ)
with the transition function δ̂ is defined as follows:

1. for s ∈ W1 ∪ W2 we have δ̂(s)(s) = 1;

2. for r ∈ (0, 1) and s ∈ VC(Φ, r) \ Bnd(Φ, r), pick t ∈ Bnd(Φ, r) and set δ̂(s)(t) = 1; and

3. for r ∈ (0, 1) and s ∈ Bnd(Φ, r) we have δ̂(s)(t) = δ(s)(t) for t ∈ S.

Observe that for δ_{>0} = {δ(s)(t) | s ∈ SP, t ∈ S, δ(s)(t) > 0} and δ̂_{>0} = {δ̂(s)(t) | s ∈ S,
t ∈ S, δ̂(s)(t) > 0}, we have δ̂_{>0} ⊆ δ_{>0} ∪ {1}, i.e., the transition probabilities in Ĝ form a
subset of the transition probabilities in G (together with 1). Let

δ_u = max{q | δ(s)(t) = p/q for s ∈ SP and δ(s)(t) > 0};
δ̂_u = max{q | δ̂(s)(t) = p/q for s ∈ S and δ̂(s)(t) > 0}.

Since δ̂_{>0} ⊆ δ_{>0} ∪ {1}, it follows that δ̂_u ≤ δ_u. The following lemma is immediate from
Lemma 19 and the equality of the probabilities to reach W1 in G̅ and Ĝ.
Lemma 20 For all 2½-player game graphs G and all Muller objectives Φ, consider the
Markov chain Ĝ = MC2(G, Φ). Then for all s ∈ S we have Val1(Φ)(s) = Pr_s(Reach(W1)) in Ĝ,
that is, the value for Φ in G is equal to the probability to reach W1 in the Markov chain Ĝ.
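The second transformation can be sketched as follows; the value classes, boundary sets, and transition function are assumed to be already computed, and the choice of the representative boundary state is arbitrary (here the minimum).

```python
def mc2(classes, bnd, delta, W1, W2):
    """classes: value r -> set of states; bnd: r -> Bnd(Phi, r);
    delta: probabilistic state -> dict of successor probabilities.
    Returns the transition function of the second Markov chain."""
    hat = {s: {s: 1} for s in W1 | W2}      # absorbing winning sets
    for r, U in classes.items():
        if r in (0, 1):
            continue
        t_r = min(bnd[r])                   # fix one boundary state per class
        for s in U - bnd[r]:
            hat[s] = {t_r: 1}               # deterministic jump to the boundary
        for s in bnd[r]:
            hat[s] = dict(delta[s])         # keep the original distribution
    return hat
```

On the toy game used earlier (values 1, 0, and 1/2), the non-boundary state of the middle class jumps to the boundary state 'p', which keeps its fair-coin distribution.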
Lemma 21 is a result from [Con93] (Lemma 2 of [Con93]).
Lemma 21 ([Con93]) Let G = ((S, E), (S1, S2, SP), δ) be a 2½-player game graph with n
states such that every state has at most two successors, and for all s ∈ SP and t ∈ E(s) we
have δ(s)(t) = 1/2. Then for all R ⊆ S, for all s ∈ S, we have

Val1(Reach(R))(s) = p/q, where p, q are integers with p, q ≤ 4^{n−1}.
The results of [ZP96] showed that a 2½-player game graph G =
((S, E), (S1, S2, SP), δ) can be reduced to an equivalent 2½-player game graph G̅ =
((S̅, E̅), (S̅1, S̅2, S̅P), δ̅) such that every state s ∈ S̅ has at most two successors, for
all s ∈ S̅P and t ∈ E̅(s) we have δ̅(s)(t) = 1/2, and |S̅| = 2 · |E| · log δ_u. Lemma 22 follows
from this reduction and Lemma 21.
Lemma 22 ([ZP96]) Let G = ((S, E), (S1, S2, SP), δ) be a 2½-player game graph. Then for
all R ⊆ S, for all s ∈ S, we have

Val1(Reach(R))(s) = p/q, where p, q are integers with p, q ≤ 4^{2·|E|·log δ_u} = δ_u^{4·|E|}.
Lemma 23 For all 2½-player game graphs G = ((S, E), (S1, S2, SP), δ) and all Muller
objectives Φ, for all states s ∈ S \ (W1 ∪ W2), we have

Val1(Φ)(s) = p/q, where p, q are integers with 0 < p < q ≤ δ_u^{4·|E|},
where W1 and W2 are the almost-sure winning states for player 1 and player 2, respectively.
Proof. Lemma 20 shows that the values of the game G can be related to the values for reaching a
set of states in a Markov chain Ĝ defined on the same state space, and moreover δ̂_u ≤ δ_u.
The bound then follows from Lemma 22 and the fact that Markov chains
are a subclass of 2½-player games.
Lemma 24 Let G = ((S, E), (S1, S2, SP), δ) be a 2½-player game with a Muller objective
Φ. Let P = (V0, V1, V2, ..., Vk) be a partition of the state space S, and let r0 > r1 > r2 >
... > rk be rational values such that the following conditions hold:

1. V0 = Almost1(Φ) and Vk = Almost2(Ω \ Φ);

2. r0 = 1 and rk = 0;

3. for all 1 ≤ i ≤ k − 1 we have Bnd(Vi) ≠ ∅ and Vi is δ-live;

4. for all 1 ≤ i ≤ k − 1 and all s ∈ S2 ∩ Vi we have E(s) ⊆ ⋃_{j≤i} Vj;

5. for all 1 ≤ i ≤ k − 1 we have Vi = Almost1(Φ ∪ Reach(Bnd(Vi))) in G_{Bnd(Vi)};

6. let x_s = r_i for s ∈ Vi; then for all s ∈ SP the values x_s satisfy x_s = Σ_{t∈E(s)} x_t · δ(s)(t).

Then we have Val1(Φ)(s) ≥ x_s for all s ∈ S.
Proof. Let σ be a finite-memory strategy with memory M such that (a) σ is almost-sure
winning from V0; and (b) for all 1 ≤ i ≤ k − 1, all s ∈ Vi, and all strategies π
for player 2 in G_{Bnd(Vi)} we have Pr_s^{σ,π}(Φ ∪ Reach(Bnd(Vi))) = 1; such a strategy exists
since condition 1 (V0 = Almost1(Φ)) and condition 5 are satisfied. Let π be a finite-memory
counter-optimal strategy for player 2 in Gσ, i.e., π is optimal for player 2 for the
objective Ω \ Φ in Gσ. We claim that for all 1 ≤ i ≤ k − 1 and for all s ∈ Vi we have
Pr_s^{σ,π}(Reach(Bnd(Vi) ∪ ⋃_{j<i} Vj)) = 1. To prove the claim, assume towards contradiction
that for some 1 ≤ i ≤ k − 1 and s ∈ Vi we have Pr_s^{σ,π}(Reach(Bnd(Vi) ∪ ⋃_{j<i} Vj)) < 1.
Then, since condition 4 holds, we would have Pr_s^{σ,π}(Safe(Vi \ Bnd(Vi))) > 0. If
Pr_s^{σ,π}(Safe(Vi \ Bnd(Vi))) > 0, then there must be a closed connected recurrent set C in Gσ,π such that C
is contained in (Vi \ Bnd(Vi)) × M. Hence for states s̃ ∈ C we would have Pr_{s̃}^{σ,π}(Φ) = 1;
this holds since we have Pr_s^{σ,π}(Φ ∪ Reach(Bnd(Vi))) = 1. This contradicts the facts that π
is counter-optimal and Vi ∩ Almost1(Φ) = ∅. Thus we obtain that for all 1 ≤ i ≤ k − 1 and
all s ∈ Vi we have Pr_s^{σ,π}(Reach(Bnd(Vi) ∪ ⋃_{j<i} Vj)) = 1. It follows that for all s ∈ S we
have Pr_s^{σ,π}(Reach(V0 ∪ Vk)) = 1. By the ordering r0 > r1 > r2 > ... > rk, condition 4, and
condition 6, it follows that for all s ∈ S we have Pr_s^{σ,π}(Reach(Vk)) ≤ 1 − x_s; this follows by
the analysis of the MDP Gσ with the reachability objective Reach(Vk) for player 2. Hence
we have Pr_s^{σ,π}(Reach(V0)) ≥ x_s. Since σ is almost-sure winning from V0, we obtain that for
all s ∈ S we have Val1(Φ)(s) ≥ x_s. The desired result follows.
A PSPACE algorithm for quantitative analysis. We now present a PSPACE algorithm
for the quantitative analysis of 2½-player games with Muller objectives Muller(F). A
PSPACE lower bound is already known for the qualitative analysis of 2-player games with
Muller objectives [HD05]. To obtain an upper bound we present an NPSPACE algorithm.
The algorithm is based on Lemma 24. Given a 2½-player game G = ((S, E), (S1, S2, SP), δ)
with a Muller objective Φ, a state s, and a rational number r, the following assertion holds:
if Val1(Φ)(s) ≥ r, then there exists a partition P = (V0, V1, V2, ..., Vk) of S and rational
values r0 > r1 > r2 > ... > rk, with ri = pi/qi and pi, qi ≤ δ_u^{4·|E|}, such that the conditions of
Lemma 24 are satisfied, and s ∈ Vi with ri ≥ r. The witness P is the value-class partition,
and the rational values represent the values of the value classes. From the above observation
we obtain the algorithm for quantitative analysis as follows: given a 2½-player game graph
G = ((S, E), (S1, S2, SP), δ) with a Muller objective Φ, a state s, and a rational r, to verify
that Val1(Φ)(s) ≥ r, the algorithm guesses a partition P = (V0, V1, V2, ..., Vk) of S and
rational values r0 > r1 > r2 > ... > rk, with ri = pi/qi and pi, qi ≤ δ_u^{4·|E|}, and then
verifies that all the conditions of Lemma 24 are satisfied and that s ∈ Vi with ri ≥ r. Observe
that since the guesses of the rational values can be made with O(|G| · |S| · |E|) bits, the
guess is polynomial in the size of the game. Condition 1 and condition 5 of Lemma 24
can be verified in PSPACE by the PSPACE qualitative algorithms (see Theorem 21), and
all the other conditions can be checked in polynomial time. Since NPSPACE = PSPACE,
we obtain a PSPACE upper bound for the quantitative analysis of 2½-player games with Muller
objectives. This improves the previous 3EXPTIME bound for the quantitative analysis of
2½-player games with Muller objectives (Corollary 3).
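The polynomial-time part of the verification can be sketched as a checker for conditions 2, 4, and 6 of Lemma 24; conditions 1 and 5 need the PSPACE qualitative procedures and are omitted here. The certificate encoding below is an assumption made for the example.

```python
from fractions import Fraction as F

def check_certificate(parts, rs, succ, player2, prob, delta):
    """parts: list of state sets V_0..V_k; rs: list of rationals r_0..r_k."""
    if rs[0] != 1 or rs[-1] != 0:                      # condition 2
        return False
    if any(a <= b for a, b in zip(rs, rs[1:])):        # strictly decreasing
        return False
    x = {s: rs[i] for i, V in enumerate(parts) for s in V}
    for i, V in enumerate(parts[1:-1], start=1):       # condition 4:
        allowed = set().union(*parts[:i + 1])          # player-2 edges stay in
        for s in V & player2:                          # classes of value >= r_i
            if not set(succ[s]) <= allowed:
                return False
    for s in prob:                                     # condition 6
        if x[s] != sum(x[t] * delta[s][t] for t in succ[s]):
            return False
    return True
```

On a three-class toy certificate the checker accepts the true class values and rejects a wrong value for the middle class.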
Theorem 23 Given a 2½-player game G, a Muller objective Φ, a state s, and a rational r
in binary, it is PSPACE-complete to decide whether Val1(Φ)(s) ≥ r.
4.4.2 The complexity of union-closed and upward-closed objectives
We now consider two special classes of Muller objectives, namely union-closed
and upward-closed objectives. We will show that the quantitative analysis of both these classes
of objectives in 2½-player games under succinct representations is coNP-complete. We first
present these conditions.
1. Union-closed and basis conditions. A Muller winning condition F ⊆ P(C) is union-closed
if for all I, J ∈ F we have I ∪ J ∈ F. A basis condition B ⊆ P(C), given as a set
B, specifies the winning condition F = {I ⊆ C | ∃B1, B2, ..., Bk ∈ B. ⋃_{1≤i≤k} Bi = I}.
A Muller winning condition F can be specified as a basis condition only if F is union-closed.
2. Upward-closed and superset conditions. A Muller winning condition F ⊆ P(C) is
upward-closed if for all I ∈ F and all I ⊆ J ⊆ C we have J ∈ F. A superset condition
U ⊆ P(C) specifies the winning condition F = {I ⊆ C | J ⊆ I for some J ∈ U}.
A Muller winning condition F can be specified as a superset condition only if F is
upward-closed. Every upward-closed condition is also union-closed.
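Membership in the generated condition F can be tested directly from the succinct representation: for a basis condition, I ∈ F iff the union of the basis elements contained in I equals I. A sketch with illustrative sets (an assumption for the example, not the notation of [HD05]):

```python
def in_basis_condition(I, basis):
    """I is in F iff the union of the basis elements contained in I equals I."""
    subs = [B for B in basis if B <= I]
    return bool(subs) and frozenset().union(*subs) == I

def in_superset_condition(I, supersets):
    """I is in F iff I contains some element of the superset condition."""
    return any(J <= I for J in supersets)

basis = [frozenset({'a'}), frozenset({'b', 'c'})]
in_basis_condition(frozenset({'a', 'b', 'c'}), basis)  # True: {a} union {b,c}
in_basis_condition(frozenset({'a', 'b'}), basis)       # False: 'b' is not covered
```

Both tests run in time polynomial in the size of the succinct representation, which is the point of these encodings.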
The results of [HD05] showed that basis and superset conditions are more
succinct representations of union-closed and upward-closed conditions, respectively, than
the explicit representation. The following proposition was also shown in [HD05] (see [HD05]
for the formal description of the notions of succinctness and translatability).
Proposition 8 ([HD05]) A superset condition is polynomially translatable to an equiva-
lent basis condition.
Strategy complexity for union-closed conditions. We observe that for a union-closed
condition F, the Zielonka tree construction ensures that mF = 1. From Theorem 22 we then
obtain that for union-closed conditions F, pure memoryless optimal strategies exist for the
objectives Muller(F) in 2½-player game graphs.
Proposition 9 For all union-closed winning conditions F we have mF = 1; and pure
memoryless optimal strategies exist for the objective Muller(F) on all 2½-player game graphs.
Complexity of basis and superset conditions. The results of [HD05] established that
deciding the winner in 2-player games (that is, the qualitative analysis of 2-player game graphs)
with union-closed and upward-closed conditions specified as basis and superset conditions,
respectively, is coNP-complete. The lower bound for the special case of 2-player games yields a coNP
lower bound for the quantitative analysis of 2½-player games with union-closed and upward-closed
conditions specified as basis and superset conditions. We will prove a matching upper
bound. We prove the upper bound for basis conditions; by Proposition 8 the result then also
follows for superset conditions.
The upper bound for basis games. We present a coNP upper bound for the quantitative
analysis of basis games. Given a 2½-player game graph and a Muller objective
Φ = Muller(F), where F is union-closed and specified as a basis condition defined by B,
let s be a state and let r be a rational given in binary. The problem of whether Val1(Φ)(s) ≥ r
can be decided in coNP. We present a polynomial witness and a polynomial-time verification
procedure for the case when the answer to the problem is "NO". Since F is union-closed, it follows
from Proposition 9 that a pure memoryless optimal strategy π exists for player 2. The pure
memoryless optimal strategy is the polynomial witness, and once π is fixed
we obtain a 1½-player game graph Gπ. To present a polynomial-time verification procedure
we present a polynomial-time algorithm to compute values in MDPs (or 1½-player games)
with a basis condition B.
Polynomial-time algorithm for MDPs with basis conditions. Given a 1½-player
game graph G, let E be the set of end components. Consider a basis condition B =
{B1, B2, ..., Bk} ⊆ P(C), and let F be the union-closed condition generated by B. The
set of winning end components is U = E ∩ {F ⊆ S | χ(F) ∈ F}, and let T_end = ⋃_{U∈U} U.
It follows from the results of Subsection 4.1 that the value function in G can be computed by
computing the maximal probability to reach T_end. Once the set T_end is computed, the value
function for the reachability objective in 1½-player game graphs can be computed in polynomial
time by linear programming (Theorem 11). To complete the proof we present a polynomial-time
algorithm to compute T_end.
Computing winning end components. The algorithm is as follows. Let B be the basis
for the winning condition and let G be the 1½-player game graph. Initialize B0 = B and repeat
the following:

1. let Xi = ⋃_{B∈Bi} χ^{−1}(B);

2. partition the set Xi into maximal end components MaxEC(Xi);

3. remove an element B of Bi such that χ^{−1}(B) is not wholly contained in a maximal
end component, to obtain Bi+1;

until Bi = Bi−1. When Bi = Bi−1, let X = Xi. Every maximal end component of X
is a union of basis elements (every state of X lies in χ^{−1}(B) for some B ∈ Bi,
and a basis element not wholly contained in a maximal end component of X has been removed in
step 3). Moreover, any maximal end component of G that is a union of basis elements
is a subset of a maximal end component of X, since the algorithm preserves such sets.
Hence we have X = T_end. The algorithm requires at most |B| iterations, and each iteration requires
the decomposition of a 1½-player game graph into its maximal end components,
which can be achieved in O(|S| · |E|) time (see [dA97]). Hence the algorithm works in
O(|B| · |S| · |E|) time. This completes the proof and yields the following result.
Theorem 24 Given a 2½-player game graph and a Muller objective Φ = Muller(F), where
F is a union-closed condition specified as a basis condition defined by B, or F is an upward-closed
condition specified as a superset condition U, a state s, and a rational r given in binary,
it is coNP-complete to decide whether Val1(Φ)(s) ≥ r.
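The fixed-point computation of T_end described before Theorem 24 can be sketched as follows; the maximal-end-component decomposition is taken as a given routine (stubbed in the example), since implementing it is beyond this sketch, and the data layout is an assumption.

```python
def states_of(B, color_states):
    """States carrying some color of the basis element B."""
    return set().union(*(color_states[c] for c in B))

def winning_ec_states(basis, color_states, max_ec):
    """basis: iterable of frozensets of colors; color_states: color -> states;
    max_ec: maps a set of states to the list of its maximal end components."""
    Bs = list(basis)
    while True:
        X = set().union(*(states_of(b, color_states) for b in Bs)) if Bs else set()
        mecs = max_ec(X)
        # find a basis element not wholly inside one maximal end component
        bad = next((b for b in Bs
                    if not any(states_of(b, color_states) <= m for m in mecs)),
                   None)
        if bad is None:
            return X
        Bs.remove(bad)  # discard it and recompute on the smaller set
```

Each pass discards one basis element, so the loop runs at most |B| times, matching the O(|B| · |S| · |E|) analysis above.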
4.5 An Improved Bound for Randomized Strategies
We now show that if a player plays randomized strategies, then the upper bound
on memory for optimal strategies can be improved. We first present the notion of the
upward-closed restriction of a Zielonka tree. The number m^U_F obtained from this restriction of the
Zielonka tree is in general lower than the number mF obtained from the Zielonka tree, and we show
that randomized strategies with memory of size m^U_F suffice for optimality.
Upward closed restriction of Zielonka tree. The upward closed restriction of a Zielonka
tree for a Muller winning condition F ⊆ P(C), denoted as ZUF ,C , is obtained by making
upward closed conditions as leaves. Formally, we define ZUF ,C inductively as follows:
1. if F is upward closed, then ZUF ,C is leaf labeled F (i.e., it has no subtrees);
2. otherwise
(a) if C 6∈ F , then ZUF ,C = ZU
F ,C, where F = P(C) \ F .
(b) if C ∈ F, then the root of Z^U_{F,C} is labeled with C; let C0, C1, . . . , Ck−1
be all the maximal sets in {X ∉ F | X ⊆ C}; then we attach to the root, as
its subtrees, the upward-closed restricted Zielonka trees of F restricted to Ci, i.e.,
Z^U_{F∩P(Ci),Ci}, for i = 0, 1, . . . , k − 1.
The number m^U_F for Z^U_{F,C} is defined exactly as the number m_F was defined for the tree
Z_{F,C}.
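The inductive definition above translates directly into code. The following Python sketch uses a representation of my own (F is a collection of sets of colors, C the set of colors; a tree is a pair (label, subtrees)); the exhaustive powerset enumeration makes it exponential in |C|, which is acceptable for a sketch:

```python
from itertools import chain, combinations

def powerset(C):
    s = list(C)
    return [frozenset(x) for x in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def is_upward_closed(F, C):
    # F is upward closed iff every superset (within C) of a set in F is again in F.
    return all(Y in F for X in F for Y in powerset(C) if X <= Y)

def zielonka_upward(F, C):
    """Upward-closed restriction Z^U_{F,C}, returned as (label, subtrees)."""
    F, C = {frozenset(X) for X in F}, frozenset(C)
    if is_upward_closed(F, C):
        return (F, [])                               # case 1: leaf labeled with F
    if C not in F:                                   # case 2(a): pass to the complement
        return zielonka_upward({X for X in powerset(C) if X not in F}, C)
    out = [X for X in powerset(C) if X not in F]     # case 2(b): root labeled C
    maximal = [X for X in out if not any(X < Y for Y in out)]
    return (C, [zielonka_upward({X for X in F if X <= Ci}, Ci) for Ci in maximal])
```

For example, for C = {a, b} and F = {{a}} the restriction is a root labeled {a, b} with a single leaf, since the relevant subcondition over {a} becomes upward closed after one complementation.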
We will prove that randomized strategies of size m^U_F suffice for optimality. To prove this
result, we first show that randomized strategies of size m^U_F suffice for almost-sure winning;
the result then follows from Lemma 17. To prove the result for almost-sure winning we
take a closer look at the proof of Theorem 19. The inductive proof shows that
if the existence of randomized memoryless strategies can be established for 2½-player games with
the Muller winning conditions that appear at the leaves of the Zielonka tree, then the induction
yields a bound as in Theorem 19. Hence, to prove an upper bound of
size m^U_F for almost-sure winning, it suffices to show that randomized memoryless strategies
suffice for upward-closed Muller winning conditions. Lemma 25 proves this result, which
gives us Theorem 25.
Lemma 25 The family Σ^UM of randomized uniform memoryless strategies suffices for
almost-sure winning with respect to upward-closed objectives on 2½-player game graphs.
Proof. Consider a 2½-player game graph G and the game (G, C, χ, F) with an upward-
closed objective Φ = Muller(F) for player 1, i.e., F is upward closed. Let W1 = Almost1(Φ)
be the set of almost-sure winning states for player 1 in G. We have S \ W1 = Positive2(Ω \ Φ),
and hence any almost-sure winning strategy for player 1 ensures that from W1 the set S \ W1
is not reached with positive probability. Hence we need only consider strategies σ for
player 1 such that for all w ∈ W1* and s ∈ W1 we have Supp(σ(w · s)) ⊆ W1. Consider
the randomized memoryless strategy σ for player 1 that at a state s ∈ W1 chooses
uniformly at random among all successors in W1. Observe that for a state s ∈ (S2 ∪ SP) ∩ W1 we have
E(s) ⊆ W1; otherwise s would not be in W1. Consider the MDP Gσ ↾ W1. Since it is
a player-2 MDP with the Muller objective Φ, and randomized memoryless optimal strategies
exist in MDPs (Theorem 13), we fix a memoryless counter-optimal strategy π for player 2
in Gσ ↾ W1. Now consider the player-1 MDP Gπ ↾ W1, and consider a memoryless strategy
σ′ in Gπ ↾ W1. We first present an observation: since the strategy σ chooses all successors
in W1 uniformly at random, and for all s ∈ W1 ∩ S1 we have Supp(σ′(s)) ⊆ Supp(σ(s)),
it follows that for every closed recurrent set U′ in the Markov chain Gσ′,π ↾ W1 there is
a closed recurrent set U in the Markov chain Gσ,π ↾ W1 with U′ ⊆ U. We now prove
that σ is an almost-sure winning strategy by showing that every closed recurrent set of states U in
Gσ,π ↾ W1 is winning for player 1, i.e., χ(U) ∈ F. Assume towards contradiction that there
is a closed recurrent set U in Gσ,π ↾ W1 with χ(U) ∉ F. Consider the player-1 MDP
Gπ ↾ W1. Since randomized memoryless optimal strategies exist in MDPs (Theorem 13),
we fix a memoryless counter-optimal strategy σ′ for player 1. By the observation above, for any closed
recurrent set U′ in Gσ′,π such that U′ ∩ U ≠ ∅ we have U′ ⊆ U; moreover, χ(U′) ⊆ χ(U)
and χ(U′) ∉ F, since F is upward closed and χ(U) ∉ F. It then follows that player 2 wins
with probability 1 from a non-empty set U′ (a closed recurrent set U′ ⊆ U) of states
in the Markov chain Gσ′,π. Since π is a fixed strategy for player 2 and the strategy σ′ is
counter-optimal for player 1, this contradicts U′ ⊆ U ⊆ Almost1(Φ). It follows that
every closed recurrent set U in Gσ,π ↾ W1 is winning for player 1, and the result follows.
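The strategy σ used in this proof is easy to write down explicitly. A minimal sketch (names such as `succ` are my own; for each player-1 state of W1 it returns the uniform distribution over the successors that remain in W1):

```python
from fractions import Fraction

def uniform_in_W1(W1, succ, S1):
    """The randomized uniform memoryless strategy from the proof of Lemma 25."""
    sigma = {}
    for s in W1 & S1:
        targets = [t for t in succ[s] if t in W1]   # Supp(sigma(s)) stays inside W1
        p = Fraction(1, len(targets))
        sigma[s] = {t: p for t in targets}          # uniform over the W1-successors
    return sigma
```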
Theorem 25 For all Muller winning conditions F, the family of randomized finite-memory
strategies of size m^U_F suffices for optimality on 2½-player game graphs for Muller(F).
Remark. In general we have m^U_F < m_F. Consider for example F ⊆ P(C), where C =
{c1, c2, . . . , ck}, and the Muller winning condition F = {C}. We have m^U_F = 1, whereas
m_F = |C|.
4.6 Conclusion
In this chapter we presented optimal memory bounds for pure almost-sure, positive,
and optimal strategies for 2½-player games with Muller winning conditions. We also presented
improved memory bounds for randomized strategies. Unlike the results of [DJW97], our
results do not extend to infinite-state games: for example, the results of [EY05] show
that even for 2½-player pushdown games optimal strategies need not exist, and for ε > 0
even ε-optimal strategies may require infinite memory. For lower bounds for randomized
strategies the constructions of [DJW97] do not work: in fact, for the family of games used
for the lower bounds in [DJW97], randomized memoryless almost-sure winning strategies exist.
It is known that there exist Muller winning conditions F ⊆ P(C) for which
randomized almost-sure winning strategies require memory of size |C|! [Maj03]. However,
whether a matching lower bound of size m^U_F can be proved in general, or whether the upper
bound of m^U_F can be improved and a matching lower bound proved for randomized
strategies with memory, remains open.
Chapter 5
Stochastic Rabin and Streett Games
In this chapter we will consider 2½-player games with two canonical forms of Muller
objectives, namely, the Rabin and Streett objectives.1 We will prove that the quantitative
analysis of 2½-player games with Rabin and Streett objectives is NP-complete and coNP-
complete, respectively. We also present algorithms for both the qualitative and the quantitative
analysis of 2½-player games with Rabin and Streett objectives. We start with the qualitative
analysis of 2½-player games with Rabin objectives.
5.1 Qualitative Analysis of Rabin Games
In this section we present algorithms for the qualitative analysis of 2½-player Rabin
games. We present a reduction of 2½-player Rabin games to 2-player Rabin games that preserves
the ability of player 1 to win almost-surely. The reduction thus makes all algorithms for
2-player Rabin games [PP06, KV98] readily available for the qualitative analysis of 2½-player
games with Rabin objectives.
1Preliminary versions of the results of this chapter appeared in [CJH04, CdAH05, CH06b].
Reduction. Given a 2½-player game graph G = ((S, E), (S1, S2, SP), δ), a set P =
{e1, f1, . . . , ed, fd} of colors, and a color map [·]: S → 2^P \ {∅}, we construct a 2-player
game graph Ḡ = ((S̄, Ē), (S̄1, S̄2)) together with a color map [·]: S̄ → 2^P̄ \ {∅} for the
extended color set P̄ = P ∪ {ed+1, fd+1}. The construction is specified as follows. For every
nonprobabilistic state s ∈ S1 ∪ S2, there is a corresponding state s̄ ∈ S̄ such that (1) s̄ ∈ S̄1
iff s ∈ S1, (2) [s̄] = [s], and (3) (s̄, t̄) ∈ Ē iff (s, t) ∈ E. Every probabilistic state
s ∈ SP is replaced by the gadget shown in Figure 5.1. In the figure, diamond-shaped states
are player-2 states (in S̄2), and square-shaped states are player-1 states (in S̄1). From the
state s̄ with [s̄] = [s], the players play the following 3-step game in Ḡ. First, in state s̄
player 2 chooses a successor (ŝ, 2k), for k ∈ {0, 1, . . . , d}. For every state (ŝ, 2k), we have
[(ŝ, 2k)] = [s]. For k ≥ 1, in state (ŝ, 2k) player 1 chooses from two successors: state
(s̃, 2k − 1) with [(s̃, 2k − 1)] = {ek}, or state (s̃, 2k) with [(s̃, 2k)] = {fk}. The state (ŝ, 0) has
only one successor (s̃, 0), with [(s̃, 0)] = {f1, f2, . . . , fd, fd+1}. Note that no state in S̄ is
labeled by the new color ed+1, that is, [[ed+1]] = ∅. Finally, in each state (s̃, j) the choice
is between all states t̄ such that (s, t) ∈ E, and it belongs to player 1 if j is odd, and to
player 2 if j is even.
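The reduction can be sketched programmatically. In the following Python sketch the data representation is my own ("owner[s]" is 1, 2, or 'P' for probabilistic; colors are tags such as ('e', k) and ('f', k); the 'hat' and 'tilde' tags play the roles of the two rows of gadget states); it substitutes the gadget of Figure 5.1 for every probabilistic state:

```python
def reduce_rabin(states, owner, edges, color, d):
    """Sketch of Tr1as: build the 2-player graph with the gadget of Figure 5.1."""
    S2, own2, col2, E2 = set(), {}, {}, set()
    def add(s, o, c):
        S2.add(s); own2[s] = o; col2[s] = set(c)
    for s in states:                                 # copy non-probabilistic states
        if owner[s] != 'P':
            add(s, owner[s], color[s])
    for (s, t) in edges:                             # keep their edges
        if owner[s] != 'P':
            E2.add((s, t))
    for s in states:                                 # replace each probabilistic state
        if owner[s] != 'P':
            continue
        add(s, 2, color[s])                          # player 2 picks a column k
        for k in range(d + 1):
            add(('hat', s, 2 * k), 1 if k >= 1 else 2, color[s])
            E2.add((s, ('hat', s, 2 * k)))
        add(('tilde', s, 0), 2, {('f', i) for i in range(1, d + 2)})
        E2.add((('hat', s, 0), ('tilde', s, 0)))     # (hat, 0) has a single successor
        for k in range(1, d + 1):
            add(('tilde', s, 2 * k - 1), 1, {('e', k)})   # odd index: player 1
            add(('tilde', s, 2 * k), 2, {('f', k)})       # even index: player 2
            E2.add((('hat', s, 2 * k), ('tilde', s, 2 * k - 1)))
            E2.add((('hat', s, 2 * k), ('tilde', s, 2 * k)))
        for (u, t) in edges:                         # every tilde state inherits E(s)
            if u == s:
                for j in range(2 * d + 1):
                    if ('tilde', s, j) in S2:
                        E2.add((('tilde', s, j), t))
    return S2, own2, col2, E2
```

The owner chosen for the hat-state of column 0 is immaterial, since that state has a single successor.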
We consider 2½-player games played on the graph G with P =
{(e1, f1), . . . , (ed, fd)} and the Rabin objective Rabin(P) for player 1. We denote
by Ḡ = Tr1as(G) the 2-player game, with Rabin objective Rabin(P̄), where P̄ =
{(e1, f1), . . . , (ed+1, fd+1)}, as defined by the reduction above. Also, given a pure
memoryless strategy σ̄ in the 2-player game Ḡ, the strategy σ = Tr1as(σ̄) in the 2½-player game G is
defined as follows: σ(s) = t if and only if σ̄(s̄) = t̄, for all s ∈ S1.
Figure 5.1: Gadget for the reduction of 2½-player Rabin games to 2-player Rabin games.

Definition 14 (Winning s.c.c.s and end components) Let G be a 1-player game
graph and Rabin(P) the objective for player 1, with P = {(e1, f1), (e2, f2), . . . , (ed, fd)} the
set of d pairs of colors. A strongly connected component (s.c.c.) C in G is winning for
player 1 if there exists i ∈ {1, 2, . . . , d} such that C ∩ Fi ≠ ∅ and C ∩ Ei = ∅; otherwise C
is winning for player 2. If G is an MDP with the set P of colors, then an end component
C in G is winning for player 1 if there exists i ∈ {1, 2, . . . , d} such that C ∩ Fi ≠ ∅ and
C ∩ Ei = ∅; otherwise C is winning for player 2.
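Definition 14 amounts to a one-line check. A sketch (the function name is my own; `pairs` is the list of state-set pairs (E1, F1), . . . , (Ed, Fd) induced by the colors):

```python
def rabin_winning(C, pairs):
    """C (an s.c.c. or end component) is winning for player 1 iff some pair
    (E_i, F_i) satisfies C meets F_i and C avoids E_i (Definition 14)."""
    return any(bool(C & F_i) and not (C & E_i) for (E_i, F_i) in pairs)
```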
Lemma 26 Given a 2½-player game graph G with Rabin objective Rabin(P) for player 1,
let Ū1 and Ū2 be the sure winning sets for players 1 and 2, respectively, in the 2-player game
graph Ḡ = Tr1as(G) with the modified Rabin objective Rabin(P̄). Define the sets U1 and U2 in
the original 2½-player game graph G by U1 = {s ∈ S | s̄ ∈ Ū1} and U2 = {s ∈ S | s̄ ∈ Ū2}.
Then the following assertions hold:
1. U1 ⊆ Almost1(Rabin(P)); and
2. if σ̄ is a pure memoryless sure winning strategy for player 1 from Ū1 in Ḡ, then
σ = Tr1as(σ̄) is an almost-sure winning strategy for player 1 from U1 in G.
Proof. Consider a pure memoryless sure winning strategy σ̄ in the game Ḡ from every
state s̄ ∈ Ū1; such a strategy exists by Theorem 17. Our goal is to establish that the pure
memoryless strategy σ = Tr1as(σ̄) is an almost-sure winning strategy from every state in U1.
Winning end components in (G ↾ U1)σ. We first prove that every end component in the
player-2 MDP (G ↾ U1)σ is winning for player 1. We argue that if there is an end component
in (G ↾ U1)σ that is winning for player 2, then we can construct an s.c.c. in the subgraph
(Ḡ ↾ Ū1)σ̄ that is winning for player 2. This will give a contradiction, because σ̄ is a sure
winning strategy for player 1 from the set Ū1 in the 2-player Rabin game Ḡ. Let C be
an end component in (G ↾ U1)σ that is winning for player 2, and let C̄ denote the set
of gadget states of the states in C. Since C is a winning end component for player 2,
for all i ∈ {1, 2, . . . , d} we have: if Fi ∩ C ≠ ∅, then C ∩ Ei ≠ ∅. Define the set
I = {i1, i2, . . . , ij} of all indices i such that Ei ∩ C ≠ ∅. Thus for all i ∈ {1, 2, . . . , d} \ I we have
Fi ∩ C = ∅. Note that I ≠ ∅, as every state has at least one color. We now construct a
subgame in Ḡσ̄ as follows:
• For a state s̄ ∈ C̄ ∩ S̄2, keep all the edges (s̄, t̄) such that t̄ ∈ C̄.
• For a state s ∈ C ∩ SP, the subgame is defined as follows:
– At the state s̄, keep the edges to the states (ŝ, 2i) such that i ∈ I.
– For a state s ∈ U1, let dis(s, C ∩ Ei) denote the shortest (BFS) distance
from s to C ∩ Ei in the graph of (G ↾ U1)σ. At a state (s̃, 2i), which is a player-2
state, player 2 chooses a successor s̄1 such that dis(s1, C ∩ Ei) < dis(s, C ∩ Ei)
(i.e., player 2 shortens the distance to the set C ∩ Ei in G); unless s ∈ Ei, in which case (s̃, 2i)
keeps all the edges ((s̃, 2i), t̄) such that t̄ ∈ C̄.
The construction is illustrated in Figure 5.2.

Figure 5.2: The strategy subgraph in Ḡσ̄.

We now prove that every terminal s.c.c. (i.e., every bottom strongly connected component)
in the subgame thus constructed in (Ḡ ↾ C̄)σ̄, where C̄ is the set of gadget states of the
states in C, is winning for player 2. Consider
an arbitrary terminal s.c.c. Y in the subgame constructed in (Ḡ ↾ C̄)σ̄. It follows from the
construction that for every i ∈ {1, 2, . . . , d} \ I we have Fi ∩ Y = ∅. Suppose there exists
an i ∈ I such that Fi ∩ Y ≠ ∅; we then show that Ei ∩ Y ≠ ∅. The claim follows from the
following case analysis.
1. If there is at least one state (ŝ, 2i) in Y at which the strategy σ̄ chooses the successor
(s̃, 2i − 1), then since [(s̃, 2i − 1)] = {ei} we have Ei ∩ Y ≠ ∅.
2. Otherwise, at every state (ŝ, 2i) in Y the strategy σ̄ for player 1 chooses the successor (s̃, 2i).
For a state s ∈ Y, let dis(s, Ei) denote the shortest (BFS) distance from s
to Ei in the graph of (G ↾ U1)σ. At a state (s̃, 2i), which is a player-2 state, player 2
chooses a successor s̄1 such that dis(s1, Ei) < dis(s, Ei) (i.e., player 2 shortens the distance to the
set Ei); unless s ∈ Ei, in which case s̄ ∈ Ei. Hence the terminal s.c.c. Y must contain
a state s̄ with ei ∈ [s̄], i.e., Ei ∩ Y ≠ ∅.
We now consider the probability of staying in U1. For every probabilistic state
s ∈ SP ∩ U1, all of its successors are in U1, i.e., E(s) ⊆ U1. Otherwise, player 2 in the state
s̄ of the game Ḡ could choose the successor (ŝ, 0) and then, from (s̃, 0), a successor in its sure winning set Ū2.
This would again contradict the assumption that the strategy σ̄ is a sure winning strategy for
player 1 in the game Ḡ from the set Ū1. Similarly, for every state s ∈ S2 ∩ U1 we must
have E(s) ⊆ U1. For all states s ∈ S1 ∩ U1 we have σ(s) ∈ U1. Hence for all strategies π and for
all states s ∈ U1, with probability 1 the set of states visited infinitely often along the play
ω_s^{σ,π} is an end component in U1: indeed, for all s ∈ U1 and all strategies π for player 2
we have Pr_s^{σ,π}(Safe(U1)) = 1. It follows from Lemma 7 that, since every end component
in (G ↾ U1)σ is winning for player 1, the strategy σ is an almost-sure winning strategy for
player 1, and U1 ⊆ Almost1(Rabin(P)).
Notation for finite-memory strategies. Let π̄ be a finite-memory strategy for player 2
in the game Ḡ with finite memory M. The strategy π̄ can be viewed as a memoryless
strategy, denoted π̄* = MemLess(π̄), in Ḡ × M (the synchronous product of Ḡ with M),
as follows: for s̄ ∈ S̄ and m ∈ M we have π̄*((s̄, m)) = (s̄′, m′), where (a) the memory-update
function gives π̄u(s̄, m) = m′ and (b) the next-move function gives π̄m(s̄, m) = s̄′. A memoryless
strategy π* in G × M corresponding to the strategy π̄* is defined in a fashion similar to the
definition of σ = Tr1as(σ̄) above. From the strategy π* we can then easily define a
finite-memory strategy π in G with memory M; we refer to this strategy as π = Tr2pos(π̄).
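The MemLess(·) view is a plain product construction; a minimal sketch (function and parameter names are my own):

```python
def memless(states, memory, next_move, mem_update):
    """View a finite-memory strategy, given by its next-move and memory-update
    functions, as a memoryless strategy on the product G x M: (s, m) -> (s', m')."""
    return {(s, m): (next_move(s, m), mem_update(s, m))
            for s in states for m in memory}
```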
Lemma 27 Given a 2½-player game graph G with Rabin objective Rabin(P) for player 1,
let Ū1 and Ū2 be the sure winning sets for players 1 and 2, respectively, in the 2-player game
graph Ḡ = Tr1as(G) with the modified Rabin objective Rabin(P̄). Define the sets U1 and U2 in
the original 2½-player game graph G by U1 = {s ∈ S | s̄ ∈ Ū1} and U2 = {s ∈ S | s̄ ∈ Ū2}.
Then there exists a finite-memory strategy π for player 2 in the game G such that for
all strategies σ for player 1 and all states s ∈ U2 we have Pr_s^{σ,π}(Streett(P)) > 0, i.e.,
Almost1(Rabin(P)) ⊆ S \ U2.
Proof. The proof idea is similar to that of Lemma 26. Consider a finite-memory sure
winning strategy π̄ for player 2 in the game Ḡ ↾ Ū2; such a strategy exists by Theorem 17.
Let M be the memory of the strategy π̄, and let π = Tr2pos(π̄) be the corresponding strategy in
G. We denote by π* = MemLess(π) and π̄* the corresponding memoryless strategies of π in
G × M and of π̄ in Ḡ × M, respectively. We first argue that every end component in the game
(G ↾ U2)π is winning for player 2.
Winning end components in (G ↾ U2)π. Consider the product game G × M ↾ U2 × M
and the corresponding memoryless strategy π* of π in the game G × M. We first argue
that every end component in (G × M ↾ U2 × M)π* is winning for player 2. Assume towards
contradiction that C is an end component in (G × M ↾ U2 × M)π* that is winning for player 1.
We then construct an s.c.c. that is winning for player 1 in (Ḡ × M ↾ Ū2 × M)π̄*. This will
give us a contradiction, since π̄ is a sure winning strategy for player 2 in Ḡ ↾ Ū2. We
describe the key steps in constructing a winning s.c.c. C̄ for player 1 in (Ḡ × M ↾ Ū2 × M)π̄*
from a winning end component C for player 1 in (G × M ↾ U2 × M)π*. Mainly we describe
the strategy corresponding to a probabilistic state. Since C is a winning end component for
player 1, let i be a witness Rabin pair exhibiting that C is winning, i.e., C ∩ Fi ≠ ∅
and C ∩ Ei = ∅. The strategy for player 1 is as follows.
• If the strategy π̄* for player 2 at a state (s̄, m) chooses the successor ((ŝ, 0), m′), then the
following successor state is ((s̃, 0), m′), and since [(s̃, 0)] = {f1, f2, . . . , fd, fd+1}, player 1
ensures that a state in Fi is visited.
• If the strategy π̄* for player 2 at a state (s̄, m) chooses a successor ((ŝ, 2i), m′), then
player 1 chooses the successor ((s̃, 2i), m′), where m, m′ ∈ M. Since [(s̃, 2i)] = {fi},
player 1 ensures that a state in Fi is visited.
• If the strategy π̄* for player 2 at a state (s̄, m) chooses a successor ((ŝ, 2j), m′), for
j ≠ i, then player 1 chooses the successor ((s̃, 2j − 1), m′), where m, m′ ∈ M, and then a successor that
shortens the distance to the set Fi. Since [(s̃, 2j − 1)] = {ej} and ej ≠ ei, player 1
ensures that no state in Ei is visited.
The construction is illustrated in Figure 5.3.

Figure 5.3: The strategy subgraph in Ḡπ̄.

Consider any terminal s.c.c. Y in the subgame thus constructed. The strategy for player 1 ensures that in the subgame C̄,
whenever a gadget state of some s ∈ SP is visited, no state in Ei is visited. Since C ∩ Ei = ∅,
it follows that Y ∩ Ei = ∅. Moreover, the strategy for player 1 ensures that a state in Fi
is always visited, i.e., Y ∩ Fi ≠ ∅. Hence in the subgame of (Ḡ × M ↾ C̄ × M)π̄* every
terminal s.c.c. Y is winning for player 1, i.e., Fi ∩ Y ≠ ∅ and Ei ∩ Y = ∅. However, this is a
contradiction, since π̄ is a sure winning strategy for player 2. Hence all the end components
in (G × M ↾ U2 × M)π* are winning for player 2.
Reachability to end components. We now prove that for all states s ∈ U2 × M and all strategies
σ for player 1, the play ω_s^{σ,π*} stays in U2 × M with positive probability. Since every end
component in U2 × M is a winning end component for player 2, the desired result is then
established. We prove the claim by contradiction. Let U3 ⊆ U2 × M be the set of states
s for which there is a strategy σ for player 1 such that the play ω_s^{σ,π*} reaches U1 with
probability 1. Assume towards contradiction that U3 is non-empty. Note
that (G × M ↾ U2 × M)π* is a finite-state MDP. It follows from the almost-sure reachability
properties of MDPs (Subsection 4.1.1) that for every state s ∈ U3 there is a successor in U3
that shortens the distance to U1, and for every probabilistic state s ∈ U3 all the successors of
s are in U3. Consider now the corresponding situation in the 2-player product game: define a strategy σ̄ in (Ḡ × M ↾ Ū2 × M)π̄* as follows:
1. for all states s ∈ U3 ∩ S1, choose a successor in U3 that shortens the distance to U1;
2. for all states s ∈ U3 ∩ SP:
• at a state (ŝ, 2i), choose the successor (s̃, 2i − 1), for i ≥ 1;
• at a state (s̃, 2i − 1), choose a successor that shortens the distance to U1.
Given the strategies σ̄ and π̄*, there are two cases.
• If there is no cycle in U3, then the play given the strategies σ̄ and π̄* reaches Ū1, and
player 1 wins from every state in Ū1. Since π̄* is a sure winning strategy for player 2
from Ū2 × M ⊇ U3, we have a contradiction.
• If there is a cycle C in U3, then the strategy π̄* must choose the successor (ŝ, 0) at
some state s ∈ C ∩ SP. This follows because the strategy σ̄ ensures that
from every state (ŝ, 2i), for i ≥ 1, the distance to Ū1 decreases. Hence C ∩ Fd+1 ≠ ∅.
Since Ed+1 = ∅, it follows that the cycle C is winning for player 1. Since C ⊆ U3 ⊆
Ū2 × M and π̄* is a sure winning strategy for player 2 from every state in Ū2 × M, we have a
contradiction.
Hence U3 = ∅, and this completes the proof.
It follows from Lemma 26 and Lemma 27 that U1 = Almost1(Rabin(P)). The reduction from the
2½-player game G to the 2-player game Ḡ blows up the states and edges of states in SP by a factor of
d, and adds one more Rabin pair. This result readily allows us to use the algorithms for 2-player
Rabin games for the qualitative analysis of 2½-player Rabin games. Moreover, pure memoryless
almost-sure winning strategies exist for 2½-player Rabin games, and finite-memory positive
winning strategies exist for 2½-player Streett games; these strategies can be extracted
from sure winning strategies in the 2-player games.
Theorem 26 Given a 2½-player game graph G with a Rabin objective Rabin(P) with d pairs
for player 1, let n = |S| and m = |E|. Let Ū1 and Ū2 be the sure winning sets for
players 1 and 2, respectively, in the 2-player game graph Ḡ = Tr1as(G) with the modified
Rabin objective Rabin(P̄). Define the sets U1 and U2 in the original 2½-player game graph G
by U1 = {s ∈ S | s̄ ∈ Ū1} and U2 = {s ∈ S | s̄ ∈ Ū2}. Then the following assertions hold.
1. We have
U1 = Almost1(Rabin(P)); and
U2 = S \ Almost1(Rabin(P)) = {s | ∃π ∈ Π^PF. ∀σ ∈ Σ. Pr_s^{σ,π}(Streett(P)) > 0}.
2. The set Almost1(Rabin(P)) can be computed in time TwoPlRabinGame(n·d, m·d, d+1),
where TwoPlRabinGame(n·d, m·d, d+1) is the time complexity of an algorithm that solves
2-player Rabin games with n·d states, m·d edges, and d+1 Rabin pairs.
3. If σ̄ is a pure memoryless sure winning strategy for player 1 from Ū1 in Ḡ, then
σ = Tr1as(σ̄) is an almost-sure winning strategy for player 1 from U1 in G.
Theorem 27 The family Σ^PM of pure memoryless strategies suffices for almost-sure winning
with respect to Rabin objectives on 2½-player game graphs.
Almost-sure winning for Streett objectives. Almost-sure winning strategies for
Streett objectives can also be obtained by a reduction to 2-player games. The reduction gadget
needs to be slightly modified; we describe below the modified gadget for a probabilistic state.
We first take the gadget of the Rabin reduction without the edge to the state (ŝ, 0). The
starting state s̄ is now a player-1 state, which can choose either the above gadget or the state
(ŝ, 0); the state (ŝ, 0) leads to the successor (s̃, 0), which is now a player-1 state with
[(s̃, 0)] = {e1, e2, . . . , ed+1}. The successors of (s̃, 0) are as defined in the reduction for
almost-sure winning in Rabin games. The reduction gadget is illustrated in Figure 5.4.

Figure 5.4: Gadget for the reduction of 2½-player Streett games to 2-player Streett games.

We refer to this reduction as Ḡ2 = Tr2as(G). This reduction preserves almost-sure winning for
player 2 with the Streett objective. Moreover, from a finite-memory sure winning strategy
π̄ in Ḡ2 we can extract a finite-memory almost-sure winning strategy π in G. The mapping
from the strategy π̄ to π is obtained in a fashion similar to the previous reduction; we refer
to this mapping as π = Tr2as(π̄).
Theorem 28 Given a 2½-player game graph G with a Rabin objective Rabin(P) with d pairs
for player 1, let n = |S| and m = |E|. Let Ū1 and Ū2 be the sure winning sets for
players 1 and 2, respectively, in the 2-player game graph Ḡ2 = Tr2as(G) with the modified
Rabin objective Rabin(P̄) for player 1. Define the sets U1 and U2 in the original 2½-player
game graph G by U1 = {s ∈ S | s̄ ∈ Ū1} and U2 = {s ∈ S | s̄ ∈ Ū2}. Then the following
assertions hold.
1. We have
U2 = Almost2(Streett(P)); and
U1 = S \ Almost2(Streett(P)) = {s | ∃σ ∈ Σ^PM. ∀π ∈ Π. Pr_s^{σ,π}(Rabin(P)) > 0}.
2. The set Almost2(Streett(P)) can be computed in time TwoPlRabinGame(n·d, m·d, d+1),
where TwoPlRabinGame(n·d, m·d, d+1) is the time complexity of an algorithm that solves
2-player Rabin games with n·d states, m·d edges, and d+1 Rabin pairs.
3. If π̄ is a pure finite-memory sure winning strategy for player 2 from Ū2 in Tr2as(G), then
the pure finite-memory strategy π = Tr2as(π̄) is an almost-sure winning strategy for
player 2 in G.
Quantitative complexity. The existence of pure memoryless almost-sure winning strategies
for Rabin objectives (Theorem 27), together with Lemma 17, implies the existence of pure
memoryless optimal strategies for Rabin objectives. This gives us the following results.
Theorem 29 The family Σ^PM of pure memoryless strategies suffices for optimality with
respect to all Rabin objectives on 2½-player game graphs.
Theorem 30 Given a 2½-player game graph G, an objective Φ for player 1, a state s ∈ S,
and a rational r ∈ R, the complexity of deciding whether Val1(Φ)(s) ≥ r is as follows:
1. NP-complete if Φ is a Rabin objective;
2. coNP-complete if Φ is a Streett objective;
3. in NP ∩ coNP if Φ is a parity objective.
Proof.
1. Let G be a 2½-player game with a Rabin objective Rabin(P) for player 1. Given a
pure memoryless optimal strategy σ for player 1, the game Gσ is a player-2 MDP with
a Streett objective for player 2. Since the values of MDPs with Streett objectives can be
computed in polynomial time (Theorem 15), the problem is in NP. NP-hardness
follows from the fact that 2-player games with Rabin objectives are NP-hard
(Theorem 2).
2. Follows immediately from the fact that Streett objectives are complementary to Rabin
objectives.
3. Follows from the previous two completeness results, as a parity objective is both a
Rabin objective and a Streett objective.
Theorem 30 improves the previous 3EXPTIME bound for the quantitative analysis
of 2½-player games with Rabin and Streett objectives (Corollary 3).
5.2 Strategy Improvement for 2½-player Rabin and Streett Games
We first present a few key properties of 2½-player games with Rabin objectives. We
use these properties later to develop a strategy improvement algorithm for 2½-player games
with Rabin objectives.
5.2.1 Key Properties
Boundary probabilistic states. Given a set U of states, let Bnd(U) = {s ∈ U ∩
SP | ∃t ∈ E(s). t ∉ U} be the set of boundary probabilistic states, i.e., the probabilistic states
in U that have an edge out of U. Given a set U of states and a Rabin objective Rabin(P) for player 1, we define
two transformations Trwin1(U) and Trwin2(U) of U as follows: every state s in Bnd(U)
is converted to an absorbing state (a state with only a self-loop), and (a) in Trwin1(U) it is
assigned the color f1, while (b) in Trwin2(U) it is assigned the color e1; i.e., every state in
Bnd(U) is converted to a sure winning state for player 1 in Trwin1(U), and to a sure winning
state for player 2 in Trwin2(U). Observe that if U is
δ-live, then Trwin1(G ↾ U) and Trwin2(G ↾ U) are game graphs.
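Concretely, Bnd(U) and the two transformations can be sketched as follows (assuming `succ` maps each state to its successor set; all names are my own):

```python
def boundary(U, succ, SP):
    """Bnd(U): probabilistic states in U with an edge leaving U."""
    return {s for s in U & SP if any(t not in U for t in succ[s])}

def trwin(U, succ, SP, color, winner):
    """Trwin1 (winner=1) / Trwin2 (winner=2): make every boundary state
    absorbing and recolor it f1 (sure win for player 1) or e1 (for player 2)."""
    succ2, col2 = dict(succ), dict(color)
    for s in boundary(U, succ, SP):
        succ2[s] = {s}                       # only a self-loop: absorbing
        col2[s] = {'f1'} if winner == 1 else {'e1'}
    return succ2, col2
```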
Value classes. Given a Rabin objective Φ, for every real r ∈ R the value class with value
r, VC(Φ, r) = {s ∈ S | Val1(Φ)(s) = r}, is the set of states with value r for player 1. In the
sequel we drop the parameter Φ and write VC(r) for VC(Φ, r). It follows from
Proposition 7 that for every r > 0 the value class VC(r) is δ-live. The following lemmas
(easily obtained as specializations of Lemma 15 and Lemma 16) establish a connection
between value classes, the transformations Trwin1 and Trwin2, and the almost-sure winning
states.
Lemma 28 (Almost-sure winning reduction) The following assertions hold.
1. For every value class VC(r) with r > 0, the game Trwin1(G ↾ VC(r)) is almost-sure
winning for player 1.
2. For every value class VC(r) with r < 1, the game Trwin2(G ↾ VC(r)) is almost-sure
winning for player 2.
Lemma 29 (Optimal strategies) The following assertions hold.
1. If a strategy σ is an almost-sure winning strategy in the game Trwin1(G ↾ VC(r)), for every
value class VC(r), then σ is an optimal strategy.
2. If a strategy π is an almost-sure winning strategy in the game Trwin2(G ↾ VC(r)), for every
value class VC(r), then π is an optimal strategy.
It follows from Theorem 26 and Lemma 28 that for every value class VC(r) with
r > 0, the game Tr1as(Trwin1(G ↾ VC(r))) is sure winning for player 1.
Properties of almost-sure winning states. The following lemma follows easily from
Corollary 5.
Lemma 30 Given a 2½-player game G and a Rabin objective Rabin(P), if
Almost1(Rabin(P)) = ∅, then Almost2(Ω \ Rabin(P)) = S.
Property of MDPs with Streett objectives. The following lemma is obtained as a
special case of Theorem 13.
Lemma 31 The family of randomized memoryless strategies suffices for optimality with
respect to Streett objectives on MDPs.
5.2.2 Strategy Improvement Algorithm
We now present an algorithm to compute values for 2½-player games with the Rabin
objective Rabin(P) for player 1. By quantitative determinacy (Theorem 16), the algorithm
also computes values for the Streett objective Streett(P) for player 2. Recall that since pure
memoryless optimal strategies exist for Rabin objectives, we only consider pure memoryless
strategies σ for player 1. We refer to the Rabin objective Rabin(P) for player 1 as Φ.
Restriction, values, and value classes of strategies. Given a strategy σ and a set
U of states, we denote by σ ↾ U the restriction of the strategy σ to the set U, that is,
the strategy that at every state in U follows the strategy σ. Given a player-1 strategy σ
and the Rabin objective Φ, we denote the value for player 1 given the strategy σ by
Val1^σ(Φ)(s) = inf_{π∈Π} Pr_s^{σ,π}(Φ). Similarly, we define the value classes given the strategy σ by
VC^σ(r) = {s ∈ S | Val1^σ(Φ)(s) = r}.
Ordering of strategies. We define an ordering relation ≺ on strategies as follows: given
two strategies σ and σ′, we have σ ≺ σ′ if and only if
• for all states s we have Val1^σ(Φ)(s) ≤ Val1^{σ′}(Φ)(s), and for some state s we have
Val1^σ(Φ)(s) < Val1^{σ′}(Φ)(s).
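The ordering ≺ is a plain comparison of value vectors. A sketch (assuming the values Val1^σ(Φ) are given as dicts over the same state set; the function name is my own):

```python
def strictly_improved(val_old, val_new):
    """sigma < sigma': no state gets worse, and some state gets strictly better."""
    return (all(val_old[s] <= val_new[s] for s in val_old)
            and any(val_old[s] < val_new[s] for s in val_old))
```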
Strategy improvement step. Given a strategy σ for player 1, we describe a procedure
Improve to "improve" the strategy for player 1. The procedure is described in Algorithm 3.
An informal description of the procedure is as follows: given a strategy σ, the algorithm
computes the values Val1^σ(Φ)(s) for all states s. Since σ is a pure memoryless strategy,
Val1^σ(Φ)(s) can be computed by solving the MDP Gσ with the Streett objective Ω \ Φ. If
there is a state s ∈ S1 at which the strategy can be "value improved," i.e., there is a state
t ∈ E(s) with Val1^σ(Φ)(t) > Val1^σ(Φ)(s), then the strategy σ is modified by setting σ(s)
to t. This is achieved in Step 2.1 of Improve. Otherwise, in every value class VC^σ(r), the
strategy σ is "improved" for the game Tr1as(Trwin2(G ↾ VC^σ(r))) by solving this 2-player
game with an algorithm for 2-player Rabin games.
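The step just described can be sketched as follows. This is only a skeleton: the MDP value computation and the 2-player Rabin solver used by Algorithm 3 are assumed given as callbacks, and the data layout is my own:

```python
def improve(player1_states, succ, sigma, solve_mdp_values, solve_2player_rabin):
    """One Improve step (sketch). solve_mdp_values(sigma) returns the values of
    the player-2 MDP G_sigma; solve_2player_rabin(vc) returns improved choices
    inside one value class. Both solvers are assumed, not defined here."""
    val = solve_mdp_values(sigma)                    # evaluate sigma on G_sigma
    I = [s for s in player1_states
         if any(val[t] > val[s] for t in succ[s])]
    sigma2 = dict(sigma)
    if I:                                            # Step 2.1: value-improving switch
        for s in I:
            sigma2[s] = max(succ[s], key=lambda t: val[t])
        return sigma2
    for r in set(val.values()):                      # Step 2.2: improve within each
        vc = {s for s in val if val[s] == r}         # value class VC^sigma(r)
        sigma2.update(solve_2player_rabin(vc))
    return sigma2
```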
The complexity of Improve is discussed in Lemma 36. In the algorithm, the
strategy σ for player 1 is always a pure memoryless strategy (this is sufficient, because
pure memoryless strategies suffice for optimality in 2½-player games with Rabin objectives
(Theorem 16)). Moreover, given a pure memoryless strategy σ, the game Gσ is a player-2
MDP, and by Lemma 31 there is a randomized memoryless counter-optimal strategy for
player 2. Hence, having fixed a pure memoryless strategy for player 1, we only consider randomized
memoryless strategies for player 2. We now define the notion of a Rabin winning set, and
then present two propositions that are useful in the correctness proof of the algorithm.
Rabin winning set. Consider a Rabin objective Rabin(P ) and let [[P ]] =
{(E1, F1), (E2, F2), . . . , (Ed, Fd)} be the set of Rabin pairs. A set C ⊆ S is Rabin win-
ning if there exists 1 ≤ i ≤ d such that C ∩ Ei = ∅ and C ∩ Fi ≠ ∅, i.e., for all plays ω, if
Inf(ω) = C, then ω ∈ Rabin(P ).
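The definition yields a direct membership test; a minimal sketch, assuming sets are encoded as Python sets and the Rabin pairs as a list of (Ei, Fi) tuples:

```python
def rabin_winning(C, pairs):
    """A set C is Rabin winning iff some pair (E_i, F_i) satisfies
    C ∩ E_i = ∅ and C ∩ F_i ≠ ∅."""
    C = set(C)
    return any(not (C & set(Ei)) and bool(C & set(Fi)) for Ei, Fi in pairs)
```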
Proposition 10 Given a strategy σ for player 1, for every state s ∈ VCσ(r) ∩ S2, if
t ∈ E(s), then we have Valσ1 (Φ)(t) ≥ r, i.e., E(s) ⊆ ⋃q≥r VCσ(q).
Proof. The result is proved by contradiction. Suppose the assertion of the proposition fails,
i.e., there exists s and t ∈ E(s), such that s ∈ VCσ(r) and Valσ1 (Φ)(t) < r, then consider the
strategy π ∈ Π for player 2 that at s chooses successor t, and from t ensures Φ is satisfied
with probability at most Valσ1 (Φ)(t) against strategy σ. Hence we have Valσ1 (Φ)(s) ≤
Valσ1 (Φ)(t) < r. This contradicts that s ∈ VCσ(r). Hence player 2 can only choose edges
with the target of the edge in equal or higher value classes.
Proposition 11 Given a strategy σ for player 1, for all strategies π ∈ ΠM for player 2, if
there is a closed connected recurrent class C in the Markov chain Gσ,π, with C ⊆ VCσ(r),
for r > 0, then C is Rabin winning.
Proof. The result is again proved by contradiction. Suppose the assertion of the proposition
fails, i.e., for some strategy π ∈ ΠM for player 2, for some r > 0, C is a closed connected
recurrent class in the Markov chain Gσ,π, with C ⊆ VCσ(r) and C is not Rabin winning.
Then player 2 by playing strategy π ensures that for all states s ∈ C we have Prσ,πs (Φ) = 0
(since C is not Rabin winning and given C is a closed connected recurrent class, all states
in C are visited infinitely often). This contradicts that C ⊆ VCσ(r) and r > 0.
Lemma 32 Consider a strategy σ to be an input to Algorithm 3, and let σ′ be an output,
i.e., σ′ = Improve(G,σ). If the set I in Step 2 of Algorithm 3 is non-empty, then we have
Valσ′1 (Φ)(s) ≥ Valσ1 (Φ)(s) for all s ∈ S; Valσ′1 (Φ)(s) > Valσ1 (Φ)(s) for all s ∈ I.
Proof. Consider a switch of the strategy of player 1 from σ to σ′, as constructed in Step 2.1
of Algorithm 3. Consider a strategy π ∈ ΠM for player 2 and a closed connected recurrent
class C in Gσ′,π such that C ⊆ ⋃r>0 VCσ(r). Let z = max{r > 0 | C ∩ VCσ(r) ≠ ∅},
that is, VCσ(z) is the greatest value class with a nonempty intersection with C. A state
s ∈ VCσ(z) ∩ C satisfies the following conditions:
1. If s ∈ S2, then for all t ∈ E(s) if π(s)(t) > 0, then t ∈ VCσ(z). This follows, because
by Proposition 10, we have E(s) ⊆ ⋃q≥z VCσ(q) and C ∩ VCσ(q) = ∅ for q > z.
2. If s ∈ S1, then σ′(s) ∈ VCσ(z). This follows, because by construction σ′(s) ∈
⋃q≥z VCσ(q) and C ∩ VCσ(q) = ∅ for q > z. Also, since s ∈ VCσ(z) and
σ′(s) ∈ VCσ(z), it follows that σ′(s) = σ(s).
3. If s ∈ SP , then E(s) ⊆ VCσ(z). This follows, because for s ∈ SP , if E(s) ⊈ VCσ(z),
then E(s) ∩ ⋃q>z VCσ(q) ≠ ∅. Since C is closed, and C ∩ VCσ(q) = ∅ for q > z, the
claim follows.
It follows that C ⊆ VCσ(z), and for all states s ∈ C ∩ S1, we have σ′(s) = σ(s). Hence, by
Proposition 11, we conclude that C is Rabin winning.
It follows that if player 1 switches to the strategy σ′, as constructed when Step 2.1
of Algorithm 3 is executed, then for all strategies π ∈ ΠM for player 2 the following assertion
holds: if there is a closed connected recurrent class C ⊆ S \ VCσ(0) in the Markov chain
Gσ′,π, then C is Rabin winning for player 1. Hence given strategy σ′, a counter-optimal
strategy for player 2 maximizes the probability to reach VCσ(0). We now analyze the
player-2 MDP Gσ′ with the reachability objective Reach(VCσ(0)) to establish the desired
claim. For simplicity we assume E ∩ (SP × SP ) = ∅, i.e., a state s ∈ SP has edges only to
S1 and S2 states (E(s) ⊆ S1 ∪ S2). We consider the following variables ws for s ∈ S \ SP :
ws = 1 − Valσ1 (Φ)(s) for s ∈ (S \ I) \ SP ;
ws = 1 − Valσ1 (Φ)(σ′(s)) for s ∈ I.
Observe that for s ∈ I we have ws < 1 − Valσ1 (Φ)(s). We now define variables xs for s ∈ S
as follows:
xs = ws for s ∈ S1 ∪ S2;
xs = ∑t∈E(s) δ(s)(t) · wt for s ∈ SP .
Observe that
xs ≤ 1 − Valσ1 (Φ)(s) for s ∈ S \ I;  xs < 1 − Valσ1 (Φ)(s) for s ∈ I.
The variables satisfy the following constraints:
xs ≥ xt for s ∈ S2 and (s, t) ∈ E;
xs = ∑t∈E(s) δ(s)(t) · xt for s ∈ SP ;
xs = xσ′(s) for s ∈ S1;
xs = 1 for s ∈ VCσ(0).
Since the value for player 2 to reach VCσ(0) in the MDP Gσ′ is the least value vector that
satisfies the above constraints (see Theorem 11), it follows that the value for player 2 to reach
VCσ(0) at a state s is at most xs, i.e., for all π ∈ Π we have Prσ′,πs (Reach(VCσ(0))) ≤ xs.
Thus we obtain that for all s ∈ S and for all strategies π ∈ Π we have Prσ′,πs (Ω \ Φ) ≤ xs.
Hence we have Valσ′1 (Φ)(s) ≥ 1 − xs, and the desired result follows.
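The least-solution characterization invoked here (Theorem 11) can be made concrete by value iteration from the all-zero valuation, which converges to the least solution from below. A toy sketch for a maximizing player-2 MDP; the encoding (prob for probabilistic states, succ for player-2 states) is our assumption, and this illustrates the characterization rather than the thesis's algorithm:

```python
def max_reach_values(states, target, succ, prob, iters=200):
    """Approximate the least solution of the reachability constraints:
    x(s) = 1 on the target, the weighted average of successor values at
    probabilistic states, and the best successor value at (maximizing)
    player-2 states."""
    x = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iters):
        for s in states:
            if s in target:
                continue
            if s in prob:    # probabilistic state: expectation over successors
                x[s] = sum(p * x[t] for t, p in prob[s].items())
            else:            # player-2 state: best successor
                x[s] = max(x[t] for t in succ[s])
    return x
```

Any vector that satisfies the constraints, such as the xs above, dominates this least solution; that is exactly how the bound on the reachability probability is obtained in the proof.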
Lemma 33 Consider a strategy σ to be an input to Algorithm 3, and let σ′ be an output,
i.e., σ′ = Improve(G,σ), such that σ′ 6= σ. If the set I in Step 2 of Algorithm 3 is empty,
then
1. for all states s we have Valσ′1 (Φ)(s) ≥ Valσ1 (Φ)(s); and
2. for some state s we have Valσ′1 (Φ)(s) > Valσ1 (Φ)(s).
Proof. It follows from Proposition 11 that for all strategies π ∈ ΠM for player 2, if C is
a closed connected recurrent class in Gσ,π and C ⊆ VCσ(q), for q > 0, then C is Rabin
winning. Let σ′ be the strategy constructed from σ in Step 2.2 of Algorithm 3, and let Ur
be the set of states where σ is modified to obtain σ′. The strategy σ′ ↾ Ur is an almost-sure
winning strategy for player 1 in Ur in the subgame Trwin2(G ↾ VCσ(r)). This follows from
Theorem 26, since σ′ ↾ Ur = Tr1as(σ ↾ U r) and σ ↾ U r is a sure winning strategy for
player 1 in U r in the subgame Tr1as(Trwin2(G ↾ VCσ(r))). It follows that if C is a closed
connected recurrent class in Gσ′,π and C ⊆ Ur, then C is Rabin winning. Arguments
similar to those in Lemma 32 show that the following assertion holds: for all strategies
π ∈ ΠM for player 2, if there
is a closed connected recurrent class C ⊆ (S \ VCσ(0)) in the Markov chain Gσ′,π, then
either (a) C ⊆ VCσ(z) \ Ur, for some z > 0; or (b) C ⊆ Ur; in both cases C is Rabin
winning. Given the strategy σ′, we consider the player-2 MDP Gσ′ and fix a randomized
memoryless optimal strategy π for player 2 in Gσ′ . We have the following case analysis.
1. If for all states s ∈ Ur ∩ S2 we have Supp(π(s)) ⊆ Ur, then since σ′ ↾ Ur is an
almost-sure winning strategy for player 1, it follows that for all states s ∈ Ur we have
Prσ′,πs (Φ) = 1. Since r < 1, for all s ∈ Ur we have Valσ1 (Φ)(s) = r < 1 and
Valσ′1 (Φ)(s) = 1 > r = Valσ1 (Φ)(s). The desired claim of the lemma easily follows
in this case.
2. Otherwise, there exists a state s ∈ Ur such that Supp(π(s)) ∩ (S \ Ur) ≠ ∅. In the
present case we have
Supp(π(s)) ⊆ ⋃q≥r VCσ(q);  Supp(π(s)) ∩ ⋃q>r VCσ(q) ≠ ∅. (5.1)
Observe that for all s ∈ S1, if s ∈ VCσ(q), then σ′(s) ∈ VCσ(q). We now consider
variables ws for s ∈ S1 ∪ S2 as follows:
ws = 1 − Valσ1 (Φ)(s) for s ∈ S1;
ws = 1 − (∑t∈E(s) π(s)(t) · Valσ1 (Φ)(t)) for s ∈ S2.
We now consider variables xs for s ∈ S as follows:
xs = ws for s ∈ S1 ∪ S2;
xs = ∑t∈E(s) δ(s)(t) · wt for s ∈ SP .
It follows that
xs ≤ 1 − Valσ1 (Φ)(s) for all s ∈ S;  xs < 1 − Valσ1 (Φ)(s) for some s ∈ S.
The variables satisfy the following constraints:
xs = ∑t∈E(s) π(s)(t) · xt for s ∈ S2;
xs = ∑t∈E(s) δ(s)(t) · xt for s ∈ SP ;
xs = xσ′(s) for s ∈ S1;
xs = 1 for s ∈ VCσ(0).
Since the value for player 2 to reach VCσ(0) in the Markov chain Gσ′,π is the least
value vector that satisfies the above constraints (a special case of Theorem 11), it
follows that the value for player 2 to reach VCσ(0) at a state s is at most xs, i.e.,
Prσ′,πs (Reach(VCσ(0))) ≤ xs. Thus we obtain that for all s ∈ S we have Prσ′,πs (Ω \ Φ) ≤ xs.
Since π is an optimal strategy against σ′, we have Valσ′1 (Φ)(s) ≥ 1 − xs.
The desired result follows.
Lemma 32 and Lemma 33 yield Lemma 34.
Lemma 34 For a strategy σ, if σ 6= Improve(G,σ), then σ ≺ Improve(G,σ).
Lemma 35 If σ = Improve(G,σ), then σ is an optimal strategy for player 1.
Proof. Let σ be a strategy such that σ = Improve(G,σ). Then the following conditions
hold.
1. Fact 1. The strategy σ cannot be “value-improved”, that is
∀s ∈ S1. ∀t ∈ E(s). Valσ1 (Φ)(t) ≤ Valσ1 (Φ)(s);
and for all s ∈ S1 we have Valσ1 (Φ)(s) = Valσ1 (Φ)(σ(s)).
2. Fact 2. For all r < 1, the set of almost-sure winning states for player 1 in
Trwin2(G ↾ VCσ(r)) is empty. By Lemma 30 it follows that for all r < 1, all states in
Trwin2(G ↾ VCσ(r)) are almost-sure winning for player 2.
Consider a finite-memory strategy π, with memory M, for player 2 such that the strategy π
is almost-sure winning in Trwin2(G ↾ VCσ(r)) for all r < 1. Let U<1 = S \ VCσ(1). Consider
a pure memoryless strategy σ′ of player 1. Consider the Markov chain (G × M)σ′,π and
a closed connected recurrent set C in the Markov chain. It follows from arguments similar
to Lemma 32 that the set C is contained in some value class and does not intersect the
boundary probabilistic states, i.e., for some r ∈ [0, 1] we have C ⊆ (VCσ(r) × M) and
C ∩ (Bnd(r) × M) = ∅. By the almost-sure winning property of π, it follows that for
r < 1 the set C is almost-sure winning for player 2, i.e., if r < 1, then for all s ∈ C the
probability of satisfying Φ in the Markov chain is 0. Hence for all states s ∈ U<1, we have
Prσ′,πs (Φ | Safe(U<1)) = 0, where Safe(U<1) = {ω = 〈s0, s1, . . .〉 | ∀k ≥ 0. sk ∈ U<1} denotes
the set of plays that only visit states in U<1. Hence, given the strategy π, any counter-
optimal pure memoryless strategy for player 1 maximizes the probability to reach VCσ(1) in
the MDP (G×M)π. From the fact that the strategy σ cannot be “value improved” (Fact 1)
and arguments similar to Lemma 32 (to analyze reachability in MDPs), it follows that
for all player-1 pure memoryless strategies σ′, all r < 1, and all states s ∈ VCσ(r), we have
Prσ′,πs (Φ) ≤ r. Since pure memoryless optimal strategies exist for player 1, it follows that
for all r ∈ [0, 1] and all states s ∈ VCσ(r), we have Val1(Φ)(s) ≤ r. For all r ∈ [0, 1] and all
states s ∈ VCσ(r), we have r = Valσ1 (Φ)(s) ≤ Val1(Φ)(s). This establishes the optimality
of σ.
Lemma 36 The procedure Improve can be computed in time
O(poly(n)) + n · O(TwoPlRabinGame(n · d,m · d, d + 1)),
where poly is a polynomial function.
In Lemma 36 we denote by O(TwoPlRabinGame(n · d,m · d, d + 1)) the time complexity
of an algorithm for solving 2-player Rabin games with n · d states, m · d edges, and
d + 1 Rabin pairs. Recall that the reduction Tr1as blows up the states in SP and the
outgoing edges from SP by a factor of d, and adds a new Rabin pair. A call to Improve
requires solving an MDP
with Streett objectives quantitatively (Step 1 of Improve; this can be achieved in polynomial
time by Theorem 15), and computing Step 2.2 requires solving at most n two-player
Rabin games (since there can be at most n value classes). Lemma 36 follows. Also recall
that by the results of [PP06] we have
O(TwoPlRabinGame(n · d,m · d, d + 1)) = O((m · d) · (n · d)^{d+2} · (d + 1)!) = O(m · n^{d+2} · d^{d+3}).
A strategy-improvement algorithm using the Improve procedure is described in
Algorithm 4. Observe that it follows from Lemma 34 that, if Algorithm 4 outputs a strategy
σ∗, then σ∗ = Improve(G,σ∗). The correctness of the algorithm follows from Lemma 35
and yields Theorem 31. Given an optimal strategy σ for player 1, the values for both the
players can be computed in polynomial time by computing the values of the MDP Gσ (see
Theorem 15). Since there are at most (m/n)^n ≤ 2^{n·log n} possible pure memoryless strategies,
it follows that Algorithm 4 requires at most 2^{n·log n} iterations. This along with Lemma 36
gives us the following theorem.
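The outer loop of Algorithm 4 is plain fixpoint iteration of Improve. Schematically (improve is a stand-in for Improve(G, ·); any strict improvement operator over a finite strategy space fits):

```python
def strategy_improvement(sigma0, improve):
    """Iterate the improvement operator until a fixpoint is reached
    (Algorithm 4). Termination follows because each non-trivial call
    strictly improves the strategy in the ordering ≺, and there are
    only finitely many pure memoryless strategies."""
    sigma = sigma0
    while True:
        sigma_next = improve(sigma)
        if sigma_next == sigma:
            return sigma
        sigma = sigma_next
```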
Theorem 31 (Correctness of Algorithm 4) For every 2½-player game graph G and
Rabin objective Φ, the output σ∗ of Algorithm 4 is an optimal strategy for player 1. The
running time of Algorithm 4 is bounded by 2^{O(n·log n)} · O(m · n^{d+2} · d^{d+3}) if G has n states
and m edges, and Φ has d pairs.
5.3 Randomized Algorithm
We now present a randomized algorithm for 2½-player Rabin games, by combining
an algorithm of Björklund et al. [BSV03] and the procedure Improve.
Games and improving subgames. Given l,m ∈ N, let G(l,m) be the class of 2½-player game
graphs with the set S1 of player-1 states partitioned into two sets as follows: (a) O1 = {s ∈
S1 | |E(s)| = 1}, i.e., the set of states with out-degree 1; and (b) O2 = S1 \ O1, with |O2| ≤ l
and ∑s∈O2 |E(s)| ≤ m. There is no restriction for player 2. Given a game G ∈ G(l,m), a
state s ∈ O2, and an edge e = (s, t), we define the subgame Ge by deleting all edges from s
other than the edge e. Observe that Ge ∈ G(l − 1,m − |E(s)|), and hence also Ge ∈ G(l,m).
If σ is a strategy for player 1 in G ∈ G(l,m), then a subgame Ge is σ-improving if some
strategy σ′ in Ge satisfies σ ≺ σ′.
Informal description of Algorithm 5. The algorithm takes a 2½-player Rabin game and
an initial strategy σ0, and proceeds in three steps. In Step 1, it constructs r pairs (G′, σ′),
each consisting of a σ0-improving subgame G′ and a corresponding improved strategy σ′
in G′. This is achieved by the procedure ImprovingSubgames. The parameter r will be
chosen to obtain a suitable complexity analysis. In Step 2, the algorithm selects uniformly
at random one of the improving subgames G′ with its corresponding strategy σ′, and
recursively computes an optimal strategy σ∗ in G′ with σ′ as the initial strategy. If the
strategy σ∗ is optimal in the original game G, then the algorithm terminates and returns σ∗.
Otherwise it improves σ∗ by a call to Improve, and continues at Step 1 with the improved
strategy Improve(G,σ∗) as the initial strategy.
The procedure ImprovingSubgames constructs a sequence of game graphs
G0, G1, . . . , Gr−l with Gi ∈ G(l, l + i) such that all subgames (Gi)e of Gi are σ0-
improving. The subgame Gi+1 is constructed from Gi as follows: we compute an optimal
strategy σi in Gi, and if σi is optimal in G, then we have discovered an optimal strategy;
otherwise we construct Gi+1 by adding to Gi any target edge e of Improve(G,σi), i.e., e is
an edge required by the strategy Improve(G,σi) that is not in the strategy σi.
The correctness of the algorithm can be seen as follows. Observe that every time
Step 1 is executed, the initial strategy is improved with respect to the ordering ≺ on
strategies. Since the number of strategies is bounded, the termination of the algorithm is
guaranteed. Step 3 of Algorithm 5 and Step 1.2.1 of procedure ImprovingSubgames ensure
that on termination of the algorithm, the returned strategy is optimal. Lemma 37 bounds
the expected number of iterations of Algorithm 5. The analysis is similar to the results of
[BSV03].
Lemma 37 Algorithm 5 computes an optimal strategy. The expected number of iterations
T (·, ·) of Algorithm 5 for a game G ∈ G(l,m) is bounded by the following recurrence:
T (l,m) ≤ ∑_{i=l}^{r} T (l, i) + T (l − 1,m − 2) + (1/r) · ∑_{i=1}^{r} T (l,m − i) + 1.
Proof. We justify every term of the right-hand side of the recurrence. The first term
represents the work of procedure ImprovingSubgames, which recursively calls Algorithm 5
to compute r pairs of σ0-improving subgames and witness strategies. The second term
represents the work of the recursive call at Step 2 of Algorithm 5. The third term represents
the work as the average of the r equally likely choices in Step 3 of Algorithm 5. All the
subgames Gi can be partially ordered according to the values of the optimal strategies
in Gi. Since the algorithm only visits strategies that are improving with respect to the
≺ ordering, it follows that subgames whose optimal strategies are equal, worse, or
incomparable to the strategy σ∗ will never be explored in the rest of the algorithm. In the
worst case the algorithm selects the worst r subgames, and Step 3 solves a game
G ∈ G(l,m − i), for i = 1, 2, . . . , r, each with probability 1/r. This gives the bound of the
recurrence.
For a game graph G with |S| = n, we obtain a bound of n² for m. Using this
fact and an analysis of Kalai for linear programming, Björklund et al. [BSV03] showed that
m^{O(√(n/log n))} = 2^{O(√(n·log n))} is a solution to the recurrence of Lemma 37, by choosing
r = max{n, m/2}. The above analysis along with Lemma 36 yields Theorem 32.
Theorem 32 Given a 2½-player game graph G and a Rabin objective Rabin(P ) with d pairs,
the value Val1(Rabin(P ))(s) can be computed for all states s ∈ S in expected time
2^{O(√(n·log n))} · O(TwoPlRabinGame(n · d,m · d, d + 1)) = 2^{O(√(n·log n))} · O(n^{d+2} · d^{d+3} · m).
5.4 Optimal Strategy Construction for Streett Objectives
The algorithms, Algorithm 4 and the randomized algorithm, compute the values for
both player 1 and player 2 (i.e., both for Rabin and Streett objectives), but only construct
an optimal strategy for player 1 (i.e., the player with the Rabin objective). Since pure
memoryless optimal strategies exist for the Rabin player, it is much simpler to analyze and
obtain the values and an optimal strategy for player 1. We now show how, once these
values have been computed, we can obtain an optimal strategy for the Streett player as well.
We do this by computing sure winning strategies in 2-player games with Streett objectives.
Given a 2½-player game G with Rabin objective Φ for player 1, and the comple-
mentary objective Ω \ Φ for player 2, first we compute Val1(Φ)(s) for all states s ∈ S. An
optimal strategy π∗ for player 2 is constructed as follows: for each value class VC(r) with
r < 1, obtain a sure winning strategy πr for player 2 in Tr2as(Trwin2(G ↾ VC(r))), and in
VC(r) let the strategy π∗ follow the strategy Tr2as(πr). By Lemma 29, it follows that π∗ is
an optimal strategy, and given all the values, the construction of π∗ requires n calls to a
procedure for solving 2-player games with Streett objectives.
Theorem 33 Let G be a 2½-player game graph with n states and m edges, and let Φ
and Ω \ Φ be a Rabin and a Streett objective, respectively, with d pairs. Given the values
Val1(Φ)(s) = 1 − Val2(Φ)(s) for all states s of G, an optimal strategy π∗ for player 2 can be
constructed in time n · O(TwoPlStreettGame(n · d,m · d, d + 1)), where TwoPlStreettGame(n ·
d,m · d, d + 1) denotes the time complexity of an algorithm for solving 2-player Streett games
with n · d states, m · d edges, and d + 1 Streett pairs.
Discussion on parity games. We briefly discuss the special case of parity games, and
then summarize the results. For the special case of 2½-player games with parity objectives, an
improved strategy-improvement algorithm (where the improvement step can be computed
in polynomial time) is given in [CH06a]. We summarize the complexity of strategies and
Table 5.1: Strategy complexity of 2½-player games and their sub-classes with ω-regular
objectives, where ΣPM denotes the family of pure memoryless strategies, ΣPF denotes the
family of pure finite-memory strategies, and ΣM denotes the family of randomized
memoryless strategies.
Objectives            1-pl.      1½-pl.     2-pl.   2½-pl.
Reachability/Safety   ΣPM        ΣPM        ΣPM     ΣPM
Parity                ΣPM        ΣPM        ΣPM     ΣPM
Rabin                 ΣPM        ΣPM        ΣPM     ΣPM
Streett               ΣPF / ΣM   ΣPF / ΣM   ΣPF     ΣPF
Muller                ΣPF / ΣM   ΣPF / ΣM   ΣPF     ΣPF
Table 5.2: Computational complexity of 2½-player games and their sub-classes with
ω-regular objectives.
Objectives            1-pl.   1½-pl. Quan.   2-pl.           2½-pl. Qual.    2½-pl. Quan.
Reachability/Safety   PTIME   PTIME          PTIME           PTIME           NP ∩ coNP
Parity                PTIME   PTIME          NP ∩ coNP       NP ∩ coNP       NP ∩ coNP
Rabin                 PTIME   PTIME          NP-compl.       NP-compl.       NP-compl.
Streett               PTIME   PTIME          coNP-compl.     coNP-compl.     coNP-compl.
Muller                PTIME   PTIME          PSPACE-compl.   PSPACE-compl.   PSPACE-compl.
the computational complexity of 2½-player games with Muller objectives and its subclasses
in Table 5.1 and Table 5.2.
5.5 Conclusion
We conclude the chapter by stating the major open problems in the complexity
analysis of 2½-player games and their subclasses. The open problems are as follows:
1. to obtain a polynomial-time algorithm for 2-player parity games;
2. to obtain a polynomial-time algorithm for the quantitative analysis of 2½-player
reachability games; and
3. to obtain a polynomial-time algorithm for the quantitative analysis of 2½-player
parity games.
All the above problems are in NP ∩ coNP, and no polynomial-time algorithm is known for
any of them.
Algorithm 3 Improve
Input: A 2½-player game graph G, a Rabin objective Φ for player 1,
and a pure memoryless strategy σ for player 1.
Output: A pure memoryless strategy σ′ for player 1 such that σ′ = σ or σ ≺ σ′.
[Step 1] Compute Valσ1 (Φ)(s) for all states s.
[Step 2] Consider the set I = {s ∈ S1 | ∃t ∈ E(s). Valσ1 (Φ)(t) > Valσ1 (Φ)(s)}.
2.1 (Value improvement) if I 6= ∅ then choose σ′ as follows:
σ′(s) = σ(s) for s ∈ S1 \ I; and
σ′(s) = t for s ∈ I, where t ∈ E(s) such that Valσ1 (Φ)(t) > Valσ1 (Φ)(s).
2.2 (Qualitative improvement) else
for each value class VCσ(r) with r < 1 do
Let Gr be the 2-player game Tr1as(Trwin2(G ↾ VCσ(r))).
Let U r be the sure winning states for player 1 in Gr;
let Ur be the corresponding set in G; and
let σr be the sure winning strategy for player 1 in U r.
Choose σ′(s) = Tr1as(σr ↾ U r)(s) for all states in Ur; and
σ′(s) = σ(s) for all states in VCσ(r) \ Ur .
return σ′.
Algorithm 4 StrategyImprovementAlgorithm
Input: A 2½-player game graph G and a Rabin objective Φ for player 1.
Output: An optimal strategy σ∗ for player 1.
1. Choose an arbitrary pure memoryless strategy σ for player 1.
2. while σ 6= Improve(G,σ) do σ = Improve(G,σ).
3. return σ∗ = σ.
Algorithm 5 RandomizedAlgorithm (2½-player Rabin games)
Input: a 2½-player game graph G ∈ G(l,m), a Rabin objective Rabin(P ) for pl. 1
and an initial strategy σ0 for pl. 1.
Output : an optimal strategy σ∗ for player 1.
1. (Step 1) Collect a set I of r pairs (G′, σ′) of subgames G′ of G and
corresponding strategies σ′ in G′ such that σ0 ≺ σ′.
(This is achieved by the procedure ImprovingSubgames below.)
2. (Step 2) Select a pair (G′, σ′) from I uniformly at random.
2.1 Find an optimal strategy σ∗ in G′ by applying the algorithm recursively,
with σ′ as the initial strategy.
3. (Step 3) if σ∗ is an optimal strategy in the original game G then return σ∗.
else let σ = Improve(G,σ∗), and
goto Step 1 with G and σ as the initial strategy.
procedure ImprovingSubgames
1. Construct sequence G0, G1, . . . , Gr−l of subgames with Gi ∈ G(l, l + i) as follows:
1.1 G0 is the game where each edge is fixed according to σ0.
1.2 Let σi be an optimal strategy in Gi;
1.2.1 if σi is an optimal strategy in the original game G
then return σi.
1.2.2 else let e be any target of Improve(G,σi);
the subgame Gi+1 is Gi with the edge e added.
2. return r subgames (fixing one of the r edges in Gr−l) and associated strategies.
Chapter 6
Concurrent Reachability Games
In this chapter we present two results.1 First, we present a simple proof of the fact
that in concurrent reachability games, for all ε > 0, memoryless ε-optimal strategies exist.
A memoryless strategy is independent of the history of plays, and an ε-optimal strategy
achieves the objective with probability within ε of the value of the game. In contrast to
previous proofs of this fact, which rely on the limit behavior of discounted games using
advanced Puiseux series analysis, our proof is elementary and combinatorial. Second, we
present a strategy-improvement (a.k.a. policy-iteration) algorithm for concurrent games
with reachability objectives.
It has long been known that optimal strategies need not exist for concurrent reach-
ability games [Eve57], so that one must settle for ε-optimality. It was also known that, for
ε > 0, there exist ε-optimal strategies that are memoryless, i.e., strategies that always
choose a probability distribution over moves that depends only on the current state, and
not on the past history of the play [FV97]. Unfortunately, the only previous proof of this
fact is rather complex. The proof considered discounted versions of reachability games,
where a play that reaches the target in k steps is assigned a value of α^k, for some discount
1 A preliminary version of this chapter appeared in [CdAH06b].
factor 0 < α ≤ 1, rather than value 1. It is possible to show that, for 0 < α < 1, memoryless
optimal strategies always exist. The result for the undiscounted (α = 1) case followed from
an analysis of the limit behavior of such optimal strategies for α → 1. The limit behavior is
studied with the help of results on the field of real Puiseux series [FV97]. This proof idea
works not only for reachability games, but also for total-reward games with nonnegative
rewards (see [FV97] again). A recent result [EY06] established the existence of memory-
less ε-optimal strategies for certain infinite-state (recursive) concurrent games; the proof
relies on results from analysis and on analytic properties of certain power series. We show that
the existence of memoryless ε-optimal strategies for concurrent reachability games can be
established by more elementary means. Our proof relies only on combinatorial techniques
and on simple properties of Markov decision processes [Ber95, dA97]. As our proof is easily
accessible, we believe that the proof techniques we use will find future applications in game
theory.
Our proof of the existence of memoryless ε-optimal strategies, for all ε > 0, is built
upon a value-iteration scheme that converges to the value of the game [dAM01]. The value-
iteration scheme computes a sequence u0, u1, u2, . . . of valuations, where for i = 0, 1, 2, . . .
each valuation ui associates with each state s of the game a lower bound ui(s) on the
value of the game, such that limi→∞ ui(s) converges to the value of the game at s. From
each valuation ui, we can easily extract a memoryless, randomized player-1 strategy, by
considering the (randomized) choice of moves for player 1 that achieves the maximal one-
step expectation of ui. In general, a strategy σi obtained in this fashion is not guaranteed
to achieve the value ui. We show that σi is guaranteed to achieve the value ui if it is
proper, that is, if regardless of the strategy adopted by player 2, the game reaches with
probability 1 states that are either in the target, or that have no path leading to the target.
Next, we show how to extract from the sequence of valuations u0, u1, u2, . . . a sequence of
memoryless randomized player-1 strategies σ0, σ1, σ2, . . . that are guaranteed to be proper,
and thus achieve the values u0, u1, u2, . . .. This proves the existence of memoryless ε-optimal
strategies for all ε > 0.
We then apply the techniques developed for the above proof to develop a strategy-
improvement algorithm for concurrent reachability games. Strategy-improvement algo-
rithms, also known as policy iteration algorithms in the context of Markov decision processes
[Der70, Ber95], compute a sequence of memoryless strategies σ′0, σ′1, σ′2, . . . such that, for all
k ≥ 0: (i) the strategy σ′k+1 is at all states no worse than σ′k; (ii) if σ′k+1 = σ′k, then σ′k is
optimal; and (iii) for every ε > 0, we can find a k sufficiently large so that σ′k is ε-optimal.
Computing a sequence of strategies σ0, σ1, σ2, . . . on the basis of the value-iteration scheme
from above does not yield a strategy-improvement algorithm, as condition (ii) may be vio-
lated: there is no guarantee that a step in the value iteration leads to an improvement in the
strategy. We will show that the key to obtaining a strategy-improvement algorithm consists in
recomputing, at each iteration, the values of the player-1 strategy to be improved, and in
adopting a particular strategy-update rule, which ensures that all the strategies produced
are proper. Unlike previous proofs of strategy-improvement algorithms for concurrent games
[Con93, FV97], which relied on the analysis of discounted versions of the games, our analysis
is again purely combinatorial. Differently from turn-based games [Con93], for concurrent
games we cannot guarantee the termination of the strategy-improvement algorithm. In fact,
there are games where optimal strategies do not exist, and we can guarantee the existence
of only ε-optimal strategies, for all ε > 0 [Eve57, dAHK98].
6.1 Preliminaries
Destinations of selectors and their memoryless strategies. Given a state s, and
selectors ξ1 and ξ2 for the two players, we denote by
Succ(s, ξ1, ξ2) = ⋃_{a1∈Supp(ξ1(s)), a2∈Supp(ξ2(s))} Succ(s, a1, a2)
the set of possible successors of s with respect to the selectors ξ1 and ξ2. We write ξ for the
memoryless strategy that consists in playing the selector ξ forever.
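Under a dictionary encoding of the selectors and of Succ (an assumption of ours), the set Succ(s, ξ1, ξ2) is computed as:

```python
def successors(s, xi1, xi2, succ):
    """Succ(s, ξ1, ξ2): the union of Succ(s, a1, a2) over all moves a1, a2
    in the supports of the selectors ξ1(s) and ξ2(s).
    xi(s) maps moves to probabilities; succ is keyed by (state, a1, a2)."""
    out = set()
    for a1, p1 in xi1[s].items():
        for a2, p2 in xi2[s].items():
            if p1 > 0 and p2 > 0:      # both moves lie in the supports
                out |= set(succ[(s, a1, a2)])
    return out
```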
Valuations. A valuation is a mapping v : S → [0, 1] associating a real number v(s) ∈ [0, 1]
with each state s. Given two valuations v,w : S → R, we write v ≤ w when v(s) ≤ w(s)
for all states s ∈ S. For an event A, we denote by Prσ,π(A) the valuation S → [0, 1] defined
for all states s ∈ S by(Prσ,π(A)
)(s) = Prσ,π
s (A). Similarly, for a measurable function
f : Ωs → [0, 1], we denote by Eσ,π(f) the valuation S → [0, 1] defined for all s ∈ S by
(Eσ,π(f)
)(s) = E
σ,πs (f).
Given a valuation v, and two selectors ξ1 ∈ Λ1 and ξ2 ∈ Λ2, we define the valuations
Preξ1,ξ2(v), Pre1:ξ1(v), and Pre1(v) as follows, for all states s ∈ S:
Preξ1,ξ2(v)(s) = ∑a,b∈A ∑t∈S v(t) · δ(s, a, b)(t) · ξ1(s)(a) · ξ2(s)(b);
Pre1:ξ1(v)(s) = infξ2∈Λ2 Preξ1,ξ2(v)(s);
Pre1(v)(s) = supξ1∈Λ1 infξ2∈Λ2 Preξ1,ξ2(v)(s).
Intuitively, Pre1(v)(s) is the greatest expectation of v that player 1 can guarantee at a suc-
cessor state of s. Also note that given a valuation v, the computation of Pre1(v) reduces to
the solution of a zero-sum one-shot matrix game, and can be solved by linear programming.
Similarly, Pre1:ξ1(v)(s) is the greatest expectation of v that player 1 can guarantee at a
successor state of s by playing the selector ξ1. Note that all of these operators on valuations
are monotonic: for two valuations v,w, if v ≤ w, then for all selectors ξ1 ∈ Λ1 and ξ2 ∈ Λ2,
we have Preξ1,ξ2(v) ≤ Preξ1,ξ2(w), Pre1:ξ1(v) ≤ Pre1:ξ1(w), and Pre1(v) ≤ Pre1(w).
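For fixed selectors, Preξ1,ξ2(v)(s) is a finite sum and can be evaluated directly. A minimal sketch, assuming a dictionary encoding of the selectors and of δ (the encoding is ours, not the thesis's):

```python
def pre(v, s, xi1, xi2, delta):
    """Pre_{ξ1,ξ2}(v)(s): the expected value of v at the successor of s
    when the players draw their moves from ξ1(s) and ξ2(s);
    delta[(s, a, b)] is the successor distribution as a dict t -> prob."""
    return sum(
        xi1[s][a] * xi2[s][b] * p * v[t]
        for a in xi1[s]
        for b in xi2[s]
        for t, p in delta[(s, a, b)].items()
    )
```

Pre1(v)(s) then takes sup over ξ1 and inf over ξ2 of this quantity, which, as noted above, is the value of a zero-sum one-shot matrix game and can be obtained by linear programming.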
6.2 Markov Decision Processes of Memoryless Strategies
To develop our arguments, we need some facts about one-player versions of concurrent
stochastic games, known as Markov decision processes (MDPs) [Der70, Ber95]. For i ∈
{1, 2}, a player-i MDP (for short, i-MDP) is a concurrent game where, for all states s ∈
S, we have |Γ3−i(s)| = 1. Given a concurrent game G, if we fix a memoryless strategy
corresponding to selector ξ1 for player 1, the game is equivalent to a 2-MDP Gξ1 with the
transition function
δξ1(s, a2)(t) = ∑a1∈Γ1(s) δ(s, a1, a2)(t) · ξ1(s)(a1),
for all s ∈ S and a2 ∈ Γ2(s). Similarly, if we fix selectors ξ1 and ξ2 for both players in a
concurrent game G, we obtain a Markov chain, which we denote by Gξ1,ξ2.
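The displayed transition function δξ1 can be sketched in the same style (again with a dictionary encoding of our own choosing):

```python
def fix_selector(delta, gamma1, xi1, s, a2):
    """δ_{ξ1}(s, a2): the 2-MDP transition obtained by averaging the
    concurrent transitions δ(s, a1, a2) over player 1's selector ξ1(s)."""
    out = {}
    for a1 in gamma1[s]:
        w = xi1[s].get(a1, 0.0)
        for t, p in delta[(s, a1, a2)].items():
            out[t] = out.get(t, 0.0) + w * p
    return out
```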
End components. In an MDP, the sets of states that play an equivalent role to the closed
recurrent classes of Markov chains [Kem83] are called “end components” [CY95, dA97].
Definition 15 (End components) An end component of an i-MDP G, for i ∈ {1, 2}, is
a subset C ⊆ S of the states such that there is a selector ξ for player i so that C is a closed
recurrent class of the Markov chain Gξ.
It is not difficult to see that an equivalent characterization of an end component C is the
following. For each state s ∈ C, there is a subset Mi(s) ⊆ Γi(s) of moves such that:
1. (closed) if a move in Mi(s) is chosen by player i at state s, then all successor states
that are obtained with nonzero probability lie in C; and
2. (recurrent) the graph (C,E), where E consists of the transitions that occur with
nonzero probability when moves in Mi(·) are chosen by player i, is strongly connected.
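The two conditions above can be checked mechanically; a sketch (succ[(s, a)] as the set of nonzero-probability successors and M as the chosen move sets are encodings of ours):

```python
def is_end_component(C, M, succ):
    """Check the end-component characterization:
    (closed)    every chosen move leads only to states inside C;
    (recurrent) the induced graph on C is strongly connected."""
    C = set(C)
    edges = {s: set() for s in C}
    for s in C:
        for a in M[s]:
            if not set(succ[(s, a)]) <= C:
                return False           # a chosen move can escape C
            edges[s] |= set(succ[(s, a)])
    for s in C:                        # strong connectivity: DFS from each state
        seen, stack = {s}, [s]
        while stack:
            for v in edges[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if seen != C:
            return False
    return True
```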
The following theorem states that in a 2-MDP, for every strategy of player 2, the set of
states that are visited infinitely often is, with probability 1, an end component. Corollary 6
follows easily from Theorem 34.
Theorem 34 [dA97] For a player-1 selector ξ1, let C be the set of end components of a 2-MDP Gξ1. For all player-2 strategies π and all states s ∈ S, we have Pr^{ξ1,π}_s(Muller(C)) = 1.
Corollary 6 For a player-1 selector ξ1, let C be the set of end components of a 2-MDP Gξ1, and let Z = ⋃_{C∈C} C be the set of states of all end components. For all player-2 strategies π and all states s ∈ S, we have Pr^{ξ1,π}_s(Reach(Z)) = 1.
MDPs with reachability objectives. Given a 2-MDP with a reachability objective
Reach(T ) for player 2, where T ⊆ S, the values can be obtained as the solution of a linear
program [FV97]. The linear program has a variable x(s) for all states s ∈ S, and the
objective function and the constraints are as follows:
min Σ_{s∈S} x(s) subject to

x(s) ≥ Σ_{t∈S} x(t) · δ(s, a2)(t)   for all s ∈ S and a2 ∈ Γ2(s)
x(s) = 1                            for all s ∈ T
0 ≤ x(s) ≤ 1                        for all s ∈ S
The correctness of the above linear program to compute the values follows from [Der70,
FV97].
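Since the desired valuation is the least solution of this linear program, it can also be approached from below by iterating the constraints, without invoking an LP solver; a minimal Python sketch, assuming a dictionary encoding delta[s][a2] of the 2-MDP's successor distributions:

```python
def reach_values(delta, T, n_iter=2000):
    """Approximate player 2's reachability values in a 2-MDP by iterating
        x(s) <- max over a2 of sum_t delta[s][a2][t] * x(t),
    with x fixed to 1 on the target set T. The iterates increase toward the
    least solution of the linear program described in the text."""
    states = list(delta)
    x = {s: 1.0 if s in T else 0.0 for s in states}
    for _ in range(n_iter):
        x = {s: 1.0 if s in T else
             max(sum(p * x[t] for t, p in dist.items())
                 for dist in delta[s].values())
             for s in states}
    return x
```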
6.3 Existence of Memoryless ε-Optimal Strategies
In this section we present an elementary proof of the existence of memoryless ε-optimal
strategies for concurrent reachability games, for all ε > 0 (optimal strategies need not exist
for concurrent games with reachability objectives [Eve57]). A proof of the existence of
memoryless optimal strategies for safety games can be found in [dAM01].
6.3.1 From value iteration to selectors
Consider a reachability game with target T ⊆ S. Let W2 = {s ∈ S | Val1(Reach(T))(s) = 0} be the set of states from which player 1 cannot reach the target with positive probability. From [dAH00], we know that this set can be computed as W2 = lim_{k→∞} W2^k, where W2^0 = S \ T, and for all k ≥ 0,

W2^{k+1} = {s ∈ S \ T | ∃a2 ∈ Γ2(s). ∀a1 ∈ Γ1(s). Succ(s, a1, a2) ⊆ W2^k}.
The limit is reached in at most |S| iterations. Note that player 2 has a strategy that confines
the game to W2, and that consequently all strategies are optimal for player 1, as they realize
the value 0 of the game in W2. Therefore, without loss of generality, in the remainder we
assume that all states in W2 and T are absorbing.
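The fixpoint computation of W2 can be sketched directly from this iteration. In the following Python sketch, the encoding succ[(s, a1, a2)] as a set of possible successors is an assumption, not notation from the text.

```python
def compute_W2(S, T, gamma1, gamma2, succ):
    """Iterate W2^0 = S \\ T and
       W2^{k+1} = {s in S \\ T | exists a2. forall a1. Succ(s,a1,a2) subseteq W2^k}
    until the (decreasing) sequence stabilizes; the limit is reached within
    at most |S| iterations. succ[(s, a1, a2)] is a set of successor states."""
    W = set(S) - set(T)
    while True:
        # membership only gets harder as W shrinks, so iterating over W suffices
        Wn = {s for s in W
              if any(all(succ[(s, a1, a2)] <= W for a1 in gamma1[s])
                     for a2 in gamma2[s])}
        if Wn == W:
            return W
        W = Wn
```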
Our first step towards proving the existence of memoryless ε-optimal strategies for
reachability games consists in considering a value-iteration scheme for the computation of
Val1(Reach(T)). Let [T] : S → [0, 1] be the indicator function of T, defined by [T](s) = 1 for s ∈ T, and [T](s) = 0 for s ∉ T. Let u0 = [T], and for all k ≥ 0, let

uk+1 = Pre1(uk). (6.1)
Note that the classical equation assigns uk+1 = [T ]∨Pre1(uk), where ∨ is interpreted as the
maximum in pointwise fashion. Since we assume that all states in T are absorbing, the clas-
sical equation reduces to the simpler equation given by (6.1). From the monotonicity of Pre1
it follows that uk ≤ uk+1, that is, Pre1(uk) ≥ uk, for all k ≥ 0. The result of [dAM01] estab-
lishes by a combinatorial argument that Val1(Reach(T )) = limk→∞ uk, where the limit is
interpreted in pointwise fashion. For all k ≥ 0, let the player-1 selector ζk be a value-optimal
selector for uk, that is, a selector such that Pre1(uk) = Pre1:ζk(uk). An ε-optimal strategy
σk for player 1 can be constructed by applying the sequence ζk, ζk−1, . . . , ζ1, ζ0, ζ0, ζ0, . . . of
selectors, where the last selector, ζ0, is repeated forever. It is possible to prove by induction
on k that
inf_{π∈Π} Pr^{σk,π}(∃j ∈ [0..k]. Xj ∈ T) ≥ uk.
As the strategies σk, for k ≥ 0, are not necessarily memoryless, this proof does not suffice for
showing the existence of memoryless ε-optimal strategies. On the other hand, the following
example shows that the memoryless strategy ζk does not necessarily guarantee the value
uk.
Example 6 Consider the 1-MDP shown in Fig 6.1. At all states except s3, the set of
available moves for player 1 is a singleton set. At s3, the available moves for player 1 are
a and b. The transitions at the various states are shown in the figure. The objective of
player 1 is to reach the state s0.
We consider the value-iteration procedure and denote by uk the valuation after k iterations. Writing a valuation u as the list of values (u(s0), u(s1), . . . , u(s4)), we have:

u0 = (1, 0, 0, 0, 0)
u1 = Pre1(u0) = (1, 0, 1/2, 0, 0)
u2 = Pre1(u1) = (1, 0, 1/2, 1/2, 0)
u3 = Pre1(u2) = (1, 0, 1/2, 1/2, 1/2)
u4 = Pre1(u3) = u3 = (1, 0, 1/2, 1/2, 1/2)

The valuation u3 is thus a fixpoint.
Figure 6.1: An MDP with reachability objective.
Now consider the selector ξ1 for player 1 that chooses at state s3 the move a with
probability 1. The selector ξ1 is optimal with respect to the valuation u3. However, if
player 1 follows the memoryless strategy ξ1, then the play visits s3 and s4 alternately and
reaches s0 with probability 0. Thus, ξ1 is an example of a selector that is value-optimal, but
not optimal.
On the other hand, consider any selector ξ′1 for player 1 that chooses move b at
state s3 with positive probability. Under the memoryless strategy ξ′1, the set {s0, s1} of states is reached with probability 1, and s0 is reached with probability 1/2. Such a ξ′1 is thus
an example of a selector that is both value-optimal and optimal.
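The computation in this example is easy to replay mechanically. The following Python sketch reconstructs the transitions described above; the state names and the exact transition encoding are assumptions inferred from the description of Figure 6.1.

```python
# Hypothetical encoding of the MDP of Figure 6.1: delta[s][move] is the
# successor distribution; s0 is the (absorbing) target, s1 is absorbing.
delta = {
    "s0": {"c": {"s0": 1.0}},
    "s1": {"c": {"s1": 1.0}},
    "s2": {"c": {"s0": 0.5, "s1": 0.5}},   # probability-1/2 branch
    "s3": {"a": {"s4": 1.0}, "b": {"s2": 1.0}},
    "s4": {"c": {"s3": 1.0}},
}

def reach_s0(mdp, k=100):
    """Value iteration u_{j+1} = Pre1(u_j) on a 1-MDP, target s0."""
    u = {s: 1.0 if s == "s0" else 0.0 for s in mdp}
    for _ in range(k):
        u = {s: 1.0 if s == "s0" else
             max(sum(p * u[t] for t, p in d.items()) for d in mdp[s].values())
             for s in mdp}
    return u

u = reach_s0(delta)                                # fixpoint (1, 0, 1/2, 1/2, 1/2)
only_a = {**delta, "s3": {"a": delta["s3"]["a"]}}  # memoryless: always a at s3
only_b = {**delta, "s3": {"b": delta["s3"]["b"]}}  # memoryless: always b at s3
```

Iterating on only_a gives value 0 at s3 (the play alternates s3, s4 forever), while only_b gives 1/2, matching the discussion of value-optimal versus optimal selectors.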
In the example, the problem is that the strategy ξ1 may cause player 1 to stay forever in
S \ (T ∪ W2) with positive probability. We call “proper” the strategies of player 1 that
guarantee reaching T ∪ W2 with probability 1.
Definition 16 (Proper strategies and selectors) A player-1 strategy σ is proper if for all player-2 strategies π and all states s ∈ S \ (T ∪ W2), we have Pr^{σ,π}_s(Reach(T ∪ W2)) = 1. A player-1 selector ξ1 is proper if the memoryless player-1 strategy ξ1 is proper.
We note that proper strategies are closely related to Condon’s notion of a halting game
[Con92]: precisely, a game is halting iff all player-1 strategies are proper. We can check
whether a selector for player 1 is proper by considering only the pure selectors for player 2.
Lemma 38 Given a selector ξ1 for player 1, the memoryless player-1 strategy ξ1 is proper iff for every pure selector ξ2 for player 2, and for all states s ∈ S, we have Pr^{ξ1,ξ2}_s(Reach(T ∪ W2)) = 1.
Proof. We prove the contrapositive. Given a player-1 selector ξ1, consider the 2-MDP Gξ1 .
If ξ1 is not proper, then by Theorem 34, there must exist an end component C ⊆ S\(T∪W2)
in Gξ1 . Then, from C, player 2 can avoid reaching T ∪ W2 by repeatedly applying a pure
selector ξ2 that at every state s ∈ C deterministically chooses a move a2 ∈ Γ2(s) such that
Succ(s, ξ1, a2) ⊆ C. The existence of a suitable ξ2(s) for all states s ∈ C follows from the
definition of end component.
The following lemma shows that the selector that chooses all available moves
uniformly at random is proper. This fact will be used later to initialize our strategy-
improvement algorithm.
Lemma 39 Let ξ1^unif be the player-1 selector that at all states s ∈ S \ (T ∪ W2) chooses all moves in Γ1(s) uniformly at random. Then ξ1^unif is proper.
Proof. Assume towards a contradiction that ξ1^unif is not proper. From Theorem 34, in the 2-MDP G_{ξ1^unif} there must be an end component C ⊆ S \ (T ∪ W2). Then, when player 1 follows the strategy ξ1^unif, player 2 can confine the game to C. By the definition of ξ1^unif,
player 2 can ensure that the game does not leave C regardless of the moves chosen by
player 1, and thus, for all strategies of player 1. This contradicts the fact that W2 contains
all states from which player 2 can ensure that T is not reached.
The following lemma shows that if the player-1 selector ζk computed by the value-
iteration scheme (6.1) is proper, then the player-1 strategy ζk guarantees the value uk, for
all k ≥ 0.
Lemma 40 Let v be a valuation such that Pre1(v) ≥ v and v(s) = 0 for all states s ∈ W2.
Let ξ1 be a selector for player 1 such that Pre1:ξ1(v) = Pre1(v). If ξ1 is proper, then for all player-2 strategies π, we have Pr^{ξ1,π}(Reach(T)) ≥ v.
Proof. Consider an arbitrary player-2 strategy π, and for k ≥ 0, let

vk = E^{ξ1,π}(v(Xk))
be the expected value of v after k steps under ξ1 and π. By induction on k, we can prove
vk ≥ v for all k ≥ 0. In fact, v0 = v, and for k ≥ 0, we have
vk+1 ≥ Pre1:ξ1(vk) ≥ Pre1:ξ1(v) = Pre1(v) ≥ v.
For all k ≥ 0 and s ∈ S, we can write vk as

vk(s) = E^{ξ1,π}_s(v(Xk) | Xk ∈ T) · Pr^{ξ1,π}_s(Xk ∈ T)
      + E^{ξ1,π}_s(v(Xk) | Xk ∈ S \ (T ∪ W2)) · Pr^{ξ1,π}_s(Xk ∈ S \ (T ∪ W2))
      + E^{ξ1,π}_s(v(Xk) | Xk ∈ W2) · Pr^{ξ1,π}_s(Xk ∈ W2).

Since v(s) ≤ 1 when s ∈ T, the first term on the right-hand side is at most Pr^{ξ1,π}_s(Xk ∈ T). For the second term, we have lim_{k→∞} Pr^{ξ1,π}(Xk ∈ S \ (T ∪ W2)) = 0 by hypothesis, because Pr^{ξ1,π}(Reach(T ∪ W2)) = 1 and every state s ∈ (T ∪ W2) is absorbing. Finally, the third term on the right-hand side is 0, as v(s) = 0 for all states s ∈ W2. Hence, taking the limit as k → ∞, we obtain

Pr^{ξ1,π}(Reach(T)) = lim_{k→∞} Pr^{ξ1,π}(Xk ∈ T) ≥ lim_{k→∞} vk ≥ v,
where the last inequality follows from vk ≥ v for all k ≥ 0. The desired result follows.
6.3.2 From value iteration to optimal selectors
Considering again the value-iteration scheme (6.1), since Val1(Reach(T )) = limk→∞ uk, for
every ε > 0 there is a k such that uk(s) ≥ uk−1(s) ≥ Val1(Reach(T ))(s) − ε at all states
s ∈ S. Lemma 40 indicates that, in order to construct a memoryless ε-optimal strategy, we
need to construct from uk−1 a player-1 selector ξ1 such that:
1. ξ1 is value-optimal for uk−1, that is, Pre1:ξ1(uk−1) = Pre1(uk−1) = uk; and
2. ξ1 is proper.
To ensure the construction of a value-optimal, proper selector, we need some definitions.
For r > 0, the value class

U^k_r = {s ∈ S | uk(s) = r}

consists of the states with value r under the valuation uk. Similarly, we define U^k_{⋈r} = {s ∈ S | uk(s) ⋈ r}, for ⋈ ∈ {<, ≤, ≥, >}. For a state s ∈ S, let ℓk(s) = min{j ≤ k | uj(s) = uk(s)} be the entry time of s in U^k_{uk(s)}, that is, the least iteration j in which the state s has the same value as in iteration k. For k ≥ 0, we define the player-1 selector ηk as follows: if ℓk(s) > 0, then

ηk(s) = η_{ℓk(s)}(s) = arg sup_{ξ1∈Λ1} inf_{ξ2∈Λ2} Pre_{ξ1,ξ2}(u_{ℓk(s)−1});

otherwise, if ℓk(s) = 0, then ηk(s) = η_{ℓk(s)}(s) = ξ1^unif(s) (this definition is arbitrary, and it does not affect the remainder of the proof). In words, the selector ηk(s) is an optimal selector for s at the iteration ℓk(s). It follows easily that uk = Pre1:ηk(uk−1), that is, ηk is
also value-optimal for uk−1, satisfying the first of the above conditions.
To conclude the construction, we need to prove that for k sufficiently large (namely, for k such that uk(s) > 0 at all states s ∈ S \ (T ∪ W2)), the selector ηk is proper. To this end we use Theorem 34, and show that for sufficiently large k no end component of G_{ηk} is entirely contained in S \ (T ∪ W2).² To reason about the end components of G_{ηk}, for a state s ∈ S and a player-2 move a2 ∈ Γ2(s), we write
Succk(s, a2) = ⋃_{a1 ∈ Supp(ηk(s))} Succ(s, a1, a2)
for the set of possible successors of state s when player 1 follows the strategy ηk, and player 2
chooses the move a2.
²In fact, the result holds for all k, even though our proof, for the sake of a simpler argument, does not show it.
Lemma 41 Let 0 < r ≤ 1 and k ≥ 0, and consider a state s ∈ S \ (T ∪ W2) such that s ∈ U^k_r. For all moves a2 ∈ Γ2(s), we have:

1. either Succk(s, a2) ∩ U^k_{>r} ≠ ∅,

2. or Succk(s, a2) ⊆ U^k_r, and there is a state t ∈ Succk(s, a2) with ℓk(t) < ℓk(s).
Proof. For convenience, let m = ℓk(s), and consider any move a2 ∈ Γ2(s).

• Consider first the case that Succk(s, a2) ⊈ U^k_r. Then, it cannot be that Succk(s, a2) ⊆ U^k_{≤r}; otherwise, for all states t ∈ Succk(s, a2), we would have uk(t) ≤ r, and there would be at least one state t ∈ Succk(s, a2) such that uk(t) < r, contradicting uk(s) = r and Pre1:ηk(uk−1) = uk. So, it must be that Succk(s, a2) ∩ U^k_{>r} ≠ ∅.

• Consider now the case that Succk(s, a2) ⊆ U^k_r. Since um ≤ uk, due to the monotonicity of the Pre1 operator and (6.1), we have that um−1(t) ≤ r for all states t ∈ Succk(s, a2). From r = uk(s) = um(s) = Pre1:ηk(um−1)(s), it follows that um−1(t) = r for all states t ∈ Succk(s, a2), implying that ℓk(t) < m for all states t ∈ Succk(s, a2).
The above lemma states that under ηk, from each state s ∈ U^k_r with r > 0 we are guaranteed a probability bounded away from 0 of either moving to a higher value class U^k_{>r}, or of moving to states within the value class that have a strictly lower entry time. Note that the states in the target set T are all in U^0_1: they have entry time 0 in the value class for value 1. This implies that every state in S \ W2 has a probability bounded above zero of reaching T in at most n = |S| steps, so that the probability of staying forever in S \ (T ∪ W2) is 0. To prove this fact formally, we analyze the end components of G_{ηk} in
light of Lemma 41.
Lemma 42 For all k ≥ 0, if for all states s ∈ S \ W2 we have uk−1(s) > 0, then for all player-2 strategies π, we have Pr^{ηk,π}(Reach(T ∪ W2)) = 1.
Proof. Since every state s ∈ (T ∪ W2) is absorbing, to prove this result, in view of Corollary 6, it suffices to show that no end component of G_{ηk} is entirely contained in S \ (T ∪ W2). Towards a contradiction, assume there is such an end component C ⊆ S \ (T ∪ W2). Then, we have C ⊆ U^k_{[r1,r2]} with C ∩ U^k_{r2} ≠ ∅, for some 0 < r1 ≤ r2 ≤ 1, where U^k_{[r1,r2]} = U^k_{≥r1} ∩ U^k_{≤r2} is the union of the value classes for all values in the interval [r1, r2]. Consider a state s ∈ U^k_{r2} with minimal ℓk, that is, such that ℓk(s) ≤ ℓk(t) for all other states t ∈ U^k_{r2}. From Lemma 41, it follows that for every move a2 ∈ Γ2(s), there is a state t ∈ Succk(s, a2) such that (i) either t ∈ U^k_{r2} and ℓk(t) < ℓk(s), (ii) or t ∈ U^k_{>r2}. In both cases, we obtain a contradiction.
The above lemma shows that ηk satisfies both requirements for optimal selectors
spelt out at the beginning of Section 6.3.2. Hence, ηk guarantees the value uk. This proves
the existence of memoryless ε-optimal strategies for concurrent reachability games.
Theorem 35 (Memoryless ε-optimal strategies) For every ε > 0, memoryless ε-
optimal strategies exist for all concurrent games with reachability objectives.
Proof. Consider a concurrent reachability game with target T ⊆ S. Since lim_{k→∞} uk = Val1(Reach(T)), for every ε > 0 we can find k ∈ N such that the following two assertions hold:

max_{s∈S} (Val1(Reach(T))(s) − uk−1(s)) < ε;
min_{s∈S\W2} uk−1(s) > 0.

By construction, Pre1:ηk(uk−1) = Pre1(uk−1) = uk. Hence, from Lemma 40 and Lemma 42, for all player-2 strategies π, we have Pr^{ηk,π}(Reach(T)) ≥ uk−1, leading to the result.
6.4 Strategy Improvement
In the previous section, we provided a proof of the existence of memoryless ε-optimal strate-
gies for all ε > 0, on the basis of a value-iteration scheme. In this section we present a
strategy-improvement algorithm for concurrent games with reachability objectives. The
algorithm will produce a sequence of selectors γ0, γ1, γ2, . . . for player 1, such that:

1. for all i ≥ 0, we have Val1^{γi}(Reach(T)) ≤ Val1^{γi+1}(Reach(T));

2. lim_{i→∞} Val1^{γi}(Reach(T)) = Val1(Reach(T)); and

3. if there is i ≥ 0 such that γi = γi+1, then Val1^{γi}(Reach(T)) = Val1(Reach(T)).
Condition 1 guarantees that the algorithm computes a sequence of monotonically improving
selectors. Condition 2 guarantees that the value guaranteed by the selectors converges to
the value of the game, or equivalently, that for all ε > 0, there is a number i of iterations
such that the memoryless player-1 strategy γi is ε-optimal. Condition 3 guarantees that
if a selector cannot be improved, then it is optimal. Note that for concurrent reachability
games, there may be no i ≥ 0 such that γi = γi+1, that is, the algorithm may fail to generate
an optimal selector. This is because there are concurrent reachability games that do not
admit optimal strategies, but only ε-optimal strategies for all ε > 0 [Eve57, dAHK98]. For
turn-based reachability games, it can be easily seen that our algorithm terminates with an
optimal selector.
We note that the value-iteration scheme of the previous section does not directly yield a strategy-improvement algorithm. In fact, the sequence of player-1 selectors η0, η1, η2, . . . computed in Section 6.3.1 may violate Condition 3: it is possible that for some i ≥ 0 we have ηi = ηi+1, but ηi ≠ ηj for some j > i. This is because the scheme of
Section 6.3.1 is fundamentally a value-iteration scheme, even though a selector is extracted
from each valuation. The scheme guarantees that the valuations u0, u1, u2, . . . defined as in
(6.1) converge, but it does not guarantee that the selectors η0, η1, η2, . . . improve at each
iteration.
The strategy-improvement algorithm presented here shares an important connec-
tion with the proof of the existence of memoryless ε-optimal strategies presented in the
previous section. Here, also, the key is to ensure that all generated selectors are proper.
Again, this is ensured by modifying the selectors, at each iteration, only where they can be
improved.
6.4.1 The strategy-improvement algorithm
Ordering of strategies. We let W2 be as in Section 6.3.1, and again we assume without loss of generality that all states in W2 ∪ T are absorbing. We define a preorder ≺ on the strategies for player 1 as follows: given two player-1 strategies σ and σ′, let σ ≺ σ′ if the following two conditions hold: (i) Val1^σ(Reach(T)) ≤ Val1^{σ′}(Reach(T)); and (ii) Val1^σ(Reach(T))(s) < Val1^{σ′}(Reach(T))(s) for some state s ∈ S. Furthermore, we write σ ⪯ σ′ if either σ ≺ σ′ or σ = σ′.
Informal description of Algorithm 6. We now present the strategy-improvement algorithm (Algorithm 6) for computing the values for all states in S \ (T ∪ W2). The algorithm iteratively improves player-1 strategies according to the preorder ≺. The algorithm starts with the uniformly random selector γ0 = ξ1^unif. At iteration i+1, the algorithm considers the memoryless player-1 strategy γi and computes its value. Observe that since γi is a memoryless strategy, the computation of Val1^{γi}(Reach(T)) involves solving the 2-MDP G_{γi}. The valuation Val1^{γi}(Reach(T)) is named vi. For all states s such that Pre1(vi)(s) > vi(s), the memoryless strategy at s is modified to a selector that is value-optimal for vi. The algorithm then proceeds to the next iteration. If Pre1(vi) = vi, the algorithm stops and returns the optimal memoryless strategy γi for player 1. Unlike strategy-improvement algorithms
for turn-based games (see [Con93] for a survey), Algorithm 6 is not guaranteed to terminate,
because the value of a reachability game may not be rational.
6.4.2 Convergence
Lemma 43 Let γi and γi+1 be the player-1 selectors obtained at iterations i and i + 1 of
Algorithm 6. If γi is proper, then γi+1 is also proper.
Proof. Assume towards a contradiction that γi is proper and γi+1 is not. Let ξ2 be a pure selector for player 2 witnessing that γi+1 is not proper. Then there exists a subset C ⊆ S \ (T ∪ W2) such that C is a closed recurrent set of states in the Markov chain G_{γi+1,ξ2}. Let I be the nonempty set of states where the selector is modified to obtain γi+1 from γi; at all other states γi and γi+1 agree.

Since γi and γi+1 agree at all states other than the states in I, and γi is a proper strategy, it follows that C ∩ I ≠ ∅. Let U^i_r = {s ∈ S \ (T ∪ W2) | Val1^{γi}(Reach(T))(s) = vi(s) = r} be the value class with value r at iteration i. For a state s ∈ U^i_r the following assertion holds: if Succ(s, γi, ξ2) ⊈ U^i_r, then Succ(s, γi, ξ2) ∩ U^i_{>r} ≠ ∅. Let z = max{r | U^i_r ∩ C ≠ ∅}, that is, U^i_z is the greatest value class at iteration i with a nonempty intersection with the closed recurrent set C. It easily follows that 0 < z < 1. Consider any state s ∈ I ∩ C, and let s ∈ U^i_q. Since Pre1(vi)(s) > vi(s), it follows that Succ(s, γi+1, ξ2) ∩ U^i_{>q} ≠ ∅; since C is closed in G_{γi+1,ξ2}, these successors lie in C. Hence we must have z > q, and therefore I ∩ C ∩ U^i_z = ∅. Thus, for all states s ∈ U^i_z ∩ C, we have γi(s) = γi+1(s). Recall that z is the greatest value class at iteration i with a nonempty intersection with C; hence U^i_{>z} ∩ C = ∅. Thus for all states s ∈ C ∩ U^i_z, we have Succ(s, γi+1, ξ2) ⊆ U^i_z ∩ C. It follows that C ⊆ U^i_z. However, this gives us three statements that together form a contradiction: C ∩ I ≠ ∅ (or else γi would not have been proper), I ∩ C ∩ U^i_z = ∅, and C ⊆ U^i_z.
Lemma 44 For all i ≥ 0, the player-1 selector γi obtained at iteration i of Algorithm 6 is proper.
Proof. By Lemma 39 we have that γ0 is proper. The result then follows from Lemma 43
and induction.
Lemma 45 Let γi and γi+1 be the player-1 selectors obtained at iterations i and i + 1 of Algorithm 6. Let I = {s ∈ S | Pre1(vi)(s) > vi(s)}. Let vi = Val1^{γi}(Reach(T)) and vi+1 = Val1^{γi+1}(Reach(T)). Then vi+1(s) ≥ Pre1(vi)(s) for all states s ∈ S; and therefore vi+1(s) ≥ vi(s) for all states s ∈ S, and vi+1(s) > vi(s) for all states s ∈ I.
Proof. Consider the valuations vi and vi+1 obtained at iterations i and i+1, respectively, and let wi be the valuation defined by wi(s) = 1 − vi(s) for all states s ∈ S. Since γi+1 is proper (by Lemma 44), it follows that the counter-optimal strategy for player 2 to minimize vi+1 is obtained by maximizing the probability of reaching W2. In fact, there are no end components in S \ (W2 ∪ T) in the 2-MDP G_{γi+1}. Let

wi+1(s) = wi(s)                          if s ∈ S \ I;
wi+1(s) = 1 − Pre1(vi)(s)  (< wi(s))     if s ∈ I.

In other words, wi+1 = 1 − Pre1(vi), and we also have wi+1 ≤ wi. We now show that wi+1 is a feasible solution to the linear program for MDPs with the objective Reach(W2), as described in Section 6.2. Since vi = Val1^{γi}(Reach(T)), it follows that for all states s ∈ S and all moves a2 ∈ Γ2(s), we have

wi(s) ≥ Σ_{t∈S} wi(t) · δ_{γi}(s, a2)(t).

For all states s ∈ S \ I, we have γi(s) = γi+1(s) and wi+1(s) = wi(s), and since wi+1 ≤ wi, it follows that for all states s ∈ S \ I and all moves a2 ∈ Γ2(s), we have

wi+1(s) ≥ Σ_{t∈S} wi+1(t) · δ_{γi+1}(s, a2)(t).

Since for s ∈ I the selector γi+1(s) is obtained as an optimal selector for Pre1(vi)(s), it follows that for all states s ∈ I and all moves a2 ∈ Γ2(s), we have

wi+1(s) ≥ Σ_{t∈S} wi(t) · δ_{γi+1}(s, a2)(t).

Since wi+1 ≤ wi, for all states s ∈ I and all moves a2 ∈ Γ2(s), we have

wi+1(s) ≥ Σ_{t∈S} wi+1(t) · δ_{γi+1}(s, a2)(t).

Hence it follows that wi+1 is a feasible solution to the linear program for MDPs with reachability objectives. Since the reachability valuation for player 2 for Reach(W2) is the least solution (observe that the objective function of the linear program is a minimizing function), it follows that vi+1 ≥ 1 − wi+1 = Pre1(vi). Thus we obtain vi+1(s) ≥ vi(s) for all states s ∈ S, and vi+1(s) > vi(s) for all states s ∈ I.
Theorem 36 (Strategy improvement) The following two assertions hold for Algorithm 6:

1. For all i ≥ 0, we have γi ⪯ γi+1; moreover, if γi = γi+1, then γi is an optimal strategy.

2. lim_{i→∞} vi = lim_{i→∞} Val1^{γi}(Reach(T)) = Val1(Reach(T)).
Proof. We prove the two parts as follows.

1. The assertion that γi ⪯ γi+1 follows from Lemma 45. If γi = γi+1, then Pre1(vi) = vi, indicating that vi = Val1(Reach(T)). From Lemma 44 it follows that γi is proper. Since γi is proper, by Lemma 40 we have Val1^{γi}(Reach(T)) ≥ vi = Val1(Reach(T)). It follows that γi is optimal for player 1.

2. Let u0 = [T]. We have u0 ≤ v0. For all k ≥ 0, by Lemma 45, we have vk+1 ≥ [T] ∨ Pre1(vk). For all k ≥ 0, let uk+1 = [T] ∨ Pre1(uk). By induction we conclude that for all k ≥ 0, we have uk ≤ vk. Moreover, vk ≤ Val1(Reach(T)); that is, for all k ≥ 0, we have

uk ≤ vk ≤ Val1(Reach(T)).

Since lim_{k→∞} uk = Val1(Reach(T)), it follows that

lim_{k→∞} Val1^{γk}(Reach(T)) = lim_{k→∞} vk = Val1(Reach(T)).

The theorem follows.
6.5 Conclusion
In this chapter we presented an elementary and combinatorial proof of the existence of memoryless ε-optimal strategies in concurrent games with reachability objectives, for all ε > 0. We also presented a strategy-improvement algorithm for such games.
Algorithm 6 Strategy-Improvement Algorithm
Input: a concurrent game structure G with target set T.
0. Compute W2 = {s ∈ S | Val1(Reach(T))(s) = 0}.
1. Let γ0 = ξ1^unif and i = 0.
2. Compute v0 = Val1^{γ0}(Reach(T)).
3. do
   3.1. Let I = {s ∈ S \ (T ∪ W2) | Pre1(vi)(s) > vi(s)}.
   3.2. Let ξ1 be a player-1 selector such that for all states s ∈ I, we have Pre1:ξ1(vi)(s) = Pre1(vi)(s) > vi(s).
   3.3. The player-1 selector γi+1 is defined as follows: for each state s ∈ S, let γi+1(s) = γi(s) if s ∉ I, and γi+1(s) = ξ1(s) if s ∈ I.
   3.4. Compute vi+1 = Val1^{γi+1}(Reach(T)).
   3.5. Let i = i + 1.
until I = ∅.
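For the turn-based special case, where Pre1(v)(s) reduces to a plain maximum (at player-1 states) or minimum (at player-2 states) and the value of a fixed memoryless selector can be approximated by value iteration on the induced MDP, the loop of Algorithm 6 can be sketched as follows. The game encoding game[s] = (owner, {move: {t: prob}}) is a hypothetical convention of this sketch; as noted above, for turn-based reachability games the loop terminates.

```python
def strategy_value(game, gamma, T, k=300):
    """Approximate the value of the memoryless player-1 selector gamma against
    an optimal adversary by value iteration on the induced MDP."""
    v = {s: 1.0 if s in T else 0.0 for s in game}
    for _ in range(k):
        v = {s: 1.0 if s in T else
             (sum(p * v[t] for t, p in game[s][1][gamma[s]].items())
              if game[s][0] == 1 else
              min(sum(p * v[t] for t, p in d.items())
                  for d in game[s][1].values()))
             for s in game}
    return v

def improve(game, T, max_rounds=100):
    """Strategy improvement: start from an arbitrary selector and, in each
    round, switch to a value-optimal move wherever that strictly improves on
    the current valuation (the set I of Algorithm 6); stop when I is empty."""
    gamma = {s: next(iter(m)) for s, (o, m) in game.items() if o == 1}
    for _ in range(max_rounds):
        v = strategy_value(game, gamma, T)
        changed = False
        for s, (o, m) in game.items():
            if o != 1 or s in T:
                continue
            exp = {a: sum(p * v[t] for t, p in m[a].items()) for a in m}
            best = max(exp, key=exp.get)
            if exp[best] > v[s] + 1e-9:       # s belongs to I: improve here
                gamma[s], changed = best, True
        if not changed:                        # I is empty: selector is optimal
            return gamma, v
    return gamma, strategy_value(game, gamma, T)
```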
Chapter 7
Concurrent Limit-average Games
In this chapter we consider concurrent games with limit-average objectives.¹ The main result of this chapter is as follows: the value of a concurrent zero-sum game with limit-average payoff can be approximated to within ε in time exponential in a polynomial in the size of the game, times a polynomial in log(1/ε), for all ε > 0. Our main
technique is the characterization of values as semi-algebraic quantities [BK76, MN81]. We
show that for a real number α, whether the value of a concurrent limit-average game at a
state s is strictly greater than α can be expressed as a sentence in the theory of real-closed
fields. Moreover, this sentence is polynomial in the size of the game and has a constant
number of quantifier alternations. The theory of real-closed fields is decidable in time
exponential in the size of a formula and doubly exponential in the quantifier alternation
depth [Bas99]. This, together with binary search over the range of values, gives an algorithm that approximates the value in time exponential in a polynomial in the size of the game graph, times a polynomial in log(1/ε), for all ε > 0. Our techniques combine several known results to
provide the first complexity bound on the general problem of approximating the value of
stochastic games with limit-average objectives.
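The binary-search step can be sketched independently of the decision procedure. In the following Python sketch, value_exceeds is a hypothetical stand-in for deciding the real-closed-field sentence "the value at s is strictly greater than α"; the oracle is invoked O(log((hi − lo)/ε)) times, which is where the polynomial-in-log(1/ε) factor in the stated bound comes from.

```python
def approximate_value(value_exceeds, lo, hi, eps):
    """Approximate a value known to lie in [lo, hi] to within eps by binary
    search over alpha, given an oracle value_exceeds(alpha) for 'value > alpha'
    (in this chapter, a sentence in the theory of real-closed fields)."""
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if value_exceeds(mid):
            lo = mid        # value lies in (mid, hi]
        else:
            hi = mid        # value lies in [lo, mid]
    return (lo + hi) / 2.0
```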
¹Preliminary versions of the results of this chapter appeared in [CMH07].
CHAPTER 7. CONCURRENT LIMIT-AVERAGE GAMES 157
7.1 Definitions
We start with a few basic definitions.
Concurrent limit-average games. We consider zero-sum concurrent limit-average games, which consist of a concurrent game structure G = (S, A, Γ1, Γ2, δ) and a reward function r : S → R that maps every state to a real-valued reward.
Size of a concurrent game. We now present a few notations that we will require to
precisely characterize the complexity of concurrent limit-average games. Given a concurrent
game G we use the following notations:
1. n = |S| is the number of states;
2. |δ| = Σ_{s∈S} |Γ1(s)| · |Γ2(s)| is the number of entries of the transition function.
Given a rational concurrent game (where all rewards and transition probabilities are ratio-
nal) we use the following notations:
1. size(δ) = Σ_{s∈S} Σ_{a∈Γ1(s)} Σ_{b∈Γ2(s)} Σ_{t∈S} |δ(s, a, b)(t)|, where |δ(s, a, b)(t)| denotes the space required to express δ(s, a, b)(t) in binary;
2. size(r) = Σ_{s∈S} |r(s)|, where |r(s)| denotes the space required to express r(s) in binary;
3. |G| = size(G) = size(δ) + size(r).
The specification of a game G requires O(|G|) bits. Given a stochastic game with n states,
we assume without loss of generality that the state space of the stochastic game structure is enumerated by natural numbers, S = {1, 2, . . . , n}, i.e., the states are numbered from 1 to n.
Limit-average objectives. For a reward function r and N ∈ N, consider the function Avg(r, N) : Ω → R defined as follows: for a play ω = 〈s0, s1, s2, . . .〉 we have

Avg(r, N)(ω) = (1/N) Σ_{i=0}^{N−1} r(si);
i.e., it is the average of the first N rewards of the play. For a reward function r we consider
two functions LimAvgInf(r) : Ω → R and LimAvgSup(r) : Ω → R defined as follows: for a
play ω = 〈s0, s1, s2, . . .〉 we have
LimAvgInf(r)(ω) = lim inf_{N→∞} Avg(r, N)(ω);   LimAvgSup(r)(ω) = lim sup_{N→∞} Avg(r, N)(ω).
In other words, the above functions specify the “long-run” average of the rewards of the play. Also note that for all plays ω we have LimAvgInf(r)(ω) ≤ LimAvgSup(r)(ω).
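For finite prefixes, and for eventually periodic (“lasso-shaped”) plays, these quantities are straightforward to compute; a minimal sketch (the list encoding of plays is an assumption):

```python
def avg(r, play, N):
    """Avg(r, N)(omega): the average of the first N rewards of the play."""
    return sum(r[s] for s in play[:N]) / N

def lim_avg_of_lasso(r, cycle):
    """For an eventually periodic play (prefix . cycle^omega), the running
    averages converge, so LimAvgInf and LimAvgSup coincide with the mean
    reward of the cycle; any finite prefix does not affect the limit."""
    return sum(r[s] for s in cycle) / len(cycle)
```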
Valuations and values. A valuation is a mapping v : S → R, associating a real number v(s) with each state s. Given a state s ∈ S, we are interested in the maximal payoff that player 1 can ensure against all strategies for player 2, and the maximal payoff that player 2 can ensure against all strategies for player 1. We call such a payoff the value of the game G at s for player i ∈ {1, 2}. The values for player 1 and player 2 are given by the valuations v1 : S → R and v2 : S → R, defined for all s ∈ S by

Val1(LimAvg(r))(s) = sup_{σ∈Σ} inf_{π∈Π} E^{σ,π}_s[LimAvgInf(r)];
Val2(LimAvg(r))(s) = sup_{π∈Π} inf_{σ∈Σ} E^{σ,π}_s[LimAvgSup(−r)].
Mertens and Neyman [MN81] established the determinacy of concurrent limit-average
games.
Theorem 37 ([MN81]) For all concurrent limit-average games, for all states s, we have
Val1(LimAvg(r))(s) + Val2(LimAvg(r))(s) = 0.
Stronger notion of existence of values [MN81]. The value for concurrent games exists
in a strong sense [MN81]: ∀ε > 0, ∃σ∗ ∈ Σ,∃π∗ ∈ Π such that the following conditions hold:
1. for all σ and π we have

−ε + E^{σ,π∗}_s[LimAvgSup(r)] ≤ E^{σ∗,π}_s[LimAvgInf(r)] + ε; (7.1)
2. for all ε1 > 0, there exists N0 ∈ N such that for all σ and π, and for all N ≥ N0, we have

−ε1 + E^{σ,π∗}_s[Avg(r, N)] ≤ E^{σ∗,π}_s[Avg(r, N)] + ε1. (7.2)
The condition (7.1) is equivalent to the following equality:

sup_{σ∈Σ} inf_{π∈Π} E^{σ,π}_s[LimAvgInf(r)] = inf_{π∈Π} sup_{σ∈Σ} E^{σ,π}_s[LimAvgSup(r)].
7.2 Theory of Real-closed Fields and Quantifier Elimination
Our main technique is to represent the value of a game as a formula in the theory
of real-closed fields. We denote by R the real-closed field (R,+, ·, 0, 1,≤) of the reals with
addition and multiplication. In the sequel we write “real-closed field” to denote the real-
closed field R. An atomic formula is an expression of the form p < 0 or p = 0, where p is
a (possibly) multi-variate polynomial with coefficients in the real-closed field. Coefficients
are rationals or symbolic constants (e.g., the symbolic constant e stands for 2.71828 . . .).
We will consider the special case where only rational coefficients of the form q1/q2, with q1, q2 integers, are allowed. A formula is constructed from atomic formulas by the grammar
are integers, are allowed. A formula is constructed from atomic formulas by the grammar
ϕ ::= a | ¬a | ϕ ∧ ϕ | ϕ ∨ ϕ | ∃x.ϕ | ∀x.ϕ,
where a is an atomic formula, ¬ denotes complementation, ∧ denotes conjunction, ∨ denotes
disjunction, and ∃ and ∀ denote existential and universal quantification, respectively. We
use the standard abbreviations such as p ≤ 0, p ≥ 0 and p > 0 that are derived as follows:
p ≤ 0 (for p < 0 ∨ p = 0), p ≥ 0 (for ¬(p < 0)), and p > 0 (for ¬(p ≤ 0)).
The semantics of formulas are given in a standard way. A variable x is free in the formula
ϕ if it is not in the scope of a quantifier ∃x or ∀x. A sentence is a formula with no free
variables. A formula is quantifier-free if it does not contain any existential or universal
quantifier. Two formulas ϕ1 and ϕ2 are equivalent if the set of free variables of ϕ1 and
ϕ2 are the same, and for every assignment to the free variables the formula ϕ1 is true if
and only if the formula ϕ2 is true. A formula ϕ admits quantifier elimination if there is an
algorithm to convert it to an equivalent quantifier-free formula. A quantifier elimination
algorithm takes as input a formula ϕ and returns an equivalent quantifier-free formula, if
one exists.
Tarski [Tar51] proved that every formula in the theory of the real-closed field
admits quantifier elimination, and (by way of quantifier elimination) that there is an algorithm to decide the truth of a sentence ϕ in the theory of the real-closed field. The complexity of Tarski's algorithm has subsequently been improved, and we now present a result of Basu [Bas99] on the complexity of quantifier elimination for formulas in the theory of the real-closed field.
Complexity of quantifier elimination. We first define the length of a formula ϕ, and
then define the size of a formula with rational coefficients. We denote the length and size of
ϕ as len(ϕ) and size(ϕ), respectively. The length of a polynomial p is defined as the sum of
the length of its constituent monomials plus the number of monomials in the polynomial.
The length of a monomial is defined as its degree plus the number of variables plus 1 (for
the coefficient). For example, the monomial 14 · x^3 · y^2 · z has length 6 + 3 + 1 = 10.
Given a polynomial p, the length of both p < 0 and p = 0 is len(p) + 2. This defines the
length of an atomic formula a. The length of a formula ϕ is inductively defined as follows:
len(¬a) = len(a) + 1;
len(ϕ1 ∧ ϕ2) = len(ϕ1) + len(ϕ2) + 1;
len(ϕ1 ∨ ϕ2) = len(ϕ1) + len(ϕ2) + 1;
len(∃x.ϕ) = len(ϕ) + 2;
len(∀x.ϕ) = len(ϕ) + 2.
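The inductive definition above can be sketched directly as a recursion over a formula syntax tree. The tuple encoding below is a hypothetical representation chosen for illustration, not notation from the text; monomials are abstracted to (degree, number-of-variables) pairs.

```python
# Minimal sketch of the length measure len(phi).  A polynomial is a list
# of (degree, num_vars) monomials; formulas are nested tuples.

def len_poly(monomials):
    # len(p) = sum of monomial lengths + number of monomials, where a
    # monomial of degree d over v variables has length d + v + 1.
    return sum(d + v + 1 for d, v in monomials) + len(monomials)

def len_formula(phi):
    kind = phi[0]
    if kind in ("lt0", "eq0"):          # atomic: p < 0 or p = 0
        return len_poly(phi[1]) + 2
    if kind == "not":                    # len(not a) = len(a) + 1
        return len_formula(phi[1]) + 1
    if kind in ("and", "or"):            # len(f1 op f2) = len(f1) + len(f2) + 1
        return len_formula(phi[1]) + len_formula(phi[2]) + 1
    if kind in ("exists", "forall"):     # len(Qx.f) = len(f) + 2
        return len_formula(phi[2]) + 2
    raise ValueError(kind)

# The monomial 14 * x^3 * y^2 * z has degree 6 over 3 variables: length 10,
# so the one-monomial polynomial has length 11 and the atom p < 0 length 13.
p = [(6, 3)]
atom = ("lt0", p)
phi = ("exists", "x", ("not", atom))
print(len_formula(phi))  # 13 + 1 + 2 = 16
```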
Observe that the length of a formula is defined even for formulas that contain symbolic constants as coefficients. For a formula ϕ with rational coefficients we define its size as follows: size(ϕ) is the sum of len(ϕ) and the space required to specify the rational coefficients of the polynomials appearing in ϕ in binary. We state a result
of Basu [Bas99] on the complexity of quantifier elimination for the real-closed field. The
following theorem is a specialization of Theorem 1 of [Bas99]; also see Theorem 14.14 and
Theorem 14.16 of [BPMF].
Theorem 38 [Bas99] Let d, k, m be nonnegative integers, X = {X1, X2, . . . , Xk} be a set of k variables, and P = {p1, p2, . . . , pm} be a set of m polynomials over the set X of variables, each of degree at most d and with coefficients in the real-closed field. Let X[r], X[r−1], . . . , X[1] denote a partition of the set X of variables into r subsets such that the set X[i] of variables has size ki, i.e., ki = |X[i]| and Σ_{i=1}^{r} ki = k. Let
Φ = (Qr X[r]). (Qr−1 X[r−1]). · · · .(Q2 X[2]). (Q1 X[1]). ϕ(p1, p2, . . . , pm)
be a sentence with r alternating quantifiers Qi ∈ {∃, ∀} (i.e., Qi+1 ≠ Qi), where ϕ(p1, p2, . . . , pm) is a quantifier-free formula with atomic formulas of the form pi ⋈ 0, where ⋈ ∈ {<, >, =}. Let D denote the ring generated by the coefficients of the polynomials in P. Then the following assertions hold.
1. There is an algorithm to decide the truth of Φ using
m^{∏i (ki+1)} · d^{∏i O(ki)} · len(ϕ)
arithmetic operations (multiplication, addition, and sign determination) in D.
2. If D = Z (the set of integers) and the bit sizes of the coefficients of the polynomials are bounded by γ, then the bit sizes of the integers appearing in the intermediate computations of the truth of Φ are bounded by
γ · d^{∏i O(ki)}.
The result of part 1 of Theorem 38 holds for sentences with symbolic constants
as coefficients. The result of part 2 of Theorem 38 is for the special case of sentences
with only integer coefficients. Part 2 of Theorem 38 follows from the results of [Bas99],
but is not explicitly stated as a theorem there; for an explicit statement as a theorem, see
Theorem 14.14 and Theorem 14.16 of [BPMF].
Remark 1 Given two integers a and b, let |a| and |b| denote the space required to express a and b in binary, respectively. The following assertions hold:
1. given the signs of a and b, the sign of a + b can be determined in O(|a| + |b|) time, i.e., in linear time, and the sign of a · b can be determined in O(1) time, i.e., in constant time;
2. addition of a and b can be done in O(|a| + |b|) time, i.e., in linear time; and
3. multiplication of a and b can be done in O(|a| · |b|) time, i.e., in quadratic time.
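The quadratic bound for multiplication in Remark 1 corresponds to the schoolbook method. The sketch below is purely illustrative (Python's built-in integers use faster algorithms internally): shift-and-add over the bits of b, so each of the |b| bits triggers at most one O(|a|)-bit addition, for O(|a| · |b|) bit operations in total.

```python
# Illustrative sketch of the multiplication bound of Remark 1.

def schoolbook_mul(a, b):
    """Multiply nonnegative integers via shift-and-add over the bits of b."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result += a << shift   # one O(|a|)-bit addition per set bit of b
        b >>= 1
        shift += 1
    return result

print(schoolbook_mul(14, 10))  # 140
```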
It follows from the above observations, along with Theorem 38, that if D = Z and the bit sizes of the coefficients of the polynomials appearing in Φ are bounded by γ, then the truth of Φ can be determined in time
m^{∏i (ki+1)} · d^{∏i O(ki)} · O(len(ϕ) · γ^2). (7.3)
7.3 Computation of Values in Concurrent Limit-average
Games
The values in concurrent limit-average games can be irrational even if all rewards
and transition probability values are rational [RF91]. Hence, we can algorithmically only
approximate the values to within a precision ε, for ε > 0.
Discounted value functions. Let G be a concurrent limit-average game with reward function r. For a real β, with 0 < β < 1, the β-discounted value function Val^β_1 is defined as follows:
Val^β_1(s) = sup_{σ∈Σ} inf_{π∈Π} β · E^{σ,π}_s [ Σ_{i=1}^{∞} (1 − β)^i · r(Xi) ].
For a concurrent limit-average game G, the β-discounted value function Val^β_1 is monotonic with respect to β in a neighborhood of 0 [MN81].
7.3.1 Sentence for the value of a concurrent limit-average game
We now describe how we can obtain a sentence in the theory of the real-closed
field that states that the value of a concurrent limit-average game at a given state is strictly
greater than α, for a real α. The sentence applies to the case where the rewards and the
transition probabilities are specified as symbolic or rational constants.
Formula for β-discounted value functions. Given a real α and a concurrent limit-average game G, we present a formula in the theory of the real-closed field expressing that the β-discounted value Val^β_1(s) at a given state s is strictly greater than α, for 0 < β < 1. A valuation v ∈ R^n is a vector of reals; for 1 ≤ i ≤ n, the i-th component v(i) of v represents the value for state i. For every state s ∈ S and every move b ∈ Γ2(s) we define a polynomial u_{(s,b,1)} for player 1 as a function of x ∈ Dist(Γ1(s)), a valuation v, and 0 < β < 1
as follows:
u_{(s,b,1)}(x, v, β) = β · Σ_{a∈Γ1(s)} x(a) · r(s) + (1 − β) · Σ_{a∈Γ1(s)} x(a) · Σ_{t∈S} δ(s, a, b)(t) · v(t) − v(s).
The polynomial u_{(s,b,1)} has the variables β, x(a) for a ∈ Γ1(s), and v(t) for t ∈ S. Observe that, given a concurrent limit-average game, r(s) and δ(s, a, b)(t) for t ∈ S and a ∈ Γ1(s) are rational or symbolic constants given by the game graph, not variables. The coefficients of the polynomial are r(s) and δ(s, a, b)(t) for a ∈ Γ1(s) and t ∈ S. Hence the polynomial has degree 3 and 1 + |Γ1(s)| + n variables. Similarly, for s ∈ S, a ∈ Γ1(s),
y ∈ Dist(Γ2(s)), v ∈ R^n, and 0 < β < 1, we have polynomials u_{(s,a,2)} defined by
u_{(s,a,2)}(y, v, β) = β · Σ_{b∈Γ2(s)} y(b) · r(s) + (1 − β) · Σ_{b∈Γ2(s)} y(b) · Σ_{t∈S} δ(s, a, b)(t) · v(t) − v(s).
The sentence stating that Val^β_1(s) is strictly greater than α is as follows. We have variables x_s(a) for s ∈ S and a ∈ Γ1(s), y_s(b) for s ∈ S and b ∈ Γ2(s), and variables v(1), v(2), . . . , v(n). For simplicity we write x_s for the vector of variables x_s(a1), x_s(a2), . . . , x_s(aj), where Γ1(s) = {a1, a2, . . . , aj}; y_s for the vector of variables y_s(b1), y_s(b2), . . . , y_s(bl), where Γ2(s) = {b1, b2, . . . , bl}; and v for the vector of variables v(1), v(2), . . . , v(n). The sentence is as follows:
Φ_β(s, α) = ∃x_1, . . . , x_n. ∃y_1, . . . , y_n. ∃v. Ψ(x_1, x_2, . . . , x_n, y_1, y_2, . . . , y_n)
  ∧ ⋀_{s∈S, b∈Γ2(s)} ( u_{(s,b,1)}(x_s, v, β) ≥ 0 )
  ∧ ⋀_{s∈S, a∈Γ1(s)} ( u_{(s,a,2)}(y_s, v, β) ≤ 0 )
  ∧ ( v(s) − α > 0 );
where Ψ(x_1, x_2, . . . , x_n, y_1, y_2, . . . , y_n) specifies the constraints that x_1, x_2, . . . , x_n and y_1, y_2, . . . , y_n are valid randomized strategies, and is defined as follows:
Ψ(x_1, . . . , x_n, y_1, . . . , y_n) = ⋀_{s∈S} ( (Σ_{a∈Γ1(s)} x_s(a)) − 1 = 0 )
  ∧ ⋀_{s∈S, a∈Γ1(s)} ( x_s(a) ≥ 0 )
  ∧ ⋀_{s∈S} ( (Σ_{b∈Γ2(s)} y_s(b)) − 1 = 0 )
  ∧ ⋀_{s∈S, b∈Γ2(s)} ( y_s(b) ≥ 0 ).
The total number of polynomials in Φ_β(s, α) is 1 + Σ_{s∈S} (3 · |Γ1(s)| + 3 · |Γ2(s)| + 2) = O(|δ|). In the above formula we treat β as a variable; it is a free variable in Φ_β(s, α). Given a concurrent limit-average game G, for all 0 < β < 1, the correctness of Φ_β(s, α) in specifying that Val^β_1(s) > α can be proved from the results of [Sha53]. Also observe that Φ_β(s, α) is a formula in the existential theory of the real-closed field (the sub-class of the theory of the real-closed field in which only the existential quantifier is used). Since the existential theory of the real-closed field is decidable in PSPACE [Can88], we have the following result.
Theorem 39 Given a rational concurrent limit-average game G, a state s of G, a discount factor β, and a rational α, whether Val^β_1(s) > α can be decided in PSPACE.
Value of a game as limit of discounted games. The result of Mertens-Neyman [MN81] established that the value of a concurrent limit-average game is the limit of the β-discounted values as β goes to 0. Formally, we have
Val1(LimAvg(r))(s) = lim_{β→0+} Val^β_1(s).
Sentence for the value of a concurrent limit-average game. From the characteriza-
tion of the value of a concurrent limit-average game as the limit of the β-discounted values
and the monotonicity property of the β-discounted values in a neighborhood of 0, we obtain
the following sentence Φ(s, α) stating that the value at state s is strictly greater than α.
In addition to variables for Φβ(s, α), we have the variables β and β1. The sentence Φ(s, α)
specifies the expression
∃β1 > 0. ∀β ∈ (0, β1). Φβ(s, α),
and is defined as follows:
Φ(s, α) = ∃β_1. ∀β. ∃x_1, . . . , x_n. ∃y_1, . . . , y_n. ∃v. Ψ(x_1, x_2, . . . , x_n, y_1, y_2, . . . , y_n)
  ∧ (β_1 > 0)
  ∧ [ (β_1 − β ≤ 0) ∨ (β ≤ 0)
    ∨ ( (β_1 − β > 0)
      ∧ ⋀_{s∈S, b∈Γ2(s)} ( u_{(s,b,1)}(x_s, v, β) ≥ 0 )
      ∧ ⋀_{s∈S, a∈Γ1(s)} ( u_{(s,a,2)}(y_s, v, β) ≤ 0 ) ) ]
  ∧ ( v(s) − α > 0 );
where Ψ(x_1, x_2, . . . , x_n, y_1, y_2, . . . , y_n) specifies the constraints that x_1, x_2, . . . , x_n and y_1, y_2, . . . , y_n are valid randomized strategies (the same formula used for Φ_β(s, α)). Observe that Φ(s, α) contains no free variable (i.e., the variables x_s, y_s, v, β_1, and β are all quantified). A similar sentence was used in [BK76] for values of discounted games. The total number of polynomials in Φ(s, α) is O(|δ|): in addition to the O(|δ|) polynomials of Φ_β(s, α) there are 4 more polynomials in Φ(s, α). In the setting of Theorem 38 we obtain the following bounds for Φ(s, α):
m = O(|δ|); k = O(|δ|); ∏_i (ki + 1) = O(|δ|); r = O(1); d = 3; (7.4)
and hence we have
m^{∏i (ki+1)} · d^{∏i O(ki)} = O(|δ|)^{O(|δ|)} = 2^{O(|δ|·log(|δ|))}.
Also observe that for a concurrent game G, the sum of the lengths of the polynomials
appearing in the sentence is O(|δ|). The present analysis along with Theorem 38 yields
Theorem 40. The result of Theorem 40 holds for concurrent limit-average games where the
transition probabilities and rewards are specified as symbolic constants.
Theorem 40 Given a concurrent limit-average game G with reward function r, a state s of G, and a real α, there is an algorithm to decide whether Val1(LimAvg(r))(s) > α using
2^{O(|δ|·log(|δ|))} · O(|δ|)
arithmetic operations (addition, multiplication, and sign determination) in the ring generated by the set
{r(s) | s ∈ S} ∪ {δ(s, a, b)(t) | s, t ∈ S, a ∈ Γ1(s), b ∈ Γ2(s)} ∪ {α}.
7.3.2 Algorithmic analysis
For algorithmic analysis we consider rational concurrent games, i.e., concurrent
games such that r(s) and δ(s, a, b)(t) are rational for all states s, t ∈ S, and moves a ∈ Γ1(s)
and b ∈ Γ2(s). In the sequel we only consider rational concurrent games. Given the sentence Φ(s, α) specifying that Val1(LimAvg(r))(s) > α, we first reduce it to an equivalent sentence with integer coefficients as follows.
• For every rational coefficient ℓ = q1/q2, where q1, q2 ∈ Z, appearing in Φ(s, α) we apply the following procedure:
1. introduce a new variable z_ℓ;
2. replace ℓ by z_ℓ in Φ(s, α);
3. add the polynomial q2 · z_ℓ − q1 = 0 as a conjunct to the quantifier-free body of the formula; and
4. existentially quantify z_ℓ in the block of existential quantifiers after the quantifiers for β_1 and β.
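The coefficient-elimination procedure above can be sketched as follows. The triple encoding of the constraint q2 · z_ℓ − q1 = 0 is a hypothetical representation chosen for illustration, not a data structure from the text.

```python
# Sketch of the coefficient-elimination step: each rational coefficient
# q1/q2 is replaced by a fresh variable z, and the integer-coefficient
# constraint q2*z - q1 = 0 is conjoined to the quantifier-free body.
from fractions import Fraction

def eliminate_rationals(coeffs):
    """coeffs: the rational coefficients appearing in the sentence.
    Returns (substitution, extra) where each extra constraint is a
    (q2, var, q1) triple standing for q2*var - q1 = 0."""
    subst, extra = {}, []
    for i, c in enumerate(coeffs):
        z = f"z{i}"                      # fresh, existentially quantified
        subst[c] = z
        extra.append((c.denominator, z, c.numerator))
    return subst, extra

subst, extra = eliminate_rationals([Fraction(1, 3), Fraction(-2, 5)])
print(extra)   # [(3, 'z0', 1), (5, 'z1', -2)]
```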
Thus we add O(|δ|) variables and polynomials, and increase the degree of the polynomials by 1. Also observe that the coefficients of the resulting sentence are integers, and hence the ring D generated by its coefficients is Z. Similar to the bounds obtained in (7.4), in the setting of Theorem 38 we obtain the following bounds for the resulting sentence:
m = O(|δ|); k = O(|δ|); ∏_i (ki + 1) = O(|δ|); r = O(1); d = 4;
and hence
m^{∏i O(ki+1)} · d^{∏i O(ki)} = O(|δ|)^{O(|δ|)} = 2^{O(|δ|·log(|δ|))}.
Also observe that the length of the resulting sentence can be bounded by O(|δ|), and the sum of the bit sizes of its coefficients can be bounded by O(|G| + |α|), where |α| is the space required to express α in binary. This along with (7.3) of Remark 1 yields the following result.
Theorem 41 Given a rational concurrent limit-average game G, a state s of G, and a rational α, there is an algorithm that decides whether Val1(LimAvg(r))(s) > α in time
2^{O(|δ|·log(|δ|))} · O(|δ|) · O(|G|^2 + |α|^2) = 2^{O(|δ|·log(|δ|))} · O(|G|^2 + |α|^2).
7.3.3 Approximating the value of a concurrent limit-average game
We now present an algorithm that approximates the value Val1(LimAvg(r))(s)
within a tolerance of ε > 0. The algorithm (Algorithm 7) is obtained by a binary search
technique along with the result of Theorem 41. Algorithm 7 works for the special case
of normalized rational concurrent games. We first define normalized rational concurrent
games and then present a reduction of rational concurrent games to normalized rational
concurrent games.
Normalized rational concurrent games. A rational concurrent game is normalized if the reward function satisfies the following two conditions: (1) min{r(s) | s ∈ S} ≥ 0; and (2) max{r(s) | s ∈ S} ≤ 1.
Reduction. We now present a reduction of rational concurrent games to normalized ra-
tional concurrent games, such that by approximating the values of normalized rational
concurrent games we can approximate the values of rational concurrent games. Given a
reward function r : S → R, let
M = max{abs(r(s)) | s ∈ S},
where abs(r(s)) denotes the absolute value of r(s). Without loss of generality we assume
M > 0. Otherwise, r(s) = 0 for all states s ∈ S, and hence Val1(LimAvg(r))(s) = 0 for
all states s ∈ S (i.e., the value function can be trivially computed). Consider the reward
function r+ : S → [0, 1] defined as follows: for s ∈ S we have
r+(s) = (r(s) + M)/(2M).
The reward function r+ is normalized and the following assertion holds. Let Val1(LimAvg(r)) and Val1(LimAvg(r+)) denote the value functions for the reward functions r and r+, respectively. Then for all states s ∈ S we have
Val1(LimAvg(r+))(s) = (Val1(LimAvg(r))(s) + M)/(2M).
Hence it follows that for rationals α, l, and u, such that l ≤ u, we have
Val1(LimAvg(r))(s) > α iff Val1(LimAvg(r+))(s) > (α + M)/(2M);
Val1(LimAvg(r+))(s) ∈ [l, u] iff Val1(LimAvg(r))(s) ∈ [M · (2l − 1), M · (2u − 1)].
Given a rational ε > 0, to obtain an interval [l1, u1] such that u1 − l1 ≤ ε and Val1(LimAvg(r))(s) ∈ [l1, u1], we first obtain an interval [l, u] such that u − l ≤ ε/(2M) and Val1(LimAvg(r+))(s) ∈ [l, u]. From the interval [l, u] we obtain the interval [l1, u1] = [M · (2l − 1), M · (2u − 1)] such that Val1(LimAvg(r))(s) ∈ [l1, u1] and u1 − l1 = 2 · M · (u − l) ≤ ε. Hence we present the algorithm to approximate the values for normalized rational concurrent games.
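The normalization reduction can be sketched in code with exact arithmetic; the state names in the example are made up for illustration.

```python
# Sketch of the normalization reduction: r+(s) = (r(s) + M) / (2M) with
# M = max_s |r(s)|, and the interval map [l, u] -> [M(2l-1), M(2u-1)].
from fractions import Fraction

def normalize(rewards):
    M = max(abs(r) for r in rewards.values())
    assert M > 0, "all-zero rewards: the value function is identically 0"
    return {s: Fraction(r + M, 2 * M) for s, r in rewards.items()}, M

def denormalize_interval(l, u, M):
    # Maps an interval for the normalized game back to the original game.
    return M * (2 * l - 1), M * (2 * u - 1)

rewards = {"s0": -3, "s1": 2}          # hypothetical reward function
r_plus, M = normalize(rewards)
print(r_plus["s0"], r_plus["s1"])      # 0 and 5/6
print(denormalize_interval(Fraction(1, 2), Fraction(2, 3), M))
```

Note that an interval of width w for the normalized game maps back to an interval of width 2·M·w, which is why the tolerance ε/(2M) is used on the normalized side.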
Running time of Algorithm 7. In Algorithm 7 we denote by Φ(s, m) the sentence specifying that Val1(LimAvg(r))(s) > m; by Theorem 41 the truth of Φ(s, m) can be decided in time
2^{O(|δ|·log(|δ|))} · O(|G|^2 + |m|^2),
for a concurrent game G, where |m| is the number of bits required to specify m.

Algorithm 7 Approximating the value of a concurrent limit-average game
Input: a normalized rational concurrent limit-average game G,
a state s of G, and a rational value ε > 0 specifying the desired tolerance.
Output: a rational interval [l, u] such that u − l ≤ 2ε and Val1(LimAvg(r))(s) ∈ [l, u].
1. l := 0; u := 1; m := 1/2;
2. repeat for ⌈log(1/ε)⌉ steps
2.1. if Φ(s, m), then
2.1.1. l := m; m := (l + u)/2;
2.2. else
2.2.1. u := m; m := (l + u)/2;
3. return [l, u];

In Algorithm 7, the variables l and u are initially set to 0 and 1, respectively. Since the game is normalized, the initial values of l and u clearly provide lower and upper bounds on the value, and provide starting bounds for the binary search. In each iteration of the algorithm, in Steps 2.1.1 and 2.2.1, there is a division by 2. It follows that after i iterations l, u, and m can be expressed as q/2^i, where q is an integer and q ≤ 2^i. Hence l, u, and m can always be expressed in
O(log(1/ε))
bits. The loop in Step 2 runs for ⌈log(1/ε)⌉ = O(log(1/ε)) iterations, and every iteration can be computed in time 2^{O(|δ|·log(|δ|))} · O(|G|^2 + log^2(1/ε)). This gives the following theorem.
Theorem 42 Given a normalized rational concurrent limit-average game G, a state s of G, and a rational ε > 0, Algorithm 7 computes an interval [l, u] such that Val1(LimAvg(r))(s) ∈ [l, u] and u − l ≤ 2ε, in time
2^{O(|δ|·log(|δ|))} · O(|G|^2 · log(1/ε) + log^3(1/ε)).
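The binary search of Algorithm 7 can be sketched with the decision procedure Φ(s, m) abstracted as an oracle. The names `value_exceeds` and `true_value` below are stand-ins for illustration (the true value may be irrational in general; a rational one is used here so the oracle is trivial to write), and the logarithm is taken base 2.

```python
# Sketch of Algorithm 7's binary search on [0, 1].
from fractions import Fraction
import math

def approximate_value(value_exceeds, eps):
    """value_exceeds(m): oracle deciding whether the value is > m.
    Returns [l, u] with value in [l, u] and u - l <= 2*eps."""
    l, u = Fraction(0), Fraction(1)
    for _ in range(math.ceil(math.log2(1 / eps))):
        m = (l + u) / 2
        if value_exceeds(m):   # value > m: search the upper half
            l = m
        else:                  # value <= m: search the lower half
            u = m
    return l, u

true_value = Fraction(1, 3)    # hypothetical value of the game at s
l, u = approximate_value(lambda m: true_value > m, Fraction(1, 64))
print(l, u)
```

After ⌈log2(1/ε)⌉ halvings the interval has width at most ε ≤ 2ε, and the invariant "value ∈ [l, u]" is preserved by each oracle answer.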
The reduction from rational concurrent games to normalized concurrent games shows that, for a rational concurrent game G and a rational tolerance ε > 0, to obtain an interval of length at most ε that contains the value Val1(LimAvg(r))(s), it suffices to obtain an interval of length at most ε/(2M) that contains the value in the corresponding normalized game, where M = max{abs(r(s)) | s ∈ S}. Since M can be expressed in |G| bits, it follows that the size of the normalized game is O(|G|^2). Given a tolerance ε > 0 for the rational concurrent game, we need to consider the tolerance ε/(2·M) for the normalized game. The above analysis along with Theorem 42 yields the following corollary (the corollary is obtained from Theorem 42 by substituting |G|^2 for |G|, and |G| · log(1/ε) for log(1/ε)).
Corollary 7 Given a rational concurrent limit-average game G, a state s of G, and a rational ε > 0, an interval [l, u] such that Val1(LimAvg(r))(s) ∈ [l, u] and u − l ≤ 2ε can be computed in time
2^{O(|δ|·log(|δ|))} · O(|G|^5 · log(1/ε) + |G|^3 · log^3(1/ε)).
Hence from Theorem 41 and Corollary 7 we obtain the following result.
Theorem 43 Given a rational concurrent limit-average game G, a state s of G, rational
ε > 0, and rational α, the following assertions hold.
1. (Decision problem) Whether Val1(LimAvg(r))(s) > α can be decided in EXPTIME.
2. (Approximation problem) An interval [l, u] such that u − l ≤ 2ε and
Val1(LimAvg(r))(s) ∈ [l, u] can be computed in EXPTIME.
7.4 Conclusion
We showed that concurrent limit-average games can be solved in EXPTIME. Unfortunately, the only known lower bound on the complexity is PTIME-hardness, which follows from a reduction from alternating reachability. However, from the results of [EY06] it follows that the square-root-sum problem (which is not known to be in NP) can be reduced to the decision problem for concurrent limit-average games. Even for the simpler case of turn-based deterministic limit-average games no polynomial-time algorithm is known [ZP96], and the best known algorithm for turn-based stochastic limit-average games is exponential in the size of the game. In the case of turn-based stochastic games, pure memoryless optimal strategies exist [LL69] and the complexity of turn-based stochastic limit-average games is NP ∩ coNP. Since the number of pure memoryless strategies is at most exponential in the size of the game, there is an exponential-time algorithm to compute the values exactly (not merely approximately) for turn-based stochastic limit-average games (also see the survey [NS03]). The main open problems are as follows.
1. Whether a PSPACE algorithm can be obtained for the decision problem or approxi-
mation problem for concurrent limit-average games remains open.
2. Whether a polynomial time algorithm can be obtained for turn-based stochastic limit-
average games and turn-based deterministic limit-average games remains open.
Chapter 8
Concurrent Parity Games
In this chapter we consider concurrent zero-sum games with parity objectives.1
Concurrent games are substantially more complex than turn-based games in several re-
spects. To see this, consider the structure of optimal strategies. For turn-based stochastic
parity games pure memoryless optimal strategies exist. It is this observation that led to
the NP ∩ coNP results for turn-based parity games. By contrast, in concurrent games,
already for reachability objectives, players must in general play with randomized strategies.
Furthermore, optimal strategies may not exist: rather, for every real ε > 0, the players have
ε-optimal strategies. Even for relatively simple parity winning conditions, such as Büchi conditions, ε-optimal strategies need both randomization and infinite memory [dAM01]. It is therefore not inconceivable that the complexity of concurrent parity games might be considerably worse. The only previously known algorithm for computing the value of concurrent parity games is triple-exponential [dAM01]: it is shown in [dAM01] that the value of concurrent parity games can be characterized as a fixpoint of expressions written in the quantitative µ-calculus, a quantitative extension of the ordinary µ-calculus [Koz83]. The triple-exponential algorithm was obtained via a reduction of the quantitative µ-calculus formula to the theory of the real-closed field, and then using decision procedures for the theory of the reals with addition and multiplication [Tar51, Bas99]. This approach fails to provide concise witnesses for ε-optimal strategies. In [dAH00] it is shown that, given a parity game, the problem of deciding whether the value at a state is 1 is in NP ∩ coNP, and there exist concise witnesses for ε-optimal strategies, for ε > 0, for states with optimal value 1.
1A preliminary version of the results of this chapter appeared in [CdAH06a].
In this chapter, we present concise witnesses for ε-optimal strategies, for ε > 0, in concurrent games with parity objectives. We then show that the values can be computed to any desired precision ε > 0 in PSPACE. Also, given a rational α, it can be decided in PSPACE whether the value at a state is greater than α. The basic idea behind the proof, which can no longer rely on the existence of pure memoryless optimal strategies, is as follows. Through a detailed analysis of the branching structure of the stochastic process of the game, we show that we can construct an ε-optimal strategy by stitching together strategies, one per value class. In each value class the witness is obtained as the witness constructed in [dAH00], which satisfies certain local conditions. This gives us a witness for an ε-optimal strategy. The decision procedure guesses and verifies the qualitative witnesses of [dAH00] in each value class, and the local optimality is checked by a formula in the existential theory of the real-closed field. This gives us an NPSPACE algorithm. A detailed analysis of our proof also gives us the following result. We show that in concurrent parity games there exists a sequence of ε-optimal strategies such that the limit of the ε-optimal strategies, as ε → 0, is a memoryless strategy. This result parallels the celebrated result of Mertens-Neyman [MN81] for concurrent games with limit-average objectives, which states that there exist ε-optimal strategies that in the limit coincide with memoryless strategies (the memoryless strategies correspond to the memoryless optimal strategies in the discounted games as the discount factor tends to 0). It may be noted that the memoryless strategy with which the ε-optimal strategies coincide in the limit is itself not necessarily ε-optimal.
8.1 Strategy Complexity and Computational Complexity
In this section we construct witnesses for perennial ε-optimal strategies. The
construction is based on a reduction to qualitative analysis.
Reduction to qualitative witness. Recall that a value class VC(Φ, r) is the set of states s such that the value for player 1 is r; that is, for an objective Φ, we have VC(Φ, r) = {s | Val1(Φ)(s) = r}. By VC(Φ, < r) we denote the set {s | Val1(Φ)(s) < r}, and similarly we use VC(Φ, > r) to denote the set {s | Val1(Φ)(s) > r}. Intuitively, we can picture the game
as a “quilt” of value classes. Two of the value classes correspond to values 1 (player 1 wins
with probability arbitrarily close to 1) and 0 (player 2 wins with probability arbitrarily close
to 1); the other value classes correspond to intermediate values. We construct a witness
for ε-optimal strategies in a piece-meal fashion. We first show that we can construct,
for each intermediate value class, a strategy that with probability arbitrarily close to 1
guarantees either leaving the class, or winning without leaving the class. Such a strategy
can be constructed using results from [dAH00], and has a concise witness. Second, we show
that the above strategy can be constructed so that when the class is left, it is left via a
locally ε-optimal selector. By stitching together the strategies constructed in this fashion
for the various value classes, we will obtain a single witness for the complete game. The
construction of a strategy in a value class relies on a reduction. We first present a few notations and then the reduction.
Value class notations. Let G be a game graph with a parity objective Φ = Parity(p).
For a state s we define the set of allowable supports
OptSupp(s) = {γ ⊆ Γ1(s) | ∃ξ^ℓ_1 ∈ Λ^ℓ(Φ). Supp(ξ^ℓ_1(s)) = γ},
to be the set of supports of locally optimal selectors. For every s ∈ S, we assume that we have a fixed way to enumerate OptSupp(s) = {γ1, γ2, . . . , γk}. For a state s ∈ VC(Φ, r) and γ ⊆ Γ1(s), we define the following sets of move pairs: let B = Γ1(s) \ γ,
Eq(s, γ) = {(a1, a2) ∈ B × Γ2(s) | Succ(s, a1, a2) ⊆ VC(Φ, r)};
Neg(s, γ) = {(a1, a2) ∈ B × Γ2(s) | Σ_{t∈S} δ(s, a1, a2)(t) · Val1(Φ)(t) < Val1(Φ)(s)};
Pos(s, γ) = {(a1, a2) ∈ B × Γ2(s) | Succ(s, a1, a2) ∩ (S \ VC(Φ, r)) ≠ ∅ and Σ_{t∈S} δ(s, a1, a2)(t) · Val1(Φ)(t) ≥ Val1(Φ)(s)}.
Observe that (γ × Γ2(s), Eq(s, γ), Pos(s, γ), Neg(s, γ)) forms a partition of Γ1(s) × Γ2(s).
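The partition of the move pairs outside the support γ can be sketched as follows. The dictionary-based game encoding is a hypothetical representation for illustration; `val` stands for Val1(Φ) and `value_class` for VC(Φ, r).

```python
# Sketch: partition the move pairs in B x Gamma_2(s) into Eq / Neg / Pos.
from fractions import Fraction

def classify(s, B, moves2, delta, val, value_class):
    eq, neg, pos = set(), set(), set()
    for a1 in B:
        for a2 in moves2:
            succ = delta[(a1, a2)]                 # successor distribution
            expected = sum(p * val[t] for t, p in succ.items())
            if set(succ) <= value_class:           # Succ(s,a1,a2) stays in VC
                eq.add((a1, a2))
            elif expected < val[s]:                # expected value decreases
                neg.add((a1, a2))
            else:                                  # leaves VC, no decrease
                pos.add((a1, a2))
    return eq, neg, pos

val = {"s": Fraction(1, 2), "t0": Fraction(1), "t1": Fraction(0)}
delta = {
    ("a", "x"): {"s": Fraction(1)},                             # stays in class
    ("a", "y"): {"t0": Fraction(1, 2), "t1": Fraction(1, 2)},   # leaves, no drop
    ("a", "z"): {"t1": Fraction(1)},                            # value drops
}
eq, neg, pos = classify("s", {"a"}, {"x", "y", "z"}, delta, val, {"s"})
print(eq, neg, pos)
```

The branch order mirrors the partition property: pairs whose successors stay in the value class have expected value exactly Val1(Φ)(s), so the remaining pairs split by whether the expected value drops.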
Reduction. Let G = (S, A, Γ1, Γ2, δ) be a concurrent game structure with parity objective Φ = Parity(p) for player 1. Let ξ associate a locally optimal selector with every γ ∈ OptSupp(s), i.e., we have ξγ(s) = ξ(s) for some ξ ∈ Λ^ℓ(Φ) with Supp(ξ(s)) = γ. Given the game structure G, the priority function p, the set of selectors ξ, and a value class V = VC(Φ, r), we construct a game graph G̃ = (S̃, Ã, Γ̃1, Γ̃2, δ̃) with a priority function p̃ as follows.
1. State space. Given a state s, let OptSupp(s) = {γ1, γ2, . . . , γk}. Then we have
S̃ = {s̃ | s ∈ V} ∪ {w1, w2} ∪ {(s, i) | s ∈ V, i ∈ {1, 2, . . . , |OptSupp(s)|}}.
2. Priority function.
(a) p̃(s̃) = p(s) for all s ∈ V.
(b) p̃((s, i)) = p(s) for all (s, i) ∈ S̃.
(c) p̃(w1) = 0 and p̃(w2) = 1.
3. Moves assignment.
(a) Γ̃1(s̃) = {1, 2, . . . , |OptSupp(s)|} and Γ̃2(s̃) = {a2}. Note that every s̃ ∈ S̃ is a player-1 turn-based state.
(b) Γ̃1((s, i)) = {i} ∪ (Γ1(s) \ γi) for i ∈ {1, 2, . . . , k}, where OptSupp(s) = {γ1, γ2, . . . , γk}, and Γ̃2((s, i)) = Γ2(s). At state (s, i) all the moves in γi are collapsed to the single move i, and all the moves not in γi are still available.
4. Transition function.
(a) The states w1 and w2 are absorbing states. Observe that player 1 has value 1 at state w1 and value 0 at state w2 for the parity objective Parity(p̃).
(b) For all states s̃ ∈ S̃ we have δ̃(s̃, i, a2)((s, i)) = 1. Hence at state s̃ player 1 can decide which element of OptSupp(s) to play, and if player 1 chooses move i the game proceeds to state (s, i).
(c) Transition function at state (s, i).
i. (Case 1.) For a move a2 ∈ Γ2(s), if there is a move a1 ∈ γi such that Succ(s, a1, a2) ∩ (S \ V) ≠ ∅, then δ̃((s, i), i, a2)(w1) = 1. The above transition specifies that, for a move a2 of player 2, if there is a move a1 ∈ γi for player 1 such that the game G proceeds to a state not in V with positive probability, then in G̃ the game proceeds with probability 1 to the state w1, which has value 1 for player 1.
ii. (Case 2.) For a move a2 ∈ Γ2(s), if for every move a1 ∈ γi we have Succ(s, a1, a2) ⊆ V, then
δ̃((s, i), i, a2)(s′) = Σ_{a1∈γi} ξγi(s)(a1) · δ(s, a1, a2)(s′), for s′ ∈ V.
iii. (Case 3.) For move pairs (a1, a2) ∈ Eq(s, γi) we have
δ̃((s, i), a1, a2)(s′) = δ(s, a1, a2)(s′), for s′ ∈ V.
iv. (Case 4.) For move pairs (a1, a2) ∈ Pos(s, γi) we have δ̃((s, i), a1, a2)(w1) = 1.
v. (Case 5.) For move pairs (a1, a2) ∈ Neg(s, γi) we have δ̃((s, i), a1, a2)(w2) = 1.
We use the following notation for the reduction: for a value class VC(Φ, r) we write
(G̃, p̃) = VQR(G, r, ξ, p)
to denote the reduction.
Proposition 12 Let G be a concurrent game structure with a parity objective Φ = Parity(p). For r > 0, consider the value class VC(Φ, r), and let (G̃, p̃) = VQR(G, r, ξ, p). Consider the event
A = ⋃_{j=1}^{∞} {Xj = (s, i), Y_{1,j} = a1, Y_{2,j} = a2 | (s, i) ∈ S̃, (a1, a2) ∈ Neg(s, γi)}.
Let s ∈ V and consider the state s̃. For all strategies σ̃ and π̃ in G̃ we have
Pr^{σ̃,π̃}_{s̃}(Reach(w2)) = Pr^{σ̃,π̃}_{s̃}(A).
Proof. We first observe that, given a state s ∈ VC(Φ, r), for the state s̃, for all a1 ∈ Γ̃1(s̃) and a2 ∈ Γ̃2(s̃) we have Succ(s̃, a1, a2) ⊆ {(s, i) | i ∈ {1, 2, . . . , |OptSupp(s)|}}. We now consider the following case analysis.
1. If player 1 plays move i at state (s, i), then since γi ∈ OptSupp(s), for all moves a1 ∈ γi and all moves a2 ∈ Γ2(s), we have either (a) Succ(s, a1, a2) ⊆ VC(Φ, r); or (b) Succ(s, a1, a2) ∩ VC(Φ, > r) ≠ ∅. Hence for the move i of player 1 at state (s, i), for all moves a2 ∈ Γ̃2((s, i)) = Γ2(s), we have: (a) if Succ(s, a1, a2) ⊆ VC(Φ, r) for all a1 ∈ γi, then Succ((s, i), i, a2) ⊆ S̃ \ {w1, w2}; (b) else Succ(s, a1, a2) ∩ VC(Φ, > r) ≠ ∅ for some a1 ∈ γi, and then Succ((s, i), i, a2) = {w1}. That is, for all moves a2 ∈ Γ2(s) we have Succ((s, i), i, a2) ⊆ S̃ \ {w2}.
2. For move pairs (a1, a2) ∈ Eq(s, γi), we have Succ((s, i), a1, a2) ⊆ S̃ \ {w1, w2}.
3. For move pairs (a1, a2) ∈ Pos(s, γi), we have Succ((s, i), a1, a2) = {w1}.
4. For move pairs (a1, a2) ∈ Neg(s, γi), we have Succ((s, i), a1, a2) = {w2}.
It follows that the probability of reaching w2 is the probability of the event A. Hence the
result follows.
Strategy maps. We consider the reduction (G̃, p̃) = VQR(G, r, ξ, p), for r > 0, and define two strategy maps below.
1. Given a strategy σε in the game structure G we construct a projected strategy σ̃ε = t1(σε) in the game G̃ as follows:
• σ̃ε(s̃0, (s0, i0), s̃1, (s1, i1), . . . , (sk−1, ik−1), s̃k)(j) = 1 if and only if γj = arg max_{γ∈OptSupp(sk)} Σ_{a∈γ} σε(s0, s1, . . . , sk)(a).
• σ̃ε(s̃0, (s0, i0), s̃1, (s1, i1), . . . , s̃k, (sk, j))(j) = Σ_{a∈γj} σε(s0, s1, . . . , sk)(a), and for all a′ ∉ γj we have σ̃ε(s̃0, (s0, i0), s̃1, (s1, i1), . . . , s̃k, (sk, j))(a′) = σε(s0, s1, . . . , sk)(a′).
2. Given a strategy σε in the game structure G and a strategy π̃ in G̃, we define a strategy π = t2(σε, π̃) in the game structure G as follows:
• π(s0, s1, . . . , sk) = π̃(s̃0, (s0, i0), s̃1, (s1, i1), . . . , s̃k) such that for all 0 ≤ l ≤ k we have σ̃ε(s̃0, (s0, i0), s̃1, (s1, i1), . . . , s̃l)(il) = 1, where σ̃ε = t1(σε).
Lemma 46 Let G be a concurrent game structure with a parity objective Φ = Parity(p). For r > 0, consider (G̃, p̃) = VQR(G, r, ξ, p). There exists a constant c such that for all ε > 0, for all locally ε-optimal strategies σε in G, for all states s̃ ∈ S̃ \ {w2}, and for all strategies π̃ in G̃, we have Pr^{σ̃ε,π̃}_{s̃}(Reach(w2)) ≤ c · ε, where σ̃ε = t1(σε).
Proof. For ε > 0, consider a locally ε-optimal strategy σε. Let
c1 = min{ Val1(Φ)(s) − Σ_{t∈S} Val1(Φ)(t) · δ(s, a1, a2)(t) | s ∈ S, a1 ∈ Γ1(s), a2 ∈ Γ2(s), Val1(Φ)(s) − Σ_{t∈S} Val1(Φ)(t) · δ(s, a1, a2)(t) > 0 } > 0.
Since σε is a locally ε-optimal strategy, it follows that the strategy σ̃ε = t1(σε) satisfies that at every round j, at state (s, i), the move pairs (a1, a2) ∈ Neg(s, γi) are played with probability at most εj, with Σ_{j=1}^{∞} εj ≤ (1/c1) · ε. With c = 1/c1, we obtain that the probability of the event A (as defined in Proposition 12) is at most c · ε. The result then follows from Proposition 12.
Lemma 47 Let G be a concurrent game structure with a parity objective Φ = Parity(p). For r > 0, consider (G̃, p̃) = VQR(G, r, ξ, p). For all states s ∈ VC(Φ, r) the following assertions hold.

1. There exists a constant c such that for all locally ε-optimal and perennial ε-optimal strategies σε ∈ Σ_ε^{PL}(Φ) ∩ Σ_ε^{ℓ}(Φ) in G, with 0 < ε < r/2, and for all strategies π̃ in G̃, we have Pr_{s̃}^{σ̃ε,π̃}(Φ̃) ≥ 1 − c · ε, where Φ̃ = Parity(p̃) and σ̃ε = t1(σε).

2. The state s̃ is a limit-sure winning state in G̃ for the objective Φ̃ = Parity(p̃); i.e., {s̃ | s ∈ VC(Φ, r)} ⊆ Limit^{G̃}_1(Parity(p̃)).
Proof. We prove the two parts below.

1. For 0 < ε < r/2, consider a locally ε-optimal and perennial ε-optimal strategy σε. Consider a strategy π̃ in G̃. We construct an extended strategy π for player 2 in G as follows: π = t2(σε, π̃). Since σε is a perennial ε-optimal strategy, it follows that for all histories 〈s0, s1, . . . , sn〉 such that si ∈ VC(Φ, r) for all 0 ≤ i ≤ n, we have

Pr_s^{σε,π}(Φ | 〈s0, s1, . . . , sn〉) ≥ r − ε ≥ r/2 > 0.

It follows that for all histories 〈s0, s1, . . . , sn〉 we have Pr_s^{σε,π}(Φ ∪ Reach(S \ VC(Φ, r)) | 〈s0, s1, . . . , sn〉) ≥ r − ε ≥ r/2 > 0, i.e., for all n we have Pr_s^{σε,π}(Φ ∪ Reach(S \ VC(Φ, r)) | Fn) ≥ r − ε ≥ r/2 > 0. It follows from Lemma 3 that Pr_s^{σε,π}(Φ ∪ Reach(S \ VC(Φ, r))) = 1. Then by construction we obtain that Pr_{s̃}^{σ̃ε,π̃}(Φ̃ ∪ Reach({w1, w2})) = 1. Since σε is a locally ε-optimal strategy, by Lemma 46 there exists a constant c such that Pr_{s̃}^{σ̃ε,π̃}(Reach(w2)) ≤ c · ε, and hence we have Pr_{s̃}^{σ̃ε,π̃}(Φ̃ ∪ Reach(w1)) = Pr_{s̃}^{σ̃ε,π̃}(Φ̃) ≥ 1 − c · ε. The result follows.

2. By Lemma 5 and Proposition 6 it follows that for all ε > 0 we have Σ_ε^{PL}(Φ) ∩ Σ_ε^{ℓ}(Φ) ≠ ∅. The desired result then follows from part (1).
Limit-sure witness [dAH00]. The witness strategy σ for limit-sure winning constructed in [dAH00] consists of two parts: a ranking function on states, and a ranking function on the actions at each state. The ranking functions were described by a µ-calculus formula. At round k of a play, at a state s, the witness strategy σ plays the actions of least rank at s with positive bounded probabilities, and the other actions with vanishingly small probabilities (as a function of ε), in the proportions described by the ranking function. Hence the strategy σ can be described as

σ = (1 − ε_k) · σℓ + ε_k · σd(ε_k),

where σℓ is a selector ξ such that Supp(ξ) is the set of actions with least rank, and σd(ε_k) denotes a selector with Supp(σd(ε_k)) = Γ1 \ Supp(σℓ). Hence the strategy σ plays the moves in Supp(σd(ε_k)) with vanishingly small probability as ε_k → 0. We call the set of actions with least rank, i.e., Supp(σℓ), the limit-sure witness move set. It follows from the above construction that as ε → 0, the limit-sure winning strategy σ converges to the memoryless selector σℓ, i.e., the limit of the limit-sure witness strategy is a memoryless strategy. Lemma 48 follows from the limit-sure witness strategy construction in [dAH00]. Lemma 49 is also a direct consequence of the results of [dAH00]; it states that the set of limit-sure winning states of a concurrent game structure is independent of the precise transition probabilities of the transition function and depends only on the support of the transition function.
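As an illustration, the mixture σ = (1 − ε_k)·σℓ + ε_k·σd(ε_k) can be sketched as follows; this is a minimal sketch, and the concrete action names and the dictionary encoding of selectors are assumptions of the illustration, not part of the construction of [dAH00]:

```python
def witness_selector(least_rank, other, eps_k):
    """Mix a selector over least-rank actions with a vanishing selector
    over the remaining actions: (1 - eps_k) * sel_l + eps_k * sel_d.

    least_rank, other: dicts mapping actions to probabilities (each
    summing to 1, with disjoint supports); eps_k: weight of the
    vanishing part at round k.
    """
    dist = {a: (1 - eps_k) * p for a, p in least_rank.items()}
    for a, p in other.items():
        dist[a] = dist.get(a, 0.0) + eps_k * p
    return dist

# As eps_k -> 0, the mixture converges to the memoryless selector sel_l.
sel_l = {"a": 0.5, "b": 0.5}   # least-rank actions (hypothetical)
sel_d = {"c": 1.0}             # remaining actions (hypothetical)
d = witness_selector(sel_l, sel_d, 0.01)
assert abs(sum(d.values()) - 1.0) < 1e-9
```

As ε_k shrinks, the weight on the non-least-rank move "c" vanishes, mirroring the convergence to the memoryless selector σℓ.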
Lemma 48 Let G be a concurrent game structure with a parity objective Φ = Parity(p). For r > 0, consider (G̃, p̃) = VQR(G, r, ξ, p). For every state s in G̃ there is a pure memoryless move j for player 1 and a limit-sure winning strategy σ̃ such that Supp(σ̃(s)) = {j} and the limit-sure witness move set at (s, j) is {j}.

Proof. The existence of the pure memoryless move is a consequence of the fact that every state s is a player-1 turn-based state in G̃, together with the witness construction in [dAH00].
Lemma 49 Let G1 = (S, A, Γ1, Γ2, δ1) and G2 = (S, A, Γ1, Γ2, δ2) be two concurrent game structures with the same set S of states, the same set A of moves, and the same move assignment functions Γ1 and Γ2. If for all s ∈ S, for all a1 ∈ Γ1(s) and a2 ∈ Γ2(s), we have Supp(δ1(s, a1, a2)) = Supp(δ2(s, a1, a2)), then for all parity objectives Φ the sets of limit-sure winning states in G1 and G2 coincide, i.e., Limit^{G1}_1(Φ) = Limit^{G2}_1(Φ).
Simplified construction. From Lemma 48 we conclude that in Lemma 47 it is possible to restrict every state s of G̃ to a single successor, and still every state s in G̃ is limit-sure winning. From Lemma 49 we conclude that for the selectors ξ of Lemma 47 the precise transition probabilities do not matter; only the support matters. We formalize this result in Lemma 50. We first present another reduction that is not restricted to value classes. The reduction is similar to the reduction VQR, but it takes a subset of moves for every state, specified by a function f, and a partition of moves, specified by Eq, Pos, and Neg. Given a game graph G with a priority function p, let V ⊆ S be a subset of states. Let f : S → 2^A \ {∅} be a function such that f(s) ⊆ Γ1(s) for all s ∈ S, and let ξf be a selector such that for all s ∈ S we have Supp(ξf(s)) = f(s). For s ∈ V, let (Eq(s, f), Pos(s, f), Neg(s, f)) define a partition of the move set (Γ1(s) \ f(s)) × Γ2(s) such that for all (a1, a2) ∈ Eq(s, f) we have Succ(s, a1, a2) ⊆ V. We construct a game graph G̃ = (S̃, Ã, Γ̃1, Γ̃2, δ̃) with a priority function p̃ as follows.
1. State space. S̃ = V ∪ {w1, w2}.

2. Priority function.

(a) p̃(s) = p(s) for all s ∈ V.

(b) p̃(w1) = 0 and p̃(w2) = 1.

3. Move assignment.

(a) Γ̃1(s) = {1} ∪ (Γ1(s) \ f(s)) and Γ̃2(s) = Γ2(s). At a state s ∈ S̃, all the moves in f(s) are collapsed to the single move 1, and all the moves not in f(s) are still available.

4. Transition function.

(a) The states w1 and w2 are absorbing states. Observe that player 1 has value 1 at state w1 and value 0 at state w2 for the parity objective Parity(p̃).

(b) Transition function at a state s.

i. (Case 1.) For a move a2 ∈ Γ2(s), if there is a move a1 ∈ f(s) such that Succ(s, a1, a2) ∩ (S \ V) ≠ ∅, then δ̃(s, 1, a2)(w1) = 1. This transition specifies that for a move a2 of player 2, if there is a move a1 ∈ f(s) for player 1 such that the game G proceeds to a state not in V with positive probability, then in G̃ the game proceeds with probability 1 to the state w1, which has value 1 for player 1.

ii. (Case 2.) For a move a2 ∈ Γ2(s), if for every move a1 ∈ f(s) we have Succ(s, a1, a2) ⊆ V, then

δ̃(s, 1, a2)(s′) = ∑_{a1∈f(s)} ξf(s)(a1) · δ(s, a1, a2)(s′), for s′ ∈ V.

iii. (Case 3.) For move pairs (a1, a2) ∈ Eq(s, f) we have

δ̃(s, a1, a2)(s′) = δ(s, a1, a2)(s′), for s′ ∈ V.

iv. (Case 4.) For move pairs (a1, a2) ∈ Pos(s, f) we have δ̃(s, a1, a2)(w1) = 1.

v. (Case 5.) For move pairs (a1, a2) ∈ Neg(s, f) we have δ̃(s, a1, a2)(w2) = 1.
The main difference from the reduction VQR is the following: the function f (replacing OptSupp(s)) chooses a single subset of moves, and the partitions Eq, Pos, and Neg are given. We denote this reduction by

(G̃, p̃) = QRS(G, V, f, ξf, Eq, Pos, Neg, p).
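The five transition cases of the QRS reduction can be sketched as follows; this is an illustrative encoding only, and the dictionary representation of δ and all identifiers are assumptions of the sketch:

```python
COLLAPSED = "1"  # the single move replacing f(s)

def qrs_delta(delta, succ, V, f, xi_f, Eq, Pos, Neg, s, a1, a2):
    """Transition function of the reduced game at a state s in V,
    following Cases 1-5.  delta(s, a1, a2) is a dict state -> prob;
    succ(s, a1, a2) is its support as a set; the result is a
    distribution over V union {w1, w2}."""
    if a1 == COLLAPSED:
        # Case 1: some b in f(s) can leave V -> go to w1 surely.
        if any(succ(s, b, a2) - V for b in f[s]):
            return {"w1": 1.0}
        # Case 2: average delta over f(s), weighted by the selector xi_f.
        out = {}
        for b in f[s]:
            for t, p in delta(s, b, a2).items():
                out[t] = out.get(t, 0.0) + xi_f[s][b] * p
        return out
    if (a1, a2) in Eq[s]:       # Case 3: keep the original transition.
        return dict(delta(s, a1, a2))
    if (a1, a2) in Pos[s]:      # Case 4: go to the winning sink w1.
        return {"w1": 1.0}
    assert (a1, a2) in Neg[s]   # Case 5: go to the losing sink w2.
    return {"w2": 1.0}

# Tiny hypothetical game: at "s", f(s) = {a, b}; move b may leave V.
delta_map = {("s", "a", "x"): {"s": 1.0}, ("s", "b", "x"): {"t": 1.0}}
delta = lambda s, a1, a2: delta_map[(s, a1, a2)]
succ = lambda s, a1, a2: set(delta_map[(s, a1, a2)])
f, xi = {"s": {"a", "b"}}, {"s": {"a": 0.5, "b": 0.5}}
Eq, Pos, Neg = {"s": set()}, {"s": set()}, {"s": set()}
# Case 1: with V = {"s"}, move b can leave V, so the play goes to w1.
d1 = qrs_delta(delta, succ, {"s"}, f, xi, Eq, Pos, Neg, "s", COLLAPSED, "x")
# Case 2: with V = {"s", "t"}, delta is averaged under xi_f.
d2 = qrs_delta(delta, succ, {"s", "t"}, f, xi, Eq, Pos, Neg, "s", COLLAPSED, "x")
```

The two calls exercise Case 1 and Case 2: the same collapsed move leads surely to w1 when some move in f(s) can exit V, and otherwise averages the original transitions under ξf.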
From Lemma 47, Lemma 48, and Lemma 49 we obtain Lemma 50.
Lemma 50 Let G be a concurrent game structure with a parity objective Φ = Parity(p). There is a function f : S → 2^A \ {∅} such that for all s ∈ S we have f(s) ∈ OptSupp(s), and the following assertion holds. For a value class V = VC(Φ, r), with r > 0, let

Eq(s, f) = Eq(s, f(s)); Pos(s, f) = Pos(s, f(s)); Neg(s, f) = Neg(s, f(s));

for s ∈ V, where Eq, Pos, and Neg are as defined for value classes. For all selectors ξf such that for all s ∈ S we have Supp(ξf(s)) = f(s), in the game (G̃, p̃) = QRS(G, V, f, ξf, Eq, Pos, Neg, p) every state s is limit-sure winning for the objective Φ̃ = Parity(p̃), i.e., V ⊆ Limit^{G̃}_1(Parity(p̃)).
Given a concurrent game structure G and a strategy that ensures that certain action pairs are played with very small probabilities, we obtain a bound on the probability of reaching a set of states by considering an MDP. The construction of such an MDP is described below.
MDP construction for partitions. Given a concurrent game structure G = (S, A, Γ1, Γ2, δ), let T ⊆ S be a subset of states. Let

1. P = (V0, V1, . . . , Vk) be a partition of S;

2. f : S → 2^A \ {∅} be a function such that f(s) ⊆ Γ1(s) for all s ∈ S; and

3. ξf be a selector such that for all s ∈ S we have Supp(ξf(s)) = f(s).

For s ∈ Vi, let (Eq(s, f), Pos(s, f), Neg(s, f)) define a partition of (Γ1(s) \ f(s)) × Γ2(s) such that for all (a1, a2) ∈ Eq(s, f) we have Succ(s, a1, a2) ⊆ Vi. We now consider a player-2 MDP Ĝ = (S, Â, Γ̂2, δ̂) as follows:

1. Γ̂2(s) = ({1} × Γ2(s)) ∪ Eq(s, f) ∪ Pos(s, f).

2. δ̂(s, (a1, a2))(t) = δ(s, a1, a2)(t), for t ∈ S and (a1, a2) ∈ Eq(s, f) ∪ Pos(s, f); and δ̂(s, (1, a2))(t) = ∑_{a1∈f(s)} δ(s, a1, a2)(t) · ξf(s)(a1), for t ∈ S and a2 ∈ Γ2(s).

3. Â = ∪_{s∈S} Γ̂2(s).

Intuitively, player 2 can choose move pairs in Eq(s, f) ∪ Pos(s, f), and if player 2 decides to play (1, a2), then playing the selector ξf against the move a2 is mimicked. We will use the following notation:
Ĝ = M(G, f, P, ξf, Eq, Pos, Neg);

v(G, f, P, ξf, Eq, Pos, Neg, T)(s) = sup_{π̂∈Π̂} Pr_s^{π̂}(Reach(T));

i.e., we denote the MDP construction by M, and v(G, f, P, ξf, Eq, Pos, Neg, T)(s) denotes the value at state s in the MDP Ĝ for reaching T. The following lemma shows that if a strategy plays action pairs in Neg(s, f) with very small probabilities, and maintains the ratio of the probabilities of the moves in f(s) as specified by ξf, then the maximal probability to reach T is bounded approximately by the maximal value to reach T in Ĝ.
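The MDP construction M can be sketched as follows; again an illustrative encoding under assumed data structures (dictionaries for δ, ξf, Eq, and Pos), not the dissertation's formal notation:

```python
def build_mdp(S, delta, Gamma2, f, xi_f, Eq, Pos):
    """Player-2 MDP: the actions at s are (\"1\", a2) for a2 in
    Gamma2(s), plus the pairs in Eq(s, f) and Pos(s, f).  Playing
    (\"1\", a2) averages delta over f(s) with the weights of xi_f."""
    actions, trans = {}, {}
    for s in S:
        acts = [("1", a2) for a2 in Gamma2[s]] + list(Eq[s]) + list(Pos[s])
        actions[s] = acts
        for a in acts:
            if a[0] == "1":                       # mimic the selector xi_f
                a2, out = a[1], {}
                for a1 in f[s]:
                    for t, p in delta[(s, a1, a2)].items():
                        out[t] = out.get(t, 0.0) + xi_f[s][a1] * p
            else:                                 # keep the original transition
                out = dict(delta[(s, a[0], a[1])])
            trans[(s, a)] = out
    return actions, trans

# Hypothetical two-state example; ("c", "x") is a Pos-pair at "s".
S = ["s", "t"]
Gamma2 = {"s": ["x"], "t": ["x"]}
f = {"s": {"a", "b"}, "t": {"a"}}
xi_f = {"s": {"a": 0.5, "b": 0.5}, "t": {"a": 1.0}}
Eq = {"s": set(), "t": set()}
Pos = {"s": {("c", "x")}, "t": set()}
delta = {("s", "a", "x"): {"s": 1.0}, ("s", "b", "x"): {"t": 1.0},
         ("s", "c", "x"): {"t": 1.0}, ("t", "a", "x"): {"t": 1.0}}
actions, trans = build_mdp(S, delta, Gamma2, f, xi_f, Eq, Pos)
```

In the example, the collapsed action ("1", "x") at "s" yields the ξf-average of the transitions of moves a and b, while the retained pair ("c", "x") keeps its original transition.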
Lemma 51 Let G = (S, A, Γ1, Γ2, δ) be a concurrent game structure and let T ⊆ S. Let P = (V0, V1, . . . , Vk) be a partition of S, f : S → 2^A \ {∅} a function such that f(s) ⊆ Γ1(s) for all s ∈ S, and ξf a selector such that for all s ∈ S we have Supp(ξf(s)) = f(s). For s ∈ Vi, let (Eq(s, f), Pos(s, f), Neg(s, f)) define a partition of (Γ1(s) \ f(s)) × Γ2(s) such that for all (a1, a2) ∈ Eq(s, f) we have Succ(s, a1, a2) ⊆ Vi. Let σ be a strategy such that the following condition holds: for all histories w ∈ S∗ and all s ∈ S,

σ(w · s)(a) / ∑_{a1∈f(s)} σ(w · s)(a1) = ξf(s)(a), for all a ∈ f(s);

i.e., the proportions of the probabilities of the actions in f(s) are the same as in ξf(s). Consider the event

A = ⋃_{s∈S} ⋃_{j=1}^{∞} {Xj = s, (Y1,j, Y2,j) ∈ Neg(s, f)}.

For ε > 0, for a strategy π for player 2, and a state s ∈ S:

if Pr_s^{σ,π}(A) ≤ ε, then Pr_s^{σ,π}(Reach(T)) ≤ v(G, f, P, ξf, Eq, Pos, Neg, T)(s) + ε.
Proof. Let v(s) = v(G, f, P, ξf, Eq, Pos, Neg, T)(s). Let σ be a strategy such that the following condition holds: for all histories w ∈ S∗ and all s ∈ S,

σ(w · s)(a) / ∑_{a1∈f(s)} σ(w · s)(a1) = ξf(s)(a), for all a ∈ f(s).

Then by the construction of the MDP Ĝ = M(G, f, P, ξf, Eq, Pos, Neg), it follows that for all s ∈ S and all π ∈ Π we have

Pr_s^{σ,π}(Reach(T) | Ā) ≤ sup_{π̂∈Π̂} Pr_s^{π̂}(Reach(T)) = v(s),

where Ā is the complement of the event A; i.e., given the event Ā, which ensures that action pairs in Neg(s, f) are never played, the probability to reach T is bounded by the maximal probability to reach T in Ĝ. Hence for all s ∈ S and all π ∈ Π, if Pr_s^{σ,π}(A) ≤ ε, then we have

Pr_s^{σ,π}(Reach(T)) = Pr_s^{σ,π}(Reach(T) | Ā) · Pr_s^{σ,π}(Ā) + Pr_s^{σ,π}(Reach(T) | A) · Pr_s^{σ,π}(A) ≤ v(s) + ε.

The desired result follows.
The following two lemmas relate the value of a concurrent game structure with a parity objective to qualitative winning in partitions of the state space and to reachability of the limit-sure winning set.
Lemma 52 Given a concurrent game structure G with a parity objective Parity(p), let W1 = Limit1(Parity(p)) and W2 = Limit2(coParity(p)). Let P = (V0, V1, . . . , Vk) be a partition of S, f : S → 2^A \ {∅} a function such that f(s) ⊆ Γ1(s) for all s ∈ S, and ξf a selector such that for all s ∈ S we have Supp(ξf(s)) = f(s). For s ∈ Vi, let (Eq(s, f), Pos(s, f), Neg(s, f)) define a partition of (Γ1(s) \ f(s)) × Γ2(s) such that for all (a1, a2) ∈ Eq(s, f) we have Succ(s, a1, a2) ⊆ Vi. Suppose the following conditions hold.

1. Assumption 1. V0 = W1 and Vk = W2.

2. Assumption 2. For all 1 ≤ i ≤ k − 1 and all s ∈ Vi:

• for all a2 ∈ Γ2(s), if Succ(s, ξf, a2) ∩ (S \ Vi) ≠ ∅, then Succ(s, ξf, a2) ∩ (∪_{j<i} Vj) ≠ ∅; and

• Succ(s, a1, a2) ∩ (∪_{j<i} Vj) ≠ ∅, for all (a1, a2) ∈ Pos(s, f).

3. Assumption 3. For all 1 ≤ i ≤ k − 1, every state s ∈ Vi is limit-sure winning in G̃i for the objective Parity(p̃i), where (G̃i, p̃i) = QRS(G, Vi, f, ξf, Eq, Pos, Neg, p); i.e., Vi ⊆ Limit^{G̃i}_1(Parity(p̃i)).

Then for all s ∈ S we have Val2(coParity(p))(s) ≤ v(G, f, P, ξf, Eq, Pos, Neg, W2)(s).
Proof. Given ε > 0, let σ^0_ε be a strategy such that for all s ∈ V0 and all strategies π we have Pr_s^{σ^0_ε,π}(Parity(p)) ≥ 1 − ε (such a strategy exists, since by assumption 1 we have V0 = W1 = Limit1(Parity(p))). Given ε > 0, for 1 ≤ i ≤ k − 1, let σ^i_ε be a strategy in G̃i such that for all s ∈ Vi and all strategies π̃ in G̃i we have Pr_s^{σ^i_ε,π̃}(Parity(p̃i)) ≥ 1 − ε (such a strategy exists by assumption 3). Given ε > 0, fix a sequence ε_1, ε_2, . . . such that for all j ≥ 1 we have ε_j > 0 and ∑_{j=1}^{∞} ε_j ≤ ε (e.g., set ε_j = ε/2^j). We construct a strategy σε as follows. For a history w = 〈s0, s1, . . . , sℓ〉 we inductively define num(w) as follows: num(〈s0〉) = 1, and

num(〈s0, s1, . . . , sℓ−1, sℓ〉) = num(〈s0, s1, . . . , sℓ−1〉), if sℓ−1, sℓ ∈ Vi for some i;
num(〈s0, s1, . . . , sℓ−1, sℓ〉) = num(〈s0, s1, . . . , sℓ−1〉) + 1, if sℓ−1 ∈ Vi, sℓ ∈ Vj, and Vi ≠ Vj.

That is, num(w) denotes the number of switches between partition classes along w. The strategy σε follows the strategy σ^0_ε upon reaching V0, and the strategy played upon reaching Vk is irrelevant (i.e., can be fixed arbitrarily). For a history w = 〈s0, s1, . . . , sℓ〉 such that sj ∈ ∪_{1≤l≤k−1} Vl for all 0 ≤ j ≤ ℓ, the strategy σε(w) is defined as follows: for a ∈ Γ1(sℓ) \ f(sℓ) we have

σε(w)(a) = σ^i_{ε_1}(w)(a), if s0, . . . , sℓ ∈ Vi;
σε(w)(a) = σ^i_{ε_l}(〈sj, . . . , sℓ〉)(a), otherwise, where sj, . . . , sℓ ∈ Vi, sj−1 ∉ Vi, and num(w) = l;

and for a ∈ f(sℓ) we have

σε(w)(a) = σ^i_{ε_1}(w)(1) · ξf(sℓ)(a), if s0, . . . , sℓ ∈ Vi;
σε(w)(a) = σ^i_{ε_l}(〈sj, . . . , sℓ〉)(1) · ξf(sℓ)(a), otherwise, where sj, . . . , sℓ ∈ Vi, sj−1 ∉ Vi, and num(w) = l.

On entering a set Vi, the strategy σε ignores the history so far and switches to the strategy σ^i_{ε_l}, for histories w with num(w) = l.
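The switch counter num(w) can be computed as sketched below; the map cls, assigning each state to its partition class, is an assumed encoding for the illustration:

```python
def num_switches(history, cls):
    """num(w): 1 plus the number of positions where the partition
    class changes between consecutive states of the history w."""
    n = 1
    for prev, cur in zip(history, history[1:]):
        if cls[prev] != cls[cur]:
            n += 1
    return n

cls = {"s": 0, "t": 0, "u": 1}          # hypothetical class assignment
assert num_switches(["s"], cls) == 1
assert num_switches(["s", "t", "u", "t"], cls) == 3
```

The strategy σε above uses num(w) = l to pick the accuracy parameter ε_l of the sub-strategy it switches to.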
The following assertion holds: for all s ∈ S, for all strategies π, and for all histories w = 〈s0, s1, . . . , sℓ〉 with sℓ ∈ Vi, for 1 ≤ i ≤ k − 1, we have

Pr_s^{σε,π}(Parity(p) | A) ≥ 1 − ε_l ≥ 1 − ε,    (†)

where A = {w · ω | ω ∈ Safe(Vi)} and num(w) = l. The above property follows since the strategy σε switches to a strategy σ^i_{ε_l}, and the strategy σ^i_{ε_l} ensures that if the game stays in Vi forever, then Parity(p) holds with probability at least 1 − ε_l. We now analyze the game structures G̃i, for 1 ≤ i ≤ k − 1. In G̃i, for s ∈ Vi and (a1, a2) ∈ Neg(s, f), we have δ̃(s, a1, a2)(w2) = 1, and w2 is sure-winning for player 2. Since σ^i_{ε_l} ensures winning with probability 1 − ε_l in Vi, it follows that for all strategies π the action pairs from the set Neg(s, f) are played with total probability less than ε_l over all rounds of the play, for histories w with num(w) = l. For (a1, a2) ∈ Eq(s, f) we have Supp(δ(s, a1, a2)) ⊆ Vi. By assumption 2 the following conditions hold: (a) for all a2 ∈ Γ2(s), if Succ(s, ξf, a2) ∩ (S \ Vi) ≠ ∅, then Succ(s, ξf, a2) ∩ (∪_{j<i} Vj) ≠ ∅; and (b) Succ(s, a1, a2) ∩ (∪_{j<i} Vj) ≠ ∅, for all (a1, a2) ∈ Pos(s, f). Thus we obtain that for all s ∈ S, for all strategies π, and for all histories w = 〈s0, s1, . . . , sℓ〉 with sℓ ∈ Vi, for 1 ≤ i ≤ k − 1, we have

Pr_s^{σε,π}(Reach(∪_{j<i} Vj) | B) ≥ c > 0,

for some constant c, where B = {w · ω | ω ∈ Reach(S \ Vi)}. Since there are k partition classes, we obtain that for all s ∈ S, for all strategies π, and for all histories w = 〈s0, s1, . . . , sℓ〉 with sℓ ∈ Vi, for 1 ≤ i ≤ k − 1, we have

Pr_s^{σε,π}(Reach(V0) | B) ≥ c1 > 0,    (‡)

for some constant c1 with 0 < c1 ≤ c^k. For all s ∈ S, for all strategies π, and for all histories w = 〈s0, s1, . . . , sℓ〉 with sℓ ∈ V0 or sℓ ∈ Vk, we have

Pr_s^{σε,π}(Reach(V0 ∪ Vk) | w) = 1.    (§)
Hence it follows from (†), (‡), and (§) that for all s ∈ S, for all strategies π, and for all n > 0 we have

Pr_s^{σε,π}(Parity(p) ∪ Reach(V0 ∪ Vk) | Fn) ≥ c1 > 0,

for some constant c1. By Lemma 3, for all s ∈ S and all strategies π we have

Pr_s^{σε,π}(Parity(p) ∪ Reach(V0 ∪ Vk)) = 1.

Since σε plays σ^0_ε upon reaching V0, it follows that for all s ∈ S and all strategies π we have Pr_s^{σε,π}(coParity(p) | Reach(V0)) ≤ ε. Since the strategy σε ensures that action pairs in Neg(s, f) are played with probability at most ε, by Lemma 51 we obtain that for all s ∈ S and all strategies π we have Pr_s^{σε,π}(Reach(W2)) ≤ v(s) + ε, where v(s) = v(G, f, P, ξf, Eq, Pos, Neg, W2)(s). It follows that for all s ∈ S and all strategies π we have

Pr_s^{σε,π}(coParity(p)) ≤ Pr_s^{σε,π}(Reach(W2)) + ε ≤ (v(s) + ε) + ε = v(s) + 2 · ε.

Since ε > 0 is arbitrary, the desired result follows.
Lemma 53 Given a concurrent game structure G with a parity objective Parity(p), let W1 = Limit1(Parity(p)) and W2 = Limit2(coParity(p)). There exist a partition P = (V0, V1, . . . , Vk) of S, a function f : S → 2^A \ {∅} such that f(s) ⊆ Γ1(s) for all s ∈ S, and a selector ξf such that for all s ∈ S we have Supp(ξf(s)) = f(s), for which the following conditions hold.

1. Condition 1. V0 = W1 and Vk = W2.

2. Condition 2. For all 1 ≤ i ≤ k − 1 and all s ∈ Vi, there exists a partition (Eq(s, f), Pos(s, f), Neg(s, f)) of (Γ1(s) \ f(s)) × Γ2(s) such that for all (a1, a2) ∈ Eq(s, f) we have Succ(s, a1, a2) ⊆ Vi, and the following assertions hold:

• for all a2 ∈ Γ2(s), if Succ(s, ξf, a2) ∩ (S \ Vi) ≠ ∅, then Succ(s, ξf, a2) ∩ (∪_{j<i} Vj) ≠ ∅; and

• Succ(s, a1, a2) ∩ (∪_{j<i} Vj) ≠ ∅, for all (a1, a2) ∈ Pos(s, f).

3. Condition 3. For all 1 ≤ i ≤ k − 1, every state s ∈ Vi is limit-sure winning in G̃i for the objective Parity(p̃i), where (G̃i, p̃i) = QRS(G, Vi, f, ξf, Eq, Pos, Neg, p); i.e., Vi ⊆ Limit^{G̃i}_1(Parity(p̃i)).

4. Condition 4. For all s ∈ S we have v(G, f, P, ξf, Eq, Pos, Neg, W2)(s) ≤ Val2(coParity(p))(s).
Proof. The witness partition P, the function f, the selector ξf, and the partitions Eq, Pos, and Neg are obtained as follows. The partition P is the value-class partition of S in decreasing order of values, i.e., (a) for 0 ≤ i ≤ k, the set Vi is a value class; and (b) for s ∈ Vi and t ∈ Vj, if j < i, then Val1(Parity(p))(t) > Val1(Parity(p))(s). The witness f is obtained as a witness satisfying the conditions of Lemma 50 such that for all s ∈ S we have f(s) ∈ OptSupp(s). For a state s ∈ Vi, with 1 ≤ i ≤ k − 1, we have

Eq(s, f) = Eq(s, f(s)); Pos(s, f) = Pos(s, f(s)); Neg(s, f) = Neg(s, f(s));

where Eq, Pos, and Neg are as defined for value classes. By Lemma 50 we obtain that for all 1 ≤ i ≤ k − 1 we have Vi ⊆ Limit^{G̃i}_1(Parity(p̃i)), where (G̃i, p̃i) = QRS(G, Vi, f, ξf, Eq, Pos, Neg, p). The witness selector ξf is a locally optimal selector such that Supp(ξf(s)) = f(s) for all s ∈ S. Since ξf is a locally optimal selector, it follows that for all s ∈ Vi, with 1 ≤ i ≤ k − 1, and for all a2 ∈ Γ2(s), if Succ(s, ξf, a2) ∩ (⋃_{j>i} Vj) ≠ ∅ (i.e., some successor lies in a lower value class), then Succ(s, ξf, a2) ∩ (⋃_{j<i} Vj) ≠ ∅ (i.e., some successor lies in a higher value class). In other words, for all s ∈ Vi, with 1 ≤ i ≤ k − 1, and for all a2 ∈ Γ2(s), if Succ(s, ξf, a2) ∩ (S \ Vi) ≠ ∅, then Succ(s, ξf, a2) ∩ (⋃_{j<i} Vj) ≠ ∅. For s ∈ Vi, with 1 ≤ i ≤ k − 1, and (a1, a2) ∈ Eq(s, f(s)), we have Succ(s, a1, a2) ⊆ Vi. For s ∈ Vi, with 1 ≤ i ≤ k − 1, and (a1, a2) ∈ Pos(s, f(s)), we have Succ(s, a1, a2) ∩ (S \ Vi) ≠ ∅ and

∑_{t∈S} Val1(Parity(p))(t) · δ(s, a1, a2)(t) ≥ Val1(Parity(p))(s).

Since Succ(s, a1, a2) ∩ (S \ Vi) ≠ ∅, we must have Succ(s, a1, a2) ∩ VC(Parity(p), > r) ≠ ∅, where s ∈ VC(Parity(p), r); i.e., Succ(s, a1, a2) ∩ (∪_{j<i} Vj) ≠ ∅. It follows that condition 1, condition 2, and condition 3 hold. We now prove condition 4.
Let v(s) = Val2(coParity(p))(s) for s ∈ S. Observe that v(s) = 1 for all s ∈ W2. Hence, to show the desired result, it suffices to show that in the MDP Ĝ = M(G, f, P, ξf, Eq, Pos, Neg), for all states s and all a ∈ Γ̂2(s) we have

v(s) ≥ ∑_{t∈S} v(t) · δ̂(s, a)(t).

The inequality is proved by considering the following cases.

• For s ∈ S and a = (a1, a2) ∈ Eq(s, f(s)): for all t ∈ Succ(s, a1, a2) we have v(s) = v(t) (as s and t are in the same value class). It follows that v(s) = ∑_{t∈S} v(t) · δ̂(s, a)(t).

• For s ∈ S and a = (a1, a2) ∈ Pos(s, f(s)) we have

1 − v(s) ≤ ∑_{t∈S} (1 − v(t)) · δ(s, a1, a2)(t),

i.e., v(s) ≥ ∑_{t∈S} v(t) · δ(s, a1, a2)(t). In other words, we have v(s) ≥ ∑_{t∈S} v(t) · δ̂(s, a)(t).

• For s ∈ S and a = (1, a2) we have

∑_{t∈S} v(t) · δ̂(s, a)(t) = ∑_{t∈S} ∑_{a1∈f(s)} v(t) · δ(s, a1, a2)(t) · ξf(s)(a1).

Since ξf is a locally optimal selector, for all s ∈ S and all a2 ∈ Γ2(s) we have

v(s) ≥ ∑_{t∈S} ∑_{a1∈f(s)} v(t) · δ(s, a1, a2)(t) · ξf(s)(a1).

Thus we have v(s) ≥ ∑_{t∈S} v(t) · δ̂(s, a)(t).

Hence the desired result follows.
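The inequality v(s) ≥ ∑_{t∈S} v(t) · δ̂(s, a)(t), verified case by case above, can be checked mechanically on a finite MDP; a minimal sketch with an assumed dictionary encoding of the transitions:

```python
def is_superharmonic(v, trans, tol=1e-9):
    """Check v(s) >= sum_t v(t) * delta_hat(s, a)(t) for every
    state-action pair of the MDP; trans maps (s, a) to a
    distribution encoded as a dict t -> prob."""
    return all(
        v[s] + tol >= sum(v[t] * p for t, p in dist.items())
        for (s, a), dist in trans.items()
    )

# Hypothetical 3-state MDP: from "s", one action averages the values
# of the absorbing states w1 (value 0) and w2 (value 1) equally.
v = {"s": 0.5, "w1": 0.0, "w2": 1.0}
trans = {("s", "a"): {"w1": 0.5, "w2": 0.5},
         ("w1", "a"): {"w1": 1.0}, ("w2", "a"): {"w2": 1.0}}
assert is_superharmonic(v, trans)
```

A v satisfying this condition bounds the maximal reachability value in the MDP from above, which is the role condition 4 plays in the proof.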
Algorithm. Given a concurrent game structure G, a parity objective Parity(p), a real α, and a state s, to decide whether Val1(Parity(p))(s) ≥ α it is sufficient (and possible by Lemma 53) to guess a witness (P, f, ξ, Eq, Pos, Neg) such that P = (V0, V1, . . . , Vk) is a partition of the state space, f : S → 2^A \ {∅} is a function such that f(s) ⊆ Γ1(s) for all s ∈ S, ξ is a selector such that for all s ∈ S we have Supp(ξ(s)) = f(s), and the following conditions hold:

1. V0 = Limit1(Parity(p)) and Vk = Limit2(coParity(p));

2. for all 1 ≤ i ≤ k and all s ∈ Vi we have

(a) for all (a1, a2) ∈ Eq(s, f) we have Succ(s, a1, a2) ⊆ Vi;

(b) for all (a1, a2) ∈ Pos(s, f) we have Succ(s, a1, a2) ∩ (∪_{j<i} Vj) ≠ ∅;

(c) for all a2 ∈ Γ2(s), if Succ(s, ξ, a2) ∩ (S \ Vi) ≠ ∅, then Succ(s, ξ, a2) ∩ (∪_{j<i} Vj) ≠ ∅ (observe that it suffices to verify this condition for the selector ξU that at s plays all actions in f(s) uniformly at random, instead of the selector ξ);

3. for all 1 ≤ i ≤ k − 1, every state s ∈ Vi is limit-sure winning in (G̃i, p̃i) = QRS(G, Vi, f, ξU, Eq, Pos, Neg, p), where ξU is a selector that at a state s plays all moves in f(s) uniformly at random; and

4. 1 − α ≥ v(G, f, P, ξ, Eq, Pos, Neg, Vk)(s).

Observe that in each G̃i we need to verify that s is limit-sure winning; since limit-sure winning in concurrent games does not depend on the precise transition probabilities (Lemma 49), it suffices to verify the condition with ξU instead of ξ. The guesses of the partition P and of f are polynomial in the size of the game, and the guess of ξ is obtained through a sentence in the theory of the reals. Once P and f are guessed, condition 1 and condition 2 can be checked in PSPACE (since for concurrent games whether a state s ∈ Limit1(Parity(p)) can be decided in NP ∩ coNP [dAH00]). We now present a sentence in the existential theory of the real closed field (the subclass of the theory of the real closed field where only the existential quantifier is used) for the guess ξ, verifying the last condition:
∃x. ∃v.  ∧_{s∈S, a∈Γ1(s)} (x_s(a) ≥ 0)  ∧  ∧_{s∈S} (∑_{a∈Γ1(s)} x_s(a) = 1)

∧  ∧_{s∈S, a1∈f(s)} (x_s(a1) > 0)  ∧  ∧_{s∈S, a1∉f(s)} (x_s(a1) = 0)

∧  ∧_{s∈V0} (v(s) = 0)  ∧  ∧_{s∈Vk} (v(s) = 1)

∧  ∧_{1≤i≤k} ∧_{s,t∈Vi} (v(s) = v(t))  ∧  ∧_{1≤i<j≤k} ∧_{s∈Vi, t∈Vj} (v(s) < v(t))

∧  ∧_{s∈S\(V0∪Vk), a2∈Γ2(s)} ( v(s) ≥ ∑_{t∈S} ∑_{a1∈f(s)} v(t) · δ(s, a1, a2)(t) · x_s(a1) )

∧  ∧_{s∈S\(V0∪Vk), (a1,a2)∈Pos(s,f)} ( v(s) ≥ ∑_{t∈S} v(t) · δ(s, a1, a2)(t) )

∧  (v(s) ≤ 1 − α).

The first line of constraints ensures that x is a selector, and the second line of constraints ensures that x is a selector with support f(s) for all states s ∈ S. The third line ensures that v(s) is fixed correctly for states in V0 and Vk. The fourth line ensures that for all 1 ≤ i ≤ k and all s, t ∈ Vi we have the same value at s and t, and that if s ∈ Vi and t ∈ Vj with i < j, then the value at s is smaller than the value at t (states in higher value classes for player 1 have smaller player-2 values). The next two lines present the inequality constraints guaranteeing that, with the choice of x as selector, we have v(G, f, P, x, Eq, Pos, Neg, Vk)(s) ≤ v(s). The last constraint specifies that v(s) ≤ 1 − α. Since the existential theory of the reals is decidable in PSPACE [Can88], we obtain an NPSPACE algorithm to decide whether Val1(Parity(p))(s) ≥ α. Since NPSPACE = PSPACE, there is a PSPACE algorithm to decide whether Val1(Parity(p))(s) ≥ α. By applying the binary-search technique (as for Algorithm 7) we can approximate the value to a precision ε, for ε > 0, applying the decision procedure log(1/ε) times. Thus we have the following result.
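The binary-search approximation can be sketched as follows, where decide stands for a hypothetical decision oracle for the query Val1(Parity(p))(s) ≥ α; the oracle and the example value below are assumptions of the sketch:

```python
def approximate_value(decide, eps):
    """Approximate the value within precision eps using the decision
    procedure O(log(1/eps)) times: maintain an interval [lo, hi]
    containing the value and halve it each round."""
    lo, hi = 0.0, 1.0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if decide(mid):      # is the value >= mid ?
            lo = mid
        else:
            hi = mid
    return lo, hi

# Hypothetical oracle for a game whose value at s is 1/3:
lo, hi = approximate_value(lambda alpha: alpha <= 1/3, 1e-6)
assert lo <= 1/3 <= hi and hi - lo <= 1e-6
```

Each call to decide corresponds to one invocation of the PSPACE decision procedure, so the approximation costs a log(1/ε) factor on top of it.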
Theorem 44 (Computational complexity) Given a concurrent game structure G, a parity objective Parity(p), a state s of G, a rational ε > 0, and a rational α, the following assertions hold.
1. (Decision problem). Whether Val1(Parity(p))(s) ≥ α can be decided in PSPACE.
2. (Approximation problem). An interval [l, u] such that u − l ≤ 2ε and
Val1(Parity(p))(s) ∈ [l, u] can be computed in PSPACE.
The previously best known algorithm for approximating values is triple exponential in the size of the game graph and logarithmic in 1/ε [dAM01].
Strategy complexity. Lemma 52 and Lemma 53 show that witnesses for perennial ε-optimal strategies can be obtained by "stitching" (composing together) limit-sure winning strategies and locally optimal selectors across value classes. This characterization, along with results on the structure of limit-sure winning strategies, yields Theorem 45. From the results of [dAH00] it follows that limit-sure winning strategies coincide in the limit with a memoryless selector σℓ such that Supp(σℓ) is the set of least-rank actions of the limit-sure witness. The witness construction of ε-optimal strategies we presented extends this result from limit-sure winning strategies to ε-optimal strategies (Theorem 45). Theorem 45 states that there exist ε-optimal strategies that in the limit coincide with a memoryless strategy with locally optimal selectors. This parallels the results of Mertens and Neyman [MN81] for concurrent games with limit-average objectives.
Theorem 45 (Limit of ε-optimal strategies) For every ε > 0 and all parity objectives Φ there exist ε-optimal strategies σε such that the sequence of strategies σε converges, as ε → 0, to a memoryless strategy σ with a locally optimal selector; i.e., lim_{ε→0} σε = σ, where σ ∈ Σℓ(Φ) and σ is memoryless.
Complexity of concurrent ω-regular games. The complexity results for concurrent games with the sure winning criterion follow from the results for 2-player games. Given a concurrent game of size |G| and a parity objective with d priorities, the almost-sure and limit-sure winning states can be computed in time O(|G|^{d+1}), and membership in the almost-sure and limit-sure winning sets can be decided in NP ∩ coNP [dAH00]. We established that the values of concurrent games with parity objectives can be approximated within ε-precision in PSPACE, for ε > 0. A concurrent game with a Rabin or Streett objective with d pairs can be solved by transforming it to a game exponentially larger than the original game, with a parity objective of O(d) priorities: the reduction is achieved using an index appearance record (IAR) construction [Tho95], which is an adaptation of the LAR construction of [GH82]. This conversion, along with the qualitative analysis of concurrent games with parity objectives, shows that the almost-sure and limit-sure winning states of concurrent games with Rabin and Streett objectives can be computed in EXPTIME. Moreover, the conversion of concurrent games with Rabin and Streett objectives to concurrent games with parity objectives, together with the quantitative analysis of concurrent games with parity objectives, yields an EXPSPACE bound for computing values within ε-precision for concurrent games with Rabin and Streett objectives. We summarize the results on strategy and computational complexity in Table 8.1 and Table 8.2.
8.2 Conclusion
In this chapter we studied the complexity of concurrent games with parity objec-
tives, and as a consequence also obtained improved complexity results for concurrent games
with Rabin, Streett and Muller objectives. The interesting open problems are as follows:
1. Computing the almost-sure and limit-sure winning sets for concurrent games with Rabin and Streett objectives is NP-hard and coNP-hard,
Table 8.1: Strategy complexity of concurrent games with ω-regular objectives, where ΣPM denotes the family of pure memoryless strategies, ΣM the family of randomized memoryless strategies, ΣPF the family of pure finite-memory strategies, and ΣHI the family of randomized, history-dependent, infinite-memory strategies.
Objectives Sure Almost-sure Limit-sure ε-optimal
Safety ΣPM ΣPM ΣPM ΣM
Reachability ΣPM ΣM ΣM ΣM
coBüchi ΣPM ΣM ΣM ΣM
Büchi ΣPM ΣM ΣHI ΣHI
Parity ΣPM ΣHI ΣHI ΣHI
Rabin ΣPM ΣHI ΣHI ΣHI
Streett ΣPF ΣHI ΣHI ΣHI
Muller ΣPF ΣHI ΣHI ΣHI
Table 8.2: Computational complexity of concurrent games with ω-regular objectives.
Objectives Sure Almost-sure Limit-sure Values
Safety PTIME PTIME PTIME PSPACE
Reachability PTIME PTIME PTIME PSPACE
coBüchi PTIME PTIME PTIME PSPACE
Büchi PTIME PTIME PTIME PSPACE
Parity NP ∩ coNP NP ∩ coNP NP ∩ coNP PSPACE
Rabin NP-compl. EXPTIME EXPTIME EXPSPACE
Streett coNP-compl. EXPTIME EXPTIME EXPSPACE
Muller PSPACE-compl. EXPTIME EXPTIME EXPSPACE
respectively (the hardness follows from the special case of 2-player games). The upper bounds are EXPTIME, and it remains open to prove whether the problems are NP-complete and coNP-complete for concurrent games with Rabin and Streett objectives, respectively.
2. As stated above, quantitative analysis of concurrent games with Rabin and Streett objectives is NP-hard and coNP-hard, respectively. The upper bounds are EXPSPACE. It remains open to obtain NP and coNP algorithms for concurrent games with Rabin and Streett objectives, respectively, or even EXPTIME algorithms.
Chapter 9
Secure Equilibria and Applications
In this chapter we consider 2-player games with non-zero-sum objectives and show their application in synthesis.1 In 2-player non-zero-sum games, Nash equilibria capture the options for rational behavior if each player attempts to maximize her payoff. In contrast to classical game theory, we consider lexicographic objectives: first, each player tries to maximize her own payoff, and then the player tries to minimize the opponent's payoff.
Such objectives arise naturally in the verification of systems with multiple components.
There, instead of proving that each component satisfies its specification no matter how
the other components behave, it sometimes suffices to prove that each component satisfies
its specification provided that the other components satisfy their specifications. We say
that a Nash equilibrium is secure if it is an equilibrium with respect to the lexicographic
objectives of both players. We prove that in graph games with Borel objectives there
may be several Nash equilibria, but there is always a unique maximal payoff profile of a
secure equilibrium. We show how this equilibrium can be computed in the case of ω-regular
winning conditions. We then study the problem of synthesizing two independent processes, each with its own specification, and show how the notion of secure equilibria generalizes
1This chapter contains results from [CHJ04, CH07].
CHAPTER 9. SECURE EQUILIBRIA AND APPLICATIONS 200
the assume-guarantee style of reasoning in a game-theoretic framework and leads to a more
appropriate formulation of the synthesis problem.
9.1 Non-zero-sum Games
We consider 2-player non-zero-sum games, i.e., non-strictly competitive games. A
possible behavior of the two players is captured by a strategy profile (σ, π), where σ is
a strategy of player 1, and π is a strategy of player 2. Classically, the behavior (σ, π) is
considered rational if the strategy profile is a Nash equilibrium [Jr50] —that is, if neither
player can increase her payoff by unilaterally changing her strategy. Formally, let v_1^{σ,π} be
the payoff of player 1 if the strategies (σ, π) are played, and let v_2^{σ,π} be the corresponding
payoff of player 2. Then (σ, π) is a Nash equilibrium if (1) v_1^{σ,π} ≥ v_1^{σ′,π} for all player 1
strategies σ′, and (2) v_2^{σ,π} ≥ v_2^{σ,π′} for all player 2 strategies π′. Nash equilibria formalize a
notion of rationality which is strictly internal: each player cares about her own payoff but
does not in the least care (cooperatively or adversarially) about the other player’s payoff.
Choosing among Nash equilibria. A classical problem is that many games have multiple
Nash equilibria, and some of them may be preferable to others. For example, one might
partially order the equilibria by (σ, π) ⪰ (σ′, π′) if both v_1^{σ,π} ≥ v_1^{σ′,π′} and v_2^{σ,π} ≥ v_2^{σ′,π′}. If a
unique maximal Nash equilibrium exists in this order, then it is preferable for both players.
However, maximal Nash equilibria may not be unique. In such cases external criteria, such
as the sum of the payoffs for both players, have been used to evaluate different rational
behaviors [Kre90, Owe95, vNM47]. These external criteria, which are based on a single
preference order on strategy profiles, are usually cooperative, in that they capture social
aspects of rational behavior. We define and study, instead, an adversarial external criterion
for rational behavior. Put simply, we assume that each player attempts to minimize the
other player’s payoff as long as, by doing so, she does not decrease her own payoff. This
yields two different preference orders on strategy profiles, one for each player. Among two
strategy profiles (σ, π) and (σ′, π′), player 1 prefers (σ, π), denoted (σ, π) ⪰1 (σ′, π′), if
either v_1^{σ,π} > v_1^{σ′,π′}, or both v_1^{σ,π} = v_1^{σ′,π′} and v_2^{σ,π} ≤ v_2^{σ′,π′}. In other words, the preference
order ⪰1 of player 1 is lexicographic: the primary goal of player 1 is to maximize her own
payoff; the secondary goal is to minimize the opponent’s payoff. The preference order ⪰2 of
player 2 is defined symmetrically. We refer to rational behaviors under these lexicographic
objectives as secure equilibria. (We do not know how to uniformly translate all games
with lexicographic preference orders to games with a single objective for each player, such
that the Nash equilibria of the translated games correspond to the secure equilibria of the
original games.)
Secure equilibria. The two orders ⪰1 and ⪰2 on strategy profiles, which express the
preferences of the two players, induce the following refinement of the notion of Nash
equilibrium: a strategy profile (σ, π) is a secure equilibrium if (1) (v_1^{σ,π}, v_2^{σ,π}) ⪰1 (v_1^{σ′,π}, v_2^{σ′,π})
for all player 1 strategies σ′, and (2) (v_1^{σ,π}, v_2^{σ,π}) ⪰2 (v_1^{σ,π′}, v_2^{σ,π′}) for all player 2 strategies π′.
Note that every secure equilibrium is a Nash equilibrium, but a Nash equilibrium need not
be secure. The name “secure” equilibrium derives from the following equivalent character-
ization. We say that a strategy profile (σ, π) is secure if any rational deviation of player 2
—i.e., a deviation that does not decrease her payoff— will not decrease the payoff of player 1,
and symmetrically, any rational deviation of player 1 will not decrease the payoff of player 2.
Formally, a strategy profile (σ, π) is secure if for all player 2 strategies π′, if v_2^{σ,π′} ≥ v_2^{σ,π} then
v_1^{σ,π′} ≥ v_1^{σ,π}, and for all player 1 strategies σ′, if v_1^{σ′,π} ≥ v_1^{σ,π} then v_2^{σ′,π} ≥ v_2^{σ,π}. The secure
profile (σ, π) can thus be interpreted as a contract between the two players which enforces
cooperation: any unilateral selfish deviation by one player cannot put the other player at a
disadvantage if she follows the contract. It is not difficult to show that a strategy profile is
a secure equilibrium iff it is both a secure profile and a Nash equilibrium. Thus, the secure
equilibria are those Nash equilibria which represent enforceable contracts between the two
players.
Motivation: verification of component-based systems. The motivation for our def-
initions comes from verification. There, one would like to prove that a component of a
system (player 1) can satisfy a specification no matter how the environment (player 2)
behaves [AHK02]. Classically, this is modeled as a strictly competitive (zero-sum) game,
where the environment’s objective is the complement of the component’s objective. How-
ever, the zero-sum model is often overly conservative, as the environment itself typically
consists of components, each with its own specification (i.e., objective). Moreover, the in-
dividual component specifications are usually not complementary; a common example is
that each component must maintain a local invariant. So a more appropriate approach is
to prove that player 1 can meet her objective no matter how player 2 behaves as long as
player 2 does not sabotage her own objective. In other words, classical correctness proofs
of a component assume absolute worst-case behavior of the environment, while it would
suffice to assume only relative worst-case behavior of the environment —namely, relative
to the assumption that the environment itself is correct (i.e., meets its specification). Such
relative worst-case reasoning, called assume-guarantee reasoning [AL95, AH99, NAT03], so
far has not been studied in the natural setting offered by game theory.
Existence and uniqueness of maximal secure equilibria. We will see that in general
games, such as matrix games, there may be multiple secure equilibrium payoff profiles, even
several incomparable maximal ones. We show that for 2-player games with Borel objectives,
which may have multiple maximal Nash equilibria, there always exists a unique maximal
secure equilibrium payoff profile. In other words, in graph games with Borel objectives
there is a compelling notion of rational behavior for each player, which is (1) a classical
Nash equilibrium, (2) an enforceable contract (“secure”), and (3) a guarantee of maximal
payoff for each player among all behaviors that achieve (1) and (2).
Figure 9.1: A graph game with reachability objectives.
Examples. Consider the game graph shown in Fig. 9.1. Player 1 chooses the successor
node at square nodes and her objective is to reach the target s4. Player 2 chooses the
successor node at diamond nodes and her objective is to reach s3 or s4, also a reachability
objective. There are two player 1 strategies: the strategy σ1 chooses the move s0 → s1,
and σ2 chooses s0 → s2. There are also two player 2 strategies: the strategy π1 chooses
s1 → s3, and π2 chooses s1 → s4. The strategy profile (σ1, π1) leads the game into s3 and
therefore gives the payoff profile (0,1), indicating that player 1 loses and player 2 wins (i.e.,
only player 2 reaches her target). The strategy profiles (σ1, π2), (σ2, π1), and (σ2, π2) give
the payoffs (1,1), (0,0), and (0,0), respectively. All four strategy profiles are Nash equilibria.
For example, in (σ1, π1) player 1 does not have an incentive to switch to strategy σ2 (which
would still give her payoff 0), and neither does player 2 have an incentive to switch to π2
(she is already getting payoff 1). However, the strategy profile (σ1, π2) is not a secure
equilibrium, because player 2 can lower player 1’s payoff (from 1 to 0) without changing her
own payoff by switching to strategy π1. Similarly, the strategy profile (σ1, π1) is not secure,
because player 1 can lower player 2’s payoff without changing her own payoff by switching
to σ2. So if both players, in addition to maximizing their own payoff, also attempt to
minimize the opponent’s payoff, then the resulting payoff profile is unique, namely, (0,0).
In other words, in this game, the only rational behavior for both players is to deny each
other’s objectives.
Figure 9.2: A graph game with Büchi objectives.
This is not always the case: sometimes it is beneficial for both players to cooperate
to achieve their own objectives, with the result that both players win. Consider the game
graph shown in Fig. 9.2. Both players have Büchi objectives: player 1 (square) wants to
visit s0 infinitely often, and player 2 (diamond) wants to visit s4 infinitely often. If player 2
always chooses s0 → s2 and player 1 always chooses s2 → s4, then both players win. This
Nash equilibrium is also secure: if player 1 deviates by choosing s2 → s0, then player 2 can
“retaliate” by choosing s0 → s3; similarly, if player 2 deviates, then player 1 can retaliate
by choosing s2 → s3. It follows that for purely selfish motives (and not for some social
reason), both players have an incentive to cooperate to achieve the maximal secure
equilibrium payoff (1,1).
Outline and results. We first define the notion of secure equilibrium and give several
interpretations through alternative definitions. We then prove the existence and uniqueness
of maximal secure equilibria in graph games with Borel objectives. The proof is based on
the following classification of strategies. A player 1 strategy is called strongly winning if
it ensures that player 1 wins and player 2 loses (i.e., the outcome of the game satisfies
ϕ1 ∧ ¬ϕ2). A player 1 strategy is a retaliating strategy if it ensures that if player 2 wins,
then player 1 wins (i.e., the outcome satisfies ϕ2 → ϕ1). In other words, a retaliating
strategy for player 1 ensures that if player 2 causes player 1 to lose, then player 2 will
lose too. If both players follow retaliating strategies (σ, π), they may both win —in this
case, we say that (σ, π) is a winning pair of retaliating strategies— or they may both lose.
We show that at every node of a graph game with Borel objectives, either one of the two
players has a strongly winning strategy, or there is a pair of retaliating strategies. Based
on this insight, we give an algorithm for computing the secure equilibria in graph games
in the case that both players’ objectives are ω-regular. We then consider the problem of
synthesis of two independent processes, each with its own specification, and show that secure
equilibria generalize the assume-guarantee style of reasoning in a game-theoretic framework
and lead to an appropriate formulation of the synthesis problem.
9.2 Secure Equilibria
In a secure game the objective of player 1 is to maximize her own payoff and then
minimize the payoff of player 2. Similarly, player 2 maximizes her own payoff and then
minimizes the payoff of player 1. We want to determine the best payoff that each player can
ensure when both players play according to these preferences. We formalize this as follows.
A strategy profile (σ, π) is a pair of strategies, where σ is a player 1 strategy and π is a
player 2 strategy. The strategy profile (σ, π) gives rise to a payoff profile (v_1^{σ,π}, v_2^{σ,π}), where
v_1^{σ,π} is the payoff of player 1 if the two players follow the strategies σ and π, respectively,
and v_2^{σ,π} is the corresponding payoff of player 2. We define the player 1 preference order ≺1
and the player 2 preference order ≺2 on payoff profiles lexicographically:
(v1, v2) ≺1 (v′1, v′2) iff (v1 < v′1) ∨ (v1 = v′1 ∧ v2 > v′2),
that is, player 1 prefers a payoff profile which gives her greater payoff, and if two payoff
profiles match in the first component, then she prefers the payoff profile in which player 2’s
payoff is minimized. Symmetrically,
(v1, v2) ≺2 (v′1, v′2) iff (v2 < v′2) ∨ (v2 = v′2 ∧ v1 > v′1).
Given two payoff profiles (v1, v2) and (v′1, v′2), we write (v1, v2) = (v′1, v′2) iff v1 = v′1 and
v2 = v′2, and (v1, v2) ⪯1 (v′1, v′2) iff either (v1, v2) ≺1 (v′1, v′2) or (v1, v2) = (v′1, v′2). We
define ⪯2 analogously.
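As a sanity check, the two lexicographic orders can be transcribed directly; the fragment below is an illustration with function names of our choosing, implementing ≺1, ≺2 and the derived ⪯1, ⪯2 on payoff profiles.

```python
def lt1(p, q):
    """(v1, v2) ≺1 (w1, w2): player 1 strictly prefers q to p."""
    (v1, v2), (w1, w2) = p, q
    return v1 < w1 or (v1 == w1 and v2 > w2)

def lt2(p, q):
    """(v1, v2) ≺2 (w1, w2): player 2 strictly prefers q to p."""
    (v1, v2), (w1, w2) = p, q
    return v2 < w2 or (v2 == w2 and v1 > w1)

def le1(p, q):  # ⪯1
    return p == q or lt1(p, q)

def le2(p, q):  # ⪯2
    return p == q or lt2(p, q)

# Player 1 ranks (1, 0) above (1, 1), which in turn beats (0, 0) and (0, 1).
assert lt1((1, 1), (1, 0)) and lt1((0, 0), (1, 1)) and lt1((0, 1), (0, 0))
```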
Definition 17 (Secure strategy profiles) A strategy profile (σ, π) is secure if the fol-
lowing two conditions hold:
∀π′. (v_1^{σ,π′} < v_1^{σ,π}) → (v_2^{σ,π′} < v_2^{σ,π})
∀σ′. (v_2^{σ′,π} < v_2^{σ,π}) → (v_1^{σ′,π} < v_1^{σ,π})
A secure strategy for player 1 ensures that if player 2 tries to decrease player 1’s payoff,
then player 2’s payoff decreases as well, and vice versa.
Definition 18 (Secure equilibria) A strategy profile (σ, π) is a Nash equilibrium if
(1) v_1^{σ,π} ≥ v_1^{σ′,π} for all player 1 strategies σ′, and (2) v_2^{σ,π} ≥ v_2^{σ,π′} for all player 2
strategies π′. A strategy profile is a secure equilibrium if it is both a Nash equilibrium and secure.
Proposition 13 (Equivalent characterization) The strategy profile (σ, π) is a secure
equilibrium iff the following two conditions hold:
∀π′. (v_1^{σ,π′}, v_2^{σ,π′}) ⪯2 (v_1^{σ,π}, v_2^{σ,π})
∀σ′. (v_1^{σ′,π}, v_2^{σ′,π}) ⪯1 (v_1^{σ,π}, v_2^{σ,π}).
Proof. Consider a strategy profile (σ, π) which is a Nash equilibrium and secure. Since
(σ, π) is a Nash equilibrium, for all player 2 strategies π′, we have v_2^{σ,π′} ≤ v_2^{σ,π}. Since (σ, π)
is secure, for all π′, we have (v_1^{σ,π′} < v_1^{σ,π}) → (v_2^{σ,π′} < v_2^{σ,π}). It follows that for every player 2
strategy π′, the following condition holds:
(v_2^{σ,π′} = v_2^{σ,π} ∧ v_1^{σ,π} ≤ v_1^{σ,π′}) ∨ (v_2^{σ,π′} < v_2^{σ,π}).
Hence, for all π′, we have (v_1^{σ,π′}, v_2^{σ,π′}) ⪯2 (v_1^{σ,π}, v_2^{σ,π}). The argument for the other case is
σ,π2 ). The argument for the other case is
symmetric. Thus neither player 1 nor player 2 has any incentive to switch from the strategy
profile (σ, π) in order to increase the payoff profile according to their respective payoff profile
ordering.
Conversely, an equilibrium strategy profile (σ, π) with respect to the preference
orders 1 and 2 is both a Nash equilibrium and a secure strategy profile.
Example 7 (Matrix games) A secure equilibrium need not exist in a matrix game. We
give an example of a matrix game where no Nash equilibrium is secure. Consider the game
M1 below, where the row player can choose row 1 or row 2 (denoted r1 and r2, respectively),
and the column player chooses between the two columns (denoted c1 and c2). The first
component of the payoff is the row player payoff, and the second component is the column
player payoff.
M1 =
(3, 3) (1, 3)
(3, 1) (2, 2)
In this game the strategy profile (r1, c1) is the only Nash equilibrium. But (r1, c1) is not a
secure strategy profile, because if the row player plays r1, then the column player playing c2
can still get payoff 3 and decrease the row player’s payoff to 1.
In the game M2 below, there are two Nash equilibria, namely, (r1, c2) and (r2, c1),
and the strategy profile (r2, c1) is a secure strategy profile as well. Hence the strategy profile
(r2, c1) is a secure equilibrium. However, the strategy profile (r1, c2) is not secure.
M2 =
(0, 0) (1, 0)
(1/2, 1/2) (1/2, 1/2)
Multiple secure equilibria may exist, as is the case, for example, in a matrix game where
all entries of the matrix are the same. We now present an example of a matrix game with
multiple secure equilibria with different payoff profiles. Consider the following matrix game
M3. The strategy profiles (r1, c1) and (r2, c2) are both secure equilibria. The former has the
payoff profile (2, 1), and the latter, the payoff profile (1, 2). These two payoff profiles are
incomparable: player 1 prefers the former, player 2 the latter. Hence, in this case, there is
not a unique maximal secure payoff profile.
M3 =
(2, 1) (0, 0)
(0, 0) (1, 2)
9.3 2-Player Non-Zero-Sum Games on Graphs
We consider 2-player infinite path-forming games played on graphs. We restrict
our attention to turn-based games and pure (i.e., non-randomized) strategies. In these
games, the class of pure strategies suffices for determinacy [Mar75], and, as we shall see,
for the existence of equilibria (both Nash and secure equilibria). Hence in this chapter we
consider pure strategies only. Given a state s ∈ S, a strategy σ of player 1, and a strategy
π of player 2, there is a unique play ωσ,π(s) = 〈s0, s1, s2, . . .〉, which starts from s and for
all i ≥ 0 we have (a) if si ∈ S1, then si+1 = σ(s0, s1, . . . , si); and (b) if si ∈ S2, then
si+1 = π(s0, s1, . . . , si).
We consider non-zero-sum games on graphs. For our purposes, a graph game
(G, s, ϕ1, ϕ2) consists of a game graph G, say with state space S, together with a start state
s ∈ S and two Borel objectives ϕ1, ϕ2 ⊆ Sω. The game starts at state s, player 1 pursues the
objective ϕ1, and player 2 pursues the objective ϕ2 (in general, ϕ2 is not the complement
of ϕ1). Player i ∈ {1, 2} gets payoff 1 if the outcome of the game is a member of ϕi, and
she gets payoff 0 otherwise. In the following, we fix the game graph G and the objectives
ϕ1 and ϕ2, but we vary the start state s of the game. Thus we parametrize the payoffs
by s: given strategies σ and π for the two players, we write v_i^{σ,π}(s) = 1 if ωσ,π(s) ∈ ϕi, and
v_i^{σ,π}(s) = 0 otherwise, for i ∈ {1, 2}. Similarly, we sometimes refer to Nash equilibria and
secure strategy profiles of the graph game (G, s, ϕ1, ϕ2) as equilibria and secure profiles at
the state s.
In the following subsection, we investigate the existence and structure of secure
equilibria for the general class of graph games with Borel objectives. In the subsequent
subsection, we give a characterization of secure equilibria which can be used to compute
secure equilibria in the special case of ω-regular objectives.
9.3.1 Unique maximal secure equilibria
Consider a game graph G with state space S, and Borel objectives ϕ1 and ϕ2 for
the two players.
Definition 19 (Maximal secure equilibria) For v, w ∈ {0, 1}, we write SEvw ⊆ S to
denote the set of states s such that a secure equilibrium with the payoff profile (v, w) exists
in the graph game (G, s, ϕ1, ϕ2); that is, s ∈ SEvw iff there is a secure equilibrium (σ, π)
at s such that (v_1^{σ,π}(s), v_2^{σ,π}(s)) = (v, w). Similarly, MSvw ⊆ SEvw denotes the set of
states s such that the payoff profile (v, w) is a maximal secure equilibrium payoff profile
at s; that is, s ∈ MSvw iff (1) s ∈ SEvw and (2) for all v′, w′ ∈ {0, 1}, if s ∈ SEv′w′, then
(v′, w′) ⪯1 (v, w) and (v′, w′) ⪯2 (v, w).
We now define the notions of strongly winning and retaliating strategies, which capture
the essence of secure equilibria. A strategy for player 1 is strongly winning if it ensures
that the objective of player 1 is satisfied and the objective of player 2 is not. A retaliating
strategy for player 1 ensures that for every strategy of player 2, if the objective of player 2
is satisfied, then the objective of player 1 is satisfied as well. We will show that every secure
equilibrium either contains a strongly winning strategy for one of the players, or it consists
of a pair of retaliating strategies.
Definition 20 (Strongly winning strategies) A strategy σ is strongly winning for
player 1 from a state s if she can ensure the payoff profile (1, 0) in the graph game
(G, s, ϕ1, ϕ2) by playing the strategy σ. Formally, σ is strongly winning for player 1 from s
if for all player 2 strategies π, we have ωσ,π(s) ∈ (ϕ1∧¬ϕ2). The strongly winning strategies
for player 2 are defined symmetrically.
Definition 21 (Retaliating strategies) A strategy σ is a retaliating strategy for player 1
from a state s if for all player 2 strategies π, we have ωσ,π(s) ∈ (ϕ2 → ϕ1). Similarly, a
strategy π is a retaliating strategy for player 2 from s if for all player 1 strategies σ, we
have ωσ,π(s) ∈ (ϕ1 → ϕ2). We write Re1(s) and Re2(s) to denote the sets of retaliating
strategies for player 1 and player 2, respectively, from s. A strategy profile (σ, π) is a
retaliation strategy profile at a state s if both σ and π are retaliating strategies from s. The
retaliation strategy profile (σ, π) is winning at s if ωσ,π(s) ∈ (ϕ1 ∧ ϕ2). A strategy σ is a
winning retaliating strategy for player 1 at state s if there is a strategy π for player 2 such
that (σ, π) is a winning retaliation strategy profile at s.
Example 8 (Büchi-Büchi game) Recall the graph game shown in Fig. 9.2. Consider the
memoryless strategies of player 2 at state s0. If player 2 chooses s0 → s3, then player 2
does not satisfy her Büchi objective. If player 2 chooses s0 → s2, then at state s2 player 1
chooses s2 → s0, and hence player 1’s objective is satisfied, but player 2’s objective is not
satisfied. Thus, no memoryless strategy for player 2 can be a winning retaliating strategy
at s0.
Now consider the strategy πg for player 2 which chooses s0 → s2 if between the
last two consecutive visits to s0 the state s4 was visited, and otherwise it chooses s0 → s3.
Given this strategy, for every strategy of player 1 that satisfies player 1’s objective, player 2’s
objective is also satisfied. Let σg be the player 1 strategy that chooses s2 → s4 if between the
last two consecutive visits to s2 the state s0 was visited, and otherwise chooses s2 → s3. The
strategy profile (σg, πg) consists of a pair of winning retaliating strategies, as it satisfies the
Büchi objectives of both players. If instead, player 2 always chooses s0 → s3, and player 1
always chooses s2 → s3, we obtain a memoryless retaliation strategy profile, which is not
winning for either player: it is a Nash equilibrium at state s0 with the payoff profile (0, 0).
Finally, suppose that at s0 player 2 always chooses s2, and at s2 player 1 always chooses s0.
This strategy profile is again a Nash equilibrium, with the payoff profile (1, 0) at s0, but not
a retaliation strategy profile. This shows that at state s0 the Nash equilibrium payoff profiles
(1, 0), (0, 0), and (1, 1) are possible, but only (0, 0) and (1, 1) are secure.
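The behavior of the retaliation profile (σg, πg) can be simulated. The sketch below assumes the edge set implied by the example's case analysis (s0 → {s2, s3}; s2 → {s0, s3, s4}; s4 → s0; s3 absorbing), which is our reading of Fig. 9.2 rather than something stated explicitly in the text.

```python
def play(steps=60):
    """Simulate the retaliation profile (σg, πg) from Example 8 on the
    assumed edge set: s0 -> {s2, s3}, s2 -> {s0, s3, s4}, s4 -> s0,
    and s3 -> s3 (absorbing)."""
    path = ["s0"]
    seen_s4_since_s0 = True   # treat the first visit to s0 cooperatively
    seen_s0_since_s2 = True   # treat the first visit to s2 cooperatively
    cur = "s0"
    for _ in range(steps):
        if cur == "s0":
            # player 2 (πg): cooperate only if s4 was visited since the
            # previous visit to s0; otherwise retaliate via s3
            nxt = "s2" if seen_s4_since_s0 else "s3"
            seen_s4_since_s0 = False
        elif cur == "s2":
            # player 1 (σg): symmetric condition on visits to s0
            nxt = "s4" if seen_s0_since_s2 else "s3"
            seen_s0_since_s2 = False
        elif cur == "s4":
            nxt = "s0"
        else:                 # s3 is absorbing
            nxt = "s3"
        if nxt == "s4":
            seen_s4_since_s0 = True
        if nxt == "s0":
            seen_s0_since_s2 = True
        path.append(nxt)
        cur = nxt
    return path

p = play()
# Under (σg, πg) the play settles into the cycle s0, s2, s4, so both Büchi
# targets (s0 for player 1, s4 for player 2) recur along the simulation.
assert "s3" not in p and p.count("s0") > 5 and p.count("s4") > 5
```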
Given a game graph G with state space S, and a set ϕ ⊆ Sω of infinite paths, we define the
sets of states from which player 1 or player 2, respectively, can win a zero-sum game with
objective ϕ, as follows:
〈〈1〉〉G (ϕ) = {s ∈ S | ∃σ ∈ Σ. ∀π ∈ Π. ωσ,π(s) ∈ ϕ}
〈〈2〉〉G (ϕ) = {s ∈ S | ∃π ∈ Π. ∀σ ∈ Σ. ωσ,π(s) ∈ ϕ}
The set of states from which the two players can cooperate to satisfy the objective ϕ is
〈〈1, 2〉〉G (ϕ) = {s ∈ S | ∃σ ∈ Σ. ∃π ∈ Π. ωσ,π(s) ∈ ϕ}.
We omit the subscript G when the game graph is clear from the context. Let s be a state
in 〈〈1, 2〉〉(ϕ), and let (σ, π) be a strategy profile such that ωσ,π(s) ∈ ϕ. Then we say that
(σ, π) is a cooperative strategy profile at s.
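For reachability objectives, the sets 〈〈1〉〉(ϕ), 〈〈2〉〉(ϕ), and 〈〈1, 2〉〉(ϕ) are computable by the classical attractor fixpoint. The sketch below is ours; it uses the Fig. 9.1 graph as a test case, with the edge set read off from the example and the states s2, s3, s4 assumed absorbing.

```python
def attractor(states, edges, owner, player, target):
    """Attr_player(target): the states from which `player` can force the
    play to reach `target`; owner[s] in {1, 2}, edges[s] = successors."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in attr:
                continue
            succ_in = [t in attr for t in edges[s]]
            # the owner needs one successor in attr; the opponent, all of them
            if (owner[s] == player and any(succ_in)) or \
               (owner[s] != player and all(succ_in)):
                attr.add(s)
                changed = True
    return attr

# Fig. 9.1 (edge set read off the example; s2, s3, s4 taken as absorbing).
states = ["s0", "s1", "s2", "s3", "s4"]
edges = {"s0": ["s1", "s2"], "s1": ["s3", "s4"],
         "s2": ["s2"], "s3": ["s3"], "s4": ["s4"]}
owner = {"s0": 1, "s1": 2, "s2": 1, "s3": 1, "s4": 1}

win1 = attractor(states, edges, owner, 1, {"s4"})        # 〈〈1〉〉(Reach {s4})
win2 = attractor(states, edges, owner, 2, {"s3", "s4"})  # 〈〈2〉〉(Reach {s3, s4})
coop = attractor(states, edges, {s: 1 for s in states}, 1, {"s4"})  # 〈〈1, 2〉〉
print(win1, win2, coop)
```

Consistent with the example, player 1 cannot force her target from s0 (s0 ∉ 〈〈1〉〉(Reach {s4})), while the two players together can reach s4 from s0.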
Definition 22 (Characterization of states) For the given game graph G and Borel ob-
jectives ϕ1, ϕ2, we define the following four state sets in terms of strongly winning and
retaliating strategies.
• The sets of states where player 1 or player 2, respectively, has a strongly winning
strategy:
W10 = 〈〈1〉〉G (ϕ1 ∧ ¬ϕ2)
W01 = 〈〈2〉〉G (ϕ2 ∧ ¬ϕ1)
• The set of states where both players have retaliating strategies, and there exists a
retaliation strategy profile whose strategies satisfy the objectives of both players:
W11 = {s ∈ S | ∃σ ∈ Re1(s). ∃π ∈ Re2(s). ωσ,π(s) ∈ (ϕ1 ∧ ϕ2)}
• The set of states where both players have retaliating strategies and for every retalia-
tion strategy profile, neither the objective of player 1 nor the objective of player 2 is
satisfied:
W00 = {s ∈ S | Re1(s) ≠ ∅ and Re2(s) ≠ ∅ and
∀σ ∈ Re1(s). ∀π ∈ Re2(s). ωσ,π(s) ∈ (¬ϕ1 ∧ ¬ϕ2)}
We first show that the four sets W10, W01, W11, and W00 form a partition of the state
space. In the zero-sum case, where ϕ2 = ¬ϕ1, the sets W10 and W01 specify the winning
states for players 1 and 2, respectively; furthermore, W11 = ∅ by definition, and W00 = ∅ by
determinacy. We also show that for all v, w ∈ {0, 1}, we have MSvw = Wvw. It follows that
for 2-player graph games (1) secure equilibria always exist, and moreover, (2) there is always
a unique maximal secure equilibrium payoff profile. (Example 8 showed that there can be
multiple secure equilibria with different payoff profiles). This result fully characterizes each
state of a 2-player non-zero-sum graph game with Borel objectives by a maximal secure
equilibrium payoff profile, just as the determinacy result fully characterizes the zero-sum case. The
proof proceeds in several steps.
Lemma 54 W10 = {s ∈ S | Re2(s) = ∅} and W01 = {s ∈ S | Re1(s) = ∅}.
Proof. First, W10 ⊆ {s ∈ S | Re2(s) = ∅}, because a strongly winning strategy of player 1
—i.e., a strategy to satisfy ϕ1 ∧ ¬ϕ2 against every strategy of player 2— witnesses that
there is no retaliating strategy for player 2. Second, it follows from Borel determinacy
that from each state s in S \ W10 there is a strategy π of player 2 to satisfy ¬ϕ1 ∨ ϕ2
against every strategy of player 1. The strategy π is a retaliating strategy for
player 2. Hence S \ W10 ⊆ {s ∈ S | Re2(s) ≠ ∅}, and therefore W10 = {s ∈ S | Re2(s) = ∅}.
The proof that W01 = {s ∈ S | Re1(s) = ∅} is symmetric.
Lemma 55 Consider the following two sets:
T1 = {s ∈ S | ∀σ ∈ Re1(s). ∀π ∈ Re2(s). ωσ,π(s) ∈ (¬ϕ1 ∧ ¬ϕ2)}
T2 = {s ∈ S | ∀σ ∈ Re1(s). ∀π ∈ Re2(s). ωσ,π(s) ∈ (¬ϕ1 ∨ ¬ϕ2)}
Then T1 = T2.
Proof. The inclusion T1 ⊆ T2 follows from the fact that (¬ϕ1 ∧ ¬ϕ2) → (¬ϕ1 ∨ ¬ϕ2). We
show that T2 ⊆ T1. By the definition of retaliating strategies, if σ is a retaliating strategy
of player 1, then for all strategies π of player 2, we have ωσ,π(s) ∈ (ϕ2 → ϕ1), and thus
ωσ,π(s) ∈ (¬ϕ1 → ¬ϕ2). Symmetrically, if π is a retaliating strategy of player 2, then for
all strategies σ of player 1, we have ωσ,π(s) ∈ (¬ϕ2 → ¬ϕ1). Hence, given a retaliation
strategy profile (σ, π), we have ωσ,π(s) ∈ (¬ϕ1 ∨¬ϕ2) iff ωσ,π(s) ∈ (¬ϕ1 ∧¬ϕ2). The lemma
follows.
Proposition 14 (State space partition) For all 2-player graph games with Borel objec-
tives, the four sets W10, W01, W11, and W00 form a partition of the state space.
Proof. It follows from Lemma 54 that
S \ (W10 ∪ W01) = {s ∈ S | Re2(s) ≠ ∅ ∧ Re1(s) ≠ ∅}.
It also follows that the sets W10, W01, W11, and W00 are disjoint. By definition, we have
W00 ⊆ {s ∈ S | Re1(s) ≠ ∅ ∧ Re2(s) ≠ ∅} ⊆ S \ (W10 ∪ W01). Consider T1 and T2 as defined
in Lemma 55. We have W00 = T1, and by Lemma 54, we have T2 ∪ W11 = S \ (W10 ∪ W01).
It also follows that T2 ∩ W11 = ∅, and hence T2 = (S \ (W10 ∪ W01)) \ W11. Therefore by
Lemma 55,
T2 = T1 = W00 = (S \ (W10 ∪ W01)) \ W11.
The proposition follows.
Lemma 56 The following equalities hold:
SE00 ∩ SE10 = ∅, SE01 ∩ SE10 = ∅, and SE00 ∩ SE01 = ∅.
Proof. Consider a state s ∈ SE10 and a secure equilibrium (σ, π) at s. Since the strategy
profile is secure and player 2 receives the least possible payoff, it follows that for all player 2
strategies, the payoff for player 1 cannot decrease. Hence for all player 2 strategies π′, we
have ωσ,π′(s) ∈ ϕ1. So there is no Nash equilibrium at state s which assigns payoff 0 to
player 1. Hence we have SE10 ∩ SE01 = ∅ and SE10 ∩ SE00 = ∅. The argument to show
that SE01 ∩ SE00 = ∅ is similar.
Lemma 57 The following equalities hold:
SE11 ∩ SE01 = ∅ and SE11 ∩ SE10 = ∅.
Proof. Consider a state s ∈ SE11 and a secure equilibrium (σ, π) at s. Since the strategy
profile is secure, it ensures that for all player 2 strategies π′, if ωσ,π′(s) ∈ ¬ϕ1, then
ωσ,π′(s) ∈ ¬ϕ2. Hence s ∉ SE01. Thus SE11 ∩ SE01 = ∅. The proof that SE11 ∩ SE10 = ∅ is analogous.
Lemma 58 The following equalities hold:
MS00 ∩ MS01 = ∅, MS00 ∩ MS10 = ∅, MS01 ∩ MS10 = ∅, and MS11 ∩ MS00 = ∅.
Proof. The first three equalities follow from Lemmas 56 and 57. The last equality follows
from the facts that (0, 0) ≺1 (1, 1) and (0, 0) ≺2 (1, 1). So if s ∈ MS11, then (0, 0) cannot
be a maximal secure payoff profile at s.
Lemma 59 W10 = MS 10 and W01 = MS01.
Proof. Consider a state s ∈ MS10 and a secure equilibrium (σ, π) at s. Since player 2
receives the least possible payoff and (σ, π) is a secure strategy profile, it follows that for all
strategies π′ of player 2, we have ωσ,π′(s) ∈ ϕ1. Since (σ, π) is a Nash equilibrium, for all
strategies π′ of player 2, we have ωσ,π′(s) ∈ ¬ϕ2. Thus MS10 ⊆ W10. Now consider a state
s ∈ W10, and let σ be a strongly winning strategy of player 1 at s; that is, for all strategies
π of player 2, we have ωσ,π(s) ∈ (ϕ1 ∧ ¬ϕ2). For all strategies π of player 2, the strategy
profile (σ, π) is a secure equilibrium. Hence s ∈ SE10. Since (1, 0) is the greatest payoff
profile in the preference order for player 1, we have s ∈ MS10. Therefore W10 = MS10.
Symmetrically, W01 = MS01.
Lemma 60 W11 = MS 11.
Proof. Consider a state s ∈ MS 11, and let (σ, π) be a secure equilibrium at s. We prove
that σ ∈ Re1(s) and π ∈ Re2(s). Since (σ, π) is a secure strategy profile, for all strategies
π′ of player 2, if ωσ,π′(s) ∈ ¬ϕ1, then ωσ,π′(s) ∈ ¬ϕ2. In other words, for all strategies π′
of player 2, we have ωσ,π′(s) ∈ (ϕ2 → ϕ1). Hence σ ∈ Re1(s). Symmetrically, π ∈ Re2(s).
Thus MS11 ⊆ W11. Consider a state s ∈ W11, and let σ ∈ Re1(s) and π ∈ Re2(s) such that
ωσ,π(s) ∈ (ϕ1 ∧ϕ2). A retaliation strategy profile is, by definition, a secure strategy profile.
Since the strategy profile (σ, π) assigns the greatest possible payoff to each player, it is a
Nash equilibrium. Therefore W11 ⊆ SE 11 ⊆ MS11.
Lemma 61 W00 = MS 00.
Proof. It follows from Lemmas 56 and 58 that MS00 = SE 00 \ SE 11 = SE 00 \ MS 11. We
will use this fact to prove that W00 = MS00. First, consider a state s ∈ MS00. Then
s ∉ (MS11 ∪ MS10 ∪ MS01), which implies that s ∉ (W11 ∪ W10 ∪ W01). By Proposition 14,
it follows that s ∈ W00. Thus MS00 ⊆ W00.
Second, consider a state s ∈ W00. We claim that there is a strategy σ of player 1
such that for all strategies π′ of player 2, we have ωσ,π′(s) ∈ ¬ϕ2. Assume by way
of contradiction that this is not the case. Then, by Borel determinacy, there is a player 2
strategy π′′ such that for all player 1 strategies σ′, we have ωσ′,π′′(s) ∈ ϕ2. It follows that
either π′′ is a strongly winning strategy for player 2, or a retaliating strategy such that
player 2 receives payoff 1. Hence s ∉ W00, which is a contradiction. Thus there is a player 1
strategy σ such that for all player 2 strategies π′, we have ωσ,π′(s) ∈ ¬ϕ2. Similarly, there
is a player 2 strategy π such that for all player 1 strategies σ′, we have ωσ′,π(s) ∈ ¬ϕ1.
We claim that (σ, π) is a secure equilibrium. By the properties of σ, for every π′ we have
ωσ,π′(s) ∈ ¬ϕ2. A similar argument holds for π as well. It follows that (σ, π) is a Nash
equilibrium. The strategy profile (σ, π) has the payoff profile (0, 0), which assigns the least
possible payoff to each player. Hence it is a secure strategy profile. Therefore s ∈ SE 00.
Also, s ∈ W00 implies that s ∉ W11. Since W11 = MS11, we have s ∈ SE00 \ MS11. Thus
W00 ⊆ MS00.
Theorem 46 (Unique maximal secure equilibria) At every state of a 2-player graph
game with Borel objectives, there exists a unique maximal secure equilibrium payoff profile.
Proof. From Lemmas 59, 60, and 61, it follows that for all i, j ∈ {0, 1}, we have MSij = Wij.
Using Proposition 14, the theorem follows.
9.3.2 Algorithmic characterization of secure equilibria
We now give an alternative characterization of the state sets W00, W01, W10,
and W11. The new characterization is useful to derive computational complexity results
for computing the four sets when player 1 and player 2 have ω-regular objectives. The
characterization itself, however, applies to all tail (prefix independent) objectives.
In this subsection we only consider tail objectives ϕ1 and ϕ2, for player 1 and player 1,
respectively. It follows from the definitions that W10 = 〈〈1〉〉(ϕ1 ∧¬ϕ2) and W01 = 〈〈2〉〉(ϕ2 ∧
¬ϕ1). Define A = S \ (W10 ∪W01), the set of “ambiguous” states from which neither player
has a strongly winning strategy. Let Wi = 〈〈i〉〉(ϕi), for i ∈ {1, 2}, be the winning sets of
the two players, and let U1 = W1 \W10 and U2 = W2 \W01 be the sets of “weakly winning”
states for players 1 and 2, respectively. Define U = U1 ∪ U2. Note that U ⊆ A.
Lemma 62 U ⊆ W11.
Proof. Let s ∈ U1. By the definition of U1, player 1 has a strategy σ from the state s
to satisfy the objective ϕ1, which is obviously a retaliating strategy, because ϕ1 implies
ϕ2 → ϕ1. Again by the definition of U1, we have s ∉ W10. Hence, by the determinacy
of zero-sum games, player 2 has a strategy π to satisfy the objective ¬(ϕ1 ∧ ¬ϕ2), which
is a retaliating strategy, because ¬(ϕ1 ∧ ¬ϕ2) is equivalent to ϕ1 → ϕ2. Clearly, we have
ωσ,π(s) ∈ ϕ1 and ωσ,π(s) ∈ (ϕ1 → ϕ2), and hence ωσ,π(s) ∈ (ϕ1 ∧ ϕ2). The case of s ∈ U2
is symmetric.
Example 8 shows that in general U ⊊ W11. Given a game graph G = ((S, E), (S1, S2)) and
a subset S′ ⊆ S of the states, we write G↾S′ to denote the subgraph induced by S′, that
is, G↾S′ = ((S′, E ∩ (S′ × S′)), (S1 ∩ S′, S2 ∩ S′)). The following lemma characterizes the
set W11.
Lemma 63 W11 = 〈〈1, 2〉〉G↾A(ϕ1 ∧ ϕ2).
Proof. Let s ∈ 〈〈1, 2〉〉G↾A(ϕ1 ∧ ϕ2). The case s ∈ U is covered by Lemma 62; so let s ∈ A \ U.
Let (σ, π) be a cooperative strategy profile at s, that is, ωσ,π(s) ∈ (ϕ1 ∧ ϕ2). Observe that
if t ∈ A \ U, then t ∉ 〈〈1〉〉G(ϕ1) and t ∉ 〈〈2〉〉G(ϕ2). Hence, by the determinacy of zero-sum
games, from every state t ∈ A \ U, player 1 (resp. player 2) has a strategy σ̄ (resp. π̄) to
satisfy the objective ¬ϕ2 (resp. ¬ϕ1) from state t. We define the pair (σ + σ̄, π + π̄) of
strategies from s as follows:
• When the play reaches a state t ∈ U, the players follow their winning retaliating
strategies from t; such strategies exist because, by Lemma 62, U ⊆ W11.
• If the play has not yet reached the set U, then player 1 uses the strategy σ and player 2
uses the strategy π. If, however, player 2 deviates from the strategy π, then player 1
switches to the strategy σ̄ at the first state after the deviation; symmetrically, as
soon as player 1 deviates from σ, player 2 switches to the strategy π̄.
It is easy to observe that both strategies σ + σ̄ and π + π̄ are retaliating strategies, and that
ωσ+σ̄,π+π̄(s) ∈ (ϕ1 ∧ ϕ2), because ωσ+σ̄,π+π̄(s) = ωσ,π(s). Hence s ∈ W11.
Let s ∉ 〈〈1, 2〉〉G↾A(ϕ1 ∧ ϕ2). Then s ∉ W11, because for every strategy profile
(σ, π), either ωσ,π(s) ∈ ¬ϕ1 or ωσ,π(s) ∈ ¬ϕ2.
By definition, the two sets W10 and W01 can be computed by solving two zero-sum
games with conjunctive objectives. Lemma 63 shows that the set W11 can be computed
by solving a model-checking (i.e., 1-player) problem for a conjunctive objective. Finally, it
follows from Proposition 14 that the set W00 can be obtained by set operations. This is
summarized in the following theorem.
Theorem 47 (Algorithmic characterization of secure equilibria) Consider a game
graph G with Borel objectives ϕ1 and ϕ2 for the two players. The four sets W10, W01, W11,
and W00 can be computed as follows:

W10 = 〈〈1〉〉G(ϕ1 ∧ ¬ϕ2),
W01 = 〈〈2〉〉G(ϕ2 ∧ ¬ϕ1),
W11 = 〈〈1, 2〉〉G↾A(ϕ1 ∧ ϕ2),
W00 = S \ (W10 ∪ W01 ∪ W11),

where A = S \ (W10 ∪ W01).
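The set operations of Theorem 47 can be transcribed directly. The following Python sketch is not part of the original development: the three oracles win1, win2, and coop are hypothetical stand-ins for a zero-sum game solver and for cooperative model checking on the induced subgraph, and their names and signatures are assumptions of this illustration.

```python
# A direct transcription of Theorem 47. The oracles win1/win2 (zero-sum
# game solving) and coop (cooperative model checking) are hypothetical;
# each takes a set of allowed states and returns a set of states.

def secure_equilibrium_sets(states, win1, win2, coop):
    """Return (W10, W01, W11, W00) by the set operations of Theorem 47."""
    w10 = win1(states)                  # <<1>>_G(phi1 & ~phi2)
    w01 = win2(states)                  # <<2>>_G(phi2 & ~phi1)
    ambiguous = states - (w10 | w01)    # A = S \ (W10 ∪ W01)
    w11 = coop(ambiguous)               # <<1,2>>_{G|A}(phi1 & phi2)
    w00 = states - (w10 | w01 | w11)    # the remaining states
    return w10, w01, w11, w00

# Toy instantiation with hard-coded oracles, for illustration only.
S = {0, 1, 2, 3, 4}
w10, w01, w11, w00 = secure_equilibrium_sets(
    S,
    win1=lambda allowed: {0} & allowed,
    win2=lambda allowed: {1} & allowed,
    coop=lambda allowed: {2} & allowed,
)
print(w10, w01, w11, w00)
```

Only the zero-sum solves and the model-checking call carry algorithmic cost; the remaining work is plain set arithmetic.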
If the two objectives ϕ1 and ϕ2 are ω-regular, then we obtain the following corollary.
Corollary 8 (Computational complexity) Let n be the size of the game graph G.
• If ϕ1 and ϕ2 are parity objectives specified by priority functions, then the decision
problem whether a given state lies in W10, or in W01, is coNP-complete; and whether a
given state lies in W11, or in W00, can be decided in NP. The four sets W10, W01,
W11, and W00 can be computed in time O(n^(d+1) · d!), where d is the maximal number
of priorities in the two priority functions.
• If the two objectives ϕ1 and ϕ2 are specified as LTL (linear temporal logic) formulas,
then deciding W10, W01, W11, and W00 is 2EXPTIME-complete. The four sets can
be computed in time O(n^(2^ℓ) × 2^(2^ℓ·log ℓ)), where ℓ is the sum of the lengths of the two
formulas.
Proof. If the objectives ϕ1 and ϕ2 are parity objectives, and d is the maximal number
of priorities in the two priority functions, then the objectives ϕ1 ∧ ¬ϕ2, ϕ2 ∧ ¬ϕ1, and
ϕ1 ∧ ϕ2 can be expressed as Streett objectives [Tho97] with d pairs. The decision problem
for zero-sum games with Streett objectives is in coNP [EJ88], the model-checking problem
for Streett objectives can be solved in polynomial time, and zero-sum games with Streett
objectives with d pairs can be solved in time O(n^(d+1) · d!) [PP06]. It follows that, for a
given state s, whether s ∈ W10 and whether s ∈ W01 can be decided in coNP, and whether
s ∈ A for A = S \ (W01 ∪ W10) can be decided in NP. Given the set A, whether s ∈ W11
and whether s ∈ W00 can be decided in PTIME, by solving a model-checking problem with
Streett objectives. It follows from the results of [CHP07] that deciding the winner of a game
with a conjunction of two parity objectives is coNP-hard; hence coNP-completeness follows.
This proves the first part of the corollary.
Since the decision problem for zero-sum games with LTL objectives is 2EXPTIME-
complete [PR89], the 2EXPTIME lower bound is immediate. We obtain the matching upper
bound as follows. Let ℓ be the sum of the lengths of the two LTL formulas ϕ1 and ϕ2. LTL
formulas are closed under conjunction and negation, and hence ϕ1 ∧ ¬ϕ2 and ϕ2 ∧ ¬ϕ1
are LTL formulas of length ℓ + 2. An LTL formula of length ℓ can be converted into an
equivalent nondeterministic Büchi automaton of size 2^ℓ [VW86], and the nondeterministic
Büchi automaton can be converted into an equivalent deterministic parity automaton of
size 2^(2^ℓ·log ℓ) with 2^ℓ priorities [Saf88]. The problem then reduces to solving zero-sum parity
games obtained as the synchronous product of the game graph and the deterministic parity
automaton. Since zero-sum parity games can be solved in time O(n^d) for game graphs of
size n and parity objectives with d priorities [Tho97], the upper bound follows.
9.4 Assume-guarantee Synthesis
In this section we study the synthesis of two independent processes and show how
secure equilibria are useful in this scenario. The classical synthesis problem for reactive
systems asks, given a proponent process A and an opponent process B, to refine A so
that the closed-loop system A||B satisfies a given specification Φ. The solution of this
problem requires the computation of a winning strategy for proponent A in a game against
opponent B. We define and study the co-synthesis problem, where the proponent A consists
itself of two independent processes, A = A1||A2, with specifications Φ1 and Φ2, and the goal
is to refine both A1 and A2 so that A1||A2||B satisfies Φ1∧Φ2. For example, if the opponent
B is a fair scheduler for the two processes A1 and A2, and Φi specifies the requirements of
mutual exclusion for Ai (e.g., starvation freedom), then the co-synthesis problem asks for
the automatic synthesis of a mutual-exclusion protocol.
We show that co-synthesis defined classically, with the processes A1 and A2 either
collaborating or competing, does not capture desirable solutions. Instead, the proper formu-
lation of co-synthesis is the one where process A1 competes with A2 but not at the price of
violating Φ1, and vice versa. We call this assume-guarantee synthesis and show that it can
be solved by computing secure-equilibrium strategies. In particular, from mutual-exclusion
requirements the assume-guarantee synthesis algorithm automatically computes Peterson’s
protocol.
We formally define the co-synthesis problem, using the automatic synthesis of a
mutual-exclusion protocol as a guiding example. More precisely, we wish to synthesize
two processes P1 and P2 so that the composite system P1||P2||R, where R is a scheduler
that arbitrarily but fairly interleaves the actions of P1 and P2, satisfies the requirements
of mutual exclusion and starvation freedom for each process. We show that traditional
zero-sum game-theoretic formulations, where P1 and P2 either collaborate against R, or
unconditionally compete, do not lead to acceptable solutions. We then show that for the
non-zero-sum game-theoretic formulation, where the two processes compete conditionally,
there exists a unique winning secure-equilibrium solution, which corresponds exactly to
Peterson’s mutual-exclusion protocol. In other words, Peterson’s protocol can be synthe-
sized automatically as the winning secure strategies of two players whose objectives are the
mutual-exclusion requirements. This is to our knowledge the first application of non-zero-
sum games in the synthesis of reactive processes. It is also, to our knowledge, the first
application of Nash equilibria —in particular, the special kind called “secure”— in system
design.
The new formulation of co-synthesis, with the two processes competing condition-
ally, is called assume-guarantee synthesis, because similar to assume-guarantee verification
(e.g., [AH99]), in attempting to satisfy her specification, each process makes the assumption
that the other process does not violate her own specification. The solution of the assume-
guarantee synthesis problem can be obtained by computing secure equilibria in 3-player
games, with the three players P1, P2, and R.
do
flag[1]:=true; turn:=2;
| while(flag[1]) nop;
| while(flag[2]) nop;
| while(turn=1) nop;
| while(turn=2) nop;
| while(flag[1] & turn=2) nop;
| while(flag[1] & turn=1) nop;
| while(flag[2] & turn=1) nop;
| while(flag[2] & turn=2) nop;
Cr1:=true;
fin_wait;
Cr1:=false;
flag[1]:=false;
wait[1]:=1;
while(wait[1]=1)
| nop;
| wait[1]:=0;
while(true)
do
flag[2]:=true; turn:=1;
| while(flag[1]) nop; (C1)
| while(flag[2]) nop; (C2)
| while(turn=1) nop; (C3)
| while(turn=2) nop; (C4)
| while(flag[1] & turn=2) nop; (C5)
| while(flag[1] & turn=1) nop; (C6)
| while(flag[2] & turn=1) nop; (C7)
| while(flag[2] & turn=2) nop; (C8)
Cr2:=true;
fin_wait;
Cr2:=false;
flag[2]:=false;
wait[2]:=1;
while(wait[2]=1)
| nop; (C9)
| wait[2]:=0; (C10)
while(true)
Figure 9.3: Mutual-exclusion protocol synthesis
9.4.1 Co-synthesis
In this section we define processes, refinement, schedulers, and specifications. We
consider the traditional co-operative [CE81] and strictly competitive [PR89, RW87] versions
of the co-synthesis problem; we refer to them as weak co-synthesis and classical co-synthesis,
respectively. We show the drawbacks of these formulations and then present a new formu-
lation of co-synthesis, namely, assume-guarantee synthesis.
Variables, valuations, and traces. Let X be a finite set of variables such that each variable
x ∈ X has a finite domain Dx. A valuation θ on X is a function θ : X → ⋃x∈X Dx that
assigns to each variable x ∈ X a value θ(x) ∈ Dx. We write Θ for the set of valuations on
X. A trace on X is an infinite sequence (θ0, θ1, θ2, . . .) ∈ Θω of valuations on X. Given a
valuation θ ∈ Θ and a subset Y ⊆ X of the variables, we denote by θ↾Y the restriction
of the valuation θ to the variables in Y. Similarly, for a trace τ = (θ0, θ1, θ2, . . .) on X, we
write τ↾Y = (θ0↾Y, θ1↾Y, θ2↾Y, . . .) for the restriction of τ to the variables in Y. The
restriction operator is lifted to sets of valuations, and to sets of traces.
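As a concrete illustration, the restriction operator can be transcribed on a dictionary encoding of valuations; representing a valuation as a Python dict from variable names to values is an assumption of this sketch, not the text's formalization.

```python
# Illustrative encoding (an assumption of this sketch): a valuation is a
# dict from variable names to values, and ↾ keeps only the variables of
# the given subset Y.

def restrict(valuation, variables):
    """theta ↾ Y: the restriction of a valuation to the variables in Y."""
    return {x: v for x, v in valuation.items() if x in variables}

def restrict_trace(trace, variables):
    """Lift ↾ pointwise to a (finite prefix of a) trace."""
    return [restrict(theta, variables) for theta in trace]

theta = {"flag1": True, "flag2": False, "turn": 2}
print(restrict(theta, {"flag1", "turn"}))   # {'flag1': True, 'turn': 2}
```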
Processes and refinement. For i ∈ {1, 2}, a process Pi = (Xi, δi) consists of a finite set Xi
of variables and a nondeterministic transition function δi : Θi → 2^Θi \ {∅}, where Θi is the
set of valuations on Xi. The transition function maps a present valuation to a nonempty
set of possible successor valuations. We write X = X1 ∪ X2 for the set of variables of
both processes; note that some variables may be shared by both processes. A refinement
of process Pi = (Xi, δi) is a process P′i = (X′i, δ′i) such that (1) Xi ⊆ X′i, and (2) for all
valuations θ′ on X′i, we have δ′i(θ′)↾Xi ⊆ δi(θ′↾Xi). In other words, the refined process
P′i has possibly more variables than the original process Pi, and every possible update of
the variables in Xi by P′i is a possible update by Pi. We write P′i ⪯ Pi to denote that P′i is
a refinement of Pi. Given two refinements P′1 of P1 and P′2 of P2, we write X′ = X′1 ∪ X′2
for the set of variables of both refinements, and we denote the set of valuations on X′ by
Θ′.
Schedulers. Given two processes P1 and P2, a scheduler R for P1 and P2 chooses at each
computation step whether it is process P1's turn or process P2's turn to update its variables.
Formally, the scheduler R is a function R : Θ∗ → {1, 2} that maps every finite sequence of
global valuations (representing the history of a computation) to i ∈ {1, 2}, signaling that
process Pi is next to update its variables. The scheduler R is fair if it assigns turns to both
P1 and P2 infinitely often; i.e., for all traces (θ0, θ1, θ2, . . .) ∈ Θω, there exist infinitely many
j ≥ 0 and infinitely many k ≥ 0 such that R(θ0, . . . , θj) = 1 and R(θ0, . . . , θk) = 2. Given
two processes P1 = (X1, δ1) and P2 = (X2, δ2), a scheduler R for P1 and P2, and a start
valuation θ0 ∈ Θ, the set of possible traces is [[(P1 || P2 || R)(θ0)]] = {(θ0, θ1, θ2, . . .) ∈ Θω |
∀j ≥ 0. ∀i ∈ {1, 2}. if R(θ0, . . . , θj) = i, then θj+1↾(X \ Xi) = θj↾(X \ Xi) and θj+1↾Xi ∈ δi(θj↾Xi)}.
Note that during turns of one process Pi, the values of the private variables X \ Xi of the
other process remain unchanged.
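The trace semantics above can be sketched for finite prefixes. In this illustrative Python sketch, processes are (variables, update) pairs with deterministic updates, and round-robin serves as one concrete example of a fair scheduler; these modeling choices are assumptions of the illustration, not the text's definitions.

```python
# Sketch of [[(P1 || P2 || R)(θ0)]] for finite prefixes: at each step the
# scheduler picks a process, which updates only its own variables while
# the other process's variables keep their values.

def step(valuation, process_vars, update):
    """One turn of a process: update its variables, keep the rest."""
    local = {x: valuation[x] for x in process_vars}
    out = dict(valuation)
    out.update(update(local))
    return out

def run(theta0, procs, scheduler, steps):
    """procs: {i: (vars_i, update_i)}; scheduler: history -> 1 or 2."""
    trace = [theta0]
    for _ in range(steps):
        i = scheduler(trace)
        xs, update = procs[i]
        trace.append(step(trace[-1], xs, update))
    return trace

# Round-robin assigns turns to both processes infinitely often, so it is
# fair in the sense defined above.
round_robin = lambda history: 1 + (len(history) % 2)

procs = {1: ({"x"}, lambda v: {"x": v["x"] + 1}),
         2: ({"y"}, lambda v: {"y": v["y"] + 1})}
print(run({"x": 0, "y": 0}, procs, round_robin, 4)[-1])   # {'x': 2, 'y': 2}
```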
Specifications. A specification Φi for process Pi is a set of traces on X; that is, Φi ⊆
Θω. We consider only ω-regular specifications [Tho97]. We define boolean operations on
specifications using logical operators such as ∧ (conjunction) and → (implication).
Weak co-synthesis. In all formulations of the co-synthesis problem that we consider, the
input to the problem is given as follows: two processes P1 = (X1, δ1) and P2 = (X2, δ2), two
specifications Φ1 for process 1 and Φ2 for process 2, and a start valuation θ0 ∈ Θ. The weak
co-synthesis problem is defined as follows: do there exist two processes P′1 = (X′1, δ′1) and
P′2 = (X′2, δ′2), and a valuation θ′0 ∈ Θ′, such that (1) P′1 ⪯ P1 and P′2 ⪯ P2 and θ′0↾X = θ0,
and (2) for all fair schedulers R for P′1 and P′2, we have [[(P′1 || P′2 || R)(θ′0)]]↾X ⊆ (Φ1 ∧ Φ2).
Example 9 (Mutual-exclusion protocol synthesis) Consider the two processes shown
in Fig. 9.3. Process P1 (on the left) places a request to enter its critical section by setting
flag[1]:=true, and the entry of P1 into the critical section is signaled by Cr1:=true; and
similarly for process P2 (on the right). The two variables flag[1] and flag[2] are boolean,
and in addition, both processes may use a shared variable turn that takes two values 1 and
2. There are 8 possible conditions C1–C8 for a process to guard the entry into its critical
section.2 The figure shows all 8×8 alternatives for the two processes; any refinement without
additional variables will choose a subset of these. Process P1 may stay in its critical section
for an arbitrary finite amount of time (indicated by fin wait), and then exit by setting
Cr1:=false; and similarly for process P2. The while loop with the two alternatives C9
and C10 expresses the fact that a process may wait arbitrarily long (possibly infinitely long)
before a subsequent request to enter its critical section.
We use the notations □ and ◇ to denote always (safety) and eventually (reachability)
specifications, respectively. The specification for process P1 consists of two parts:
a safety part Φ1^mutex = □¬(Cr1 = true ∧ Cr2 = true) and a liveness part Φ1^prog =
□(flag[1] = true → ◇(Cr1 = true)). The first part Φ1^mutex specifies that both processes
are not in their critical sections simultaneously (mutual exclusion); the second part Φ1^prog
specifies that if process P1 wishes to enter its critical section, then it will eventually enter
(starvation freedom). The specification Φ1 for process P1 is the conjunction of Φ1^mutex and
Φ1^prog. The specification Φ2 for process P2 is symmetric.
2 Since a guard may check any subset of the three 2-valued variables, there are 256 possible guards; but all except 8 can be discharged immediately as not useful.
The answer to the weak co-synthesis problem for Example 9 is "Yes." A solution of the
weak co-synthesis formulation consists of two refinements P′1 and P′2 of the two given processes P1
and P2, such that the composition of the two refinements satisfies the specifications Φ1 and
Φ2 for every fair scheduler. One possible solution is as follows: in P′1, the alternatives C4
and C10 are chosen, and in P′2, the alternatives C3 and C10 are chosen. This solution is not
satisfactory, because process P1's starvation freedom depends on the fact that process P2
requests to enter its critical section infinitely often. If P2 were to make only a single request
to enter its critical section, then the progress part of Φ1 would be violated.
Classical co-synthesis. The classical co-synthesis problem is defined as follows: do there
exist two processes P′1 = (X′1, δ′1) and P′2 = (X′2, δ′2), and a valuation θ′0 ∈ Θ′, such that
(1) P′1 ⪯ P1 and P′2 ⪯ P2 and θ′0↾X = θ0, and (2) for all fair schedulers R for P′1 and P′2,
we have (a) [[(P′1 || P2 || R)(θ′0)]]↾X ⊆ Φ1 and (b) [[(P1 || P′2 || R)(θ′0)]]↾X ⊆ Φ2.
The answer to the classical co-synthesis problem for Example 9 is “No.” We will
argue later (in Example 10) why this is the case.
Assume-guarantee synthesis. We now present a new formulation of the co-synthesis
problem. The main idea is derived from the notion of secure equilibria. We refer to this
new formulation as the assume-guarantee synthesis problem; it is defined as follows: do
there exist two refinements P′1 = (X′1, δ′1) and P′2 = (X′2, δ′2), and a valuation θ′0 ∈ Θ′, such
do
flag[1]:=true; turn:=2;
while (flag[2] & turn=2) nop;
Cr1:=true;
fin_wait;
Cr1:=false;
flag[1]:=false;
wait[1]:=1;
while(wait[1]=1)
| nop;
| wait[1]:=0;
while(true)
do
flag[2]:=true; turn:=1;
while (flag[1] & turn=1) nop; (C8+C6)
Cr2:=true;
fin_wait;
Cr2:=false;
flag[2]:=false;
wait[2]:=1;
while(wait[2]=1)
| nop; (C9)
| wait[2]:=0; (C10)
while(true)
Figure 9.4: Peterson’s mutual-exclusion protocol
that (1) P′1 ⪯ P1 and P′2 ⪯ P2 and θ′0↾X = θ0, and (2) for all fair schedulers R for P′1 and
P′2, we have (a) [[(P′1 || P2 || R)(θ′0)]]↾X ⊆ (Φ2 → Φ1) and (b) [[(P1 || P′2 || R)(θ′0)]]↾X ⊆
(Φ1 → Φ2) and (c) [[(P′1 || P′2 || R)(θ′0)]]↾X ⊆ (Φ1 ∧ Φ2).
The answer to the assume-guarantee synthesis problem for Example 9 is "Yes."
A solution P′1 and P′2 is shown in Fig. 9.4. We will argue the correctness of this solution
later (in Example 11). The two refined processes P′1 and P′2 are exactly Peterson's
solution to the mutual-exclusion problem. In other words, Peterson's protocol can be derived
automatically as an answer to the assume-guarantee synthesis problem for the requirements
of mutual exclusion and starvation freedom. The success of assume-guarantee synthesis for
the mutual-exclusion problem, together with the failure of classical co-synthesis, suggests
that the classical formulation of co-synthesis is too strong.
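As an independent sanity check on the mutual-exclusion requirement of the synthesized protocol in Fig. 9.4, all interleavings can be explored exhaustively. The 4-location encoding of each process below is an assumption of this sketch (it abstracts fin_wait and the waiting loop into single steps); it is not the synthesis algorithm itself.

```python
# Exhaustive exploration of all interleavings of the two refined
# processes (Peterson's protocol). Per-process locations (an assumption
# of this sketch): 0 = request (flag[i]:=true; turn:=other),
# 1 = spin at the guard, 2 = critical section, 3 = exit.

def successors(state):
    locs, flag, turn = state
    for i in (0, 1):                       # the scheduler picks a process
        j = 1 - i
        nl, nf, nt = list(locs), list(flag), turn
        if locs[i] == 0:                   # flag[i]:=true; turn:=j
            nf[i], nt = True, j
            nl[i] = 1
        elif locs[i] == 1:                 # while (flag[j] & turn=j) nop
            if not (flag[j] and turn == j):
                nl[i] = 2                  # guard passed: enter
        elif locs[i] == 2:                 # leave the critical section
            nl[i] = 3
        else:                              # flag[i]:=false; restart loop
            nf[i] = False
            nl[i] = 0
        yield (tuple(nl), tuple(nf), nt)

def check_mutex():
    init = ((0, 0), (False, False), 0)
    seen, stack = {init}, [init]
    while stack:
        state = stack.pop()
        if state[0] == (2, 2):             # both processes in Cr: violation
            return False
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True

print(check_mutex())   # True: mutual exclusion holds in every interleaving
```

The search confirms the safety part only; the starvation-freedom part is a liveness property and would require checking fair cycles rather than reachable states.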
9.4.2 Game Algorithms for Co-synthesis
We reduce the three formulations of the co-synthesis problem to problems about
games played on graphs with three players.
Game graphs. A 3-player game graph G = ((S, E), (S1, S2, S3)) consists of a directed graph
(S, E) with a finite set S of states and a set E ⊆ S × S of edges, and a partition (S1, S2, S3)
of the state space S into three sets. The states in Si are player-i states, for i ∈ {1, 2, 3},
and player i decides the successor at a state in Si. The notions of strategies and plays are
analogous to the case of 2-player games. We denote by σi a strategy for player i, and by Σi
the set of all strategies for player i, for i ∈ {1, 2, 3}. Given a start state s ∈ S and three
strategies σi ∈ Σi, one for each of the three players i ∈ {1, 2, 3}, there is a unique play,
denoted ωσ1,σ2,σ3(s) = (s0, s1, s2, . . .), such that s0 = s and for all k ≥ 0, if sk ∈ Si, then
σi(s0, s1, . . . , sk) = sk+1; this play is the outcome of the game starting at s given the three
strategies σ1, σ2, and σ3.
Winning. An objective Ψ is a set of plays; i.e., Ψ ⊆ Ω. We extend the notion of winning
states to three-player games (the notation is derived from [AHK02]). For an objective
Ψ, the set of winning states for player 1 in the game graph G is
〈〈1〉〉G(Ψ) = {s ∈ S | ∃σ1 ∈ Σ1. ∀σ2 ∈ Σ2. ∀σ3 ∈ Σ3. ωσ1,σ2,σ3(s) ∈ Ψ};
a witness strategy σ1 for player 1 for the existential quantifier is referred to as a winning
strategy. The winning sets 〈〈2〉〉G(Ψ) and 〈〈3〉〉G(Ψ) for players 2 and 3 are defined analo-
gously. The set of winning states for the team consisting of player 1 and player 2, playing
against player 3, is
〈〈1, 2〉〉G(Ψ) = {s ∈ S | ∃σ1 ∈ Σ1. ∃σ2 ∈ Σ2. ∀σ3 ∈ Σ3. ωσ1,σ2,σ3(s) ∈ Ψ}.
The winning sets 〈〈I〉〉G(Ψ) for other teams I ⊆ {1, 2, 3} are defined similarly. The following
determinacy result follows from [GH82].
Theorem 48 (Finite-memory determinacy [GH82]) Let Ψ be an ω-regular objective,
let G be a 3-player game graph, and let I ⊆ {1, 2, 3} be a set of the players. Let J =
{1, 2, 3} \ I. Then (1) 〈〈I〉〉G(Ψ) = S \ 〈〈J〉〉G(¬Ψ), and (2) there exist finite-memory strategies
for the players in I such that against all strategies for the players in J, for all states
s ∈ 〈〈I〉〉G(Ψ), the play starting at s given the strategies lies in Ψ.
Game solutions to weak and classical co-synthesis. Given two processes P1 = (X1, δ1)
and P2 = (X2, δ2), we define the 3-player game graph Ĝ = ((S, E), (S1, S2, S3)) as follows:
let S = Θ × {1, 2, 3}; let Si = Θ × {i} for i ∈ {1, 2, 3}; and let E contain (1) all edges
of the form ((θ, 3), (θ, 1)) for θ ∈ Θ, (2) all edges of the form ((θ, 3), (θ, 2)) for θ ∈ Θ,
and (3) all edges of the form ((θ, i), (θ′, 3)) for i ∈ {1, 2} and θ′↾Xi ∈ δi(θ↾Xi) and
θ′↾(X \ Xi) = θ↾(X \ Xi). In other words, player 1 represents process P1, player 2
represents process P2, and player 3 represents the scheduler. Given a play of the form
ω = ((θ0, 3), (θ0, i0), (θ1, 3), (θ1, i1), (θ2, 3), . . .), where ij ∈ {1, 2} for all j ≥ 0, we write [ω]1,2
for the sequence of valuations (θ0, θ1, θ2, . . .) in ω (ignoring the intermediate valuations at
player-3 states). A specification Φ ⊆ Θω defines the objective [[Φ]] = {ω ∈ Ω | [ω]1,2 ∈ Φ}.
In this way, the specifications Φ1 and Φ2 for the processes P1 and P2 provide the objectives
Ψ1 = [[Φ1]] and Ψ2 = [[Φ2]] for players 1 and 2, respectively. The objective for player 3
(the scheduler) is the fairness objective Ψ3 = Fair requiring that both S1 and S2 are visited
infinitely often; i.e., Fair contains all plays (s0, s1, s2, . . .) ∈ Ω such that sj ∈ S1 for infinitely
many j ≥ 0, and sk ∈ S2 for infinitely many k ≥ 0.
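The game-graph construction above can be sketched concretely. In this illustrative Python sketch, the (variables, delta) encoding of a process and the explicit enumeration of valuations from declared variable domains are assumptions of the illustration.

```python
from itertools import product

# Sketch of the 3-player game graph: states are (valuation, player)
# pairs. The scheduler state (θ, 3) hands the turn to (θ, 1) or (θ, 2);
# a process move updates only that process's variables and returns the
# turn to the scheduler.

def build_game_graph(procs, domains):
    all_vars = sorted(domains)
    thetas = [dict(zip(all_vars, vals))
              for vals in product(*(domains[x] for x in all_vars))]
    freeze = lambda th: tuple(sorted(th.items()))   # hashable valuation
    states = {(freeze(th), i) for th in thetas for i in (1, 2, 3)}
    edges = set()
    for th in thetas:
        for i in (1, 2):
            edges.add(((freeze(th), 3), (freeze(th), i)))   # scheduler move
            xs, delta = procs[i]
            for succ in delta({x: th[x] for x in xs}):      # process move
                th2 = dict(th)
                th2.update(succ)
                edges.add(((freeze(th), i), (freeze(th2), 3)))
    return states, edges

# Toy example: each process owns one boolean variable.
procs = {1: ({"a"}, lambda v: [{"a": not v["a"]}]),
         2: ({"b"}, lambda v: [{"b": v["b"]}])}
states, edges = build_game_graph(procs, {"a": [False, True], "b": [False, True]})
print(len(states))   # 4 valuations × 3 players = 12
```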
Proposition 15 Given two processes P1 = (X1, δ1) and P2 = (X2, δ2), two specifications
Φ1 for P1 and Φ2 for P2, and a start valuation θ0 ∈ Θ, the answer to the weak co-synthesis
problem is "Yes" iff (θ0, 3) ∈ 〈〈1, 2〉〉Ĝ(Fair → ([[Φ1]] ∧ [[Φ2]])); and the answer to the
classical co-synthesis problem is "Yes" iff both (θ0, 3) ∈ 〈〈1〉〉Ĝ(Fair → [[Φ1]]) and (θ0, 3) ∈
〈〈2〉〉Ĝ(Fair → [[Φ2]]).
Proof. We first note that for games with ω-regular objectives, finite-memory winning
strategies suffice (Theorem 48). The proof proceeds by the following case analysis.
1. Given a finite-memory strategy σ1, a witness P′1 = (X′1, δ′1) for the weak co-synthesis
problem can be obtained as follows: the variables in X′1 \ X1 encode the finite-memory
information of the strategy σ1, and the next-state function of the strategy is then
captured by a deterministic update function δ′1. A similar construction holds for
player 2.
2. Given P′1 = (X′1, δ′1) as a witness for the weak co-synthesis problem, we first
observe that any deterministic restriction of P′1 (i.e., the transition function δ′1 made
deterministic) is also a witness to the weak co-synthesis problem. A witness strategy
σ1 in Ĝ is obtained as follows: the variables in X′1 \ X1 are encoded as the finite-memory
information of σ1, and the deterministic update is captured by the next-state function.
The construction of witness strategies for player 2 is similar.
The proof for the classical co-synthesis problem is similar.
Example 10 (Failure of classical co-synthesis) We now demonstrate the failure of
classical co-synthesis for Example 9. We show that for every strategy for process P1, there
exist spoiling strategies for process P2 and the scheduler such that (1) the scheduler is fair
and (2) the specification Φ1 of process P1 is violated. With any fair scheduler, process P1
will eventually set flag[1]:=true. Whenever process P1 enters its critical section (setting
Cr1:=true), the scheduler assigns a finite sequence of turns to process P2. During
this sequence, process P2 enters its critical section: it may first choose the alternative C10
to return to the beginning of the main loop, then set flag[2]:=true; turn:=1; then
pass the guard C4 (since turn ≠ 2), and enter the critical section (setting Cr2:=true).
This violates the mutual-exclusion requirement Φ1^mutex of process P1. On the other hand, if
process P1 never enters its critical section, this violates the starvation-freedom requirement
Φ1^prog of process P1. Thus the answer to the classical co-synthesis problem is "No."
Game solution to assume-guarantee synthesis. We extend the notion of secure equi-
libria from 2-player games to 3-player games where player 3 can win unconditionally; i.e.,
〈〈3〉〉G(Ψ3) = S for the objective Ψ3 for player 3. In the setting of two processes and a
scheduler (player 3) with a fairness objective, the restriction that 〈〈3〉〉G(Ψ3) = S means
that the scheduler has a fair strategy from all states; this is clearly the case for Ψ3 = Fair.
(Alternatively, the scheduler may not be required to be fair; then Ψ3 is the set of all plays, and
the restriction is satisfied trivially.) We characterize the winning secure equilibrium states
and then establish the existence of finite-memory winning secure strategies (Theorem 50).
This will allow us to solve the assume-guarantee synthesis problem by computing winning
secure equilibria (Theorem 51).
Payoffs. In the following, we fix a 3-player game graph G and objectives Ψ1, Ψ2, and
Ψ3 for the three players such that 〈〈3〉〉G(Ψ3) = S. Since 〈〈3〉〉G(Ψ3) = S, any equilibrium
payoff profile will assign payoff 1 to player 3. Hence we focus on payoff profiles whose third
component is 1.
Payoff-profile ordering. The player-1 preference order ≺1 on payoff profiles is lexicographic:
(v1, v2, 1) ≺1 (v′1, v′2, 1) iff either (1) v1 < v′1, or (2) v1 = v′1 and v2 > v′2; that is, player 1
prefers a payoff profile that gives her a greater payoff, and if two payoff profiles match in the
first component, then she prefers the payoff profile in which player 2's payoff is smaller, i.e.,
it is the same preference order defined for secure equilibria for two players. The preference
order ≺2 for player 2 is symmetric. The preference order for player 3 is such that (v1, v2, 1) ≺3
(v′1, v′2, 1) iff v1 + v2 > v′1 + v′2. Given two payoff profiles (v1, v2, v3) and (v′1, v′2, v′3), we write
(v1, v2, v3) = (v′1, v′2, v′3) iff vi = v′i for all i ∈ {1, 2, 3}, and we write (v1, v2, v3) ≼i (v′1, v′2, v′3)
iff (v1, v2, v3) ≺i (v′1, v′2, v′3) or (v1, v2, v3) = (v′1, v′2, v′3).
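The three preference orders can be transcribed directly; the following small Python sketch is illustrative only.

```python
# The three strict preference orders on payoff profiles (v1, v2, 1),
# transcribed from the definitions above.

def prec1(p, q):
    """p ≺1 q: player 1 strictly prefers q (lexicographic)."""
    (v1, v2, _), (w1, w2, _) = p, q
    return v1 < w1 or (v1 == w1 and v2 > w2)

def prec2(p, q):
    """p ≺2 q: symmetric for player 2."""
    (v1, v2, _), (w1, w2, _) = p, q
    return v2 < w2 or (v2 == w2 and v1 > w1)

def prec3(p, q):
    """p ≺3 q: player 3 prefers the profile with the smaller payoff sum."""
    return p[0] + p[1] > q[0] + q[1]

# Player 1 ranks (1, 0, 1) strictly above (1, 1, 1): her payoff is the
# same, and the opponent does worse.
print(prec1((1, 1, 1), (1, 0, 1)))   # True
print(prec3((1, 1, 1), (1, 0, 1)))   # True
```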
Secure equilibria. A strategy profile (σ1, σ2, σ3) is a secure equilibrium at a state s ∈ S iff it is
a Nash equilibrium with respect to the preference orders ≼1, ≼2, and ≼3. For u, w ∈ {0, 1},
we write Suw1 ⊆ S for the set of states s such that a secure equilibrium with the payoff
profile (u, w, 1) exists at s; that is, s ∈ Suw1 iff there is a secure equilibrium (σ1, σ2, σ3) at
s with payoff profile (u, w, 1). Moreover, we write MSuw1(G) ⊆ Suw1 for the set of states s
such that the payoff profile (u, w, 1) is a maximal secure equilibrium payoff profile at s;
that is, s ∈ MSuw1(G) iff (1) s ∈ Suw1, and (2) for all u′, w′ ∈ {0, 1}, if s ∈ Su′w′1, then
both (u′, w′, 1) ≼1 (u, w, 1) and (u′, w′, 1) ≼2 (u, w, 1). The states in MS111(G) are referred
to as winning secure equilibrium states, and the witnessing secure equilibrium strategies as
winning secure strategies.
Theorem 49 Let G be a 3-player game graph with the objectives Ψ1, Ψ2, and Ψ3 for
the three players such that 〈〈3〉〉G(Ψ3) = S. Let

U1 = 〈〈1〉〉G(Ψ3 → Ψ1) and U2 = 〈〈2〉〉G(Ψ3 → Ψ2);
Z1 = 〈〈1, 3〉〉G↾U1(Ψ1 ∧ Ψ3 ∧ ¬Ψ2) and Z2 = 〈〈2, 3〉〉G↾U2(Ψ2 ∧ Ψ3 ∧ ¬Ψ1);
W = 〈〈1, 2〉〉G↾(S\(Z1∪Z2))(Ψ3 → (Ψ1 ∧ Ψ2)).
Then the following assertions hold: (1) at all states in Z1 the only secure equilibrium payoff
profile is (1, 0, 1); (2) at all states in Z2 the only secure equilibrium payoff profile is (0, 1, 1);
and (3) W = MS111(G).
Proof. We prove parts (1) and (3); the proof of part (2) is similar to part (1).
Part (1). Since 〈〈3〉〉G(Ψ3) = S and Z1 ⊆ U1 = 〈〈1〉〉G(Ψ3 → Ψ1), it follows that any secure
equilibrium in Z1 has a payoff profile of the form (1, ·, 1). Since (1, 1, 1) ≺1 (1, 0, 1) and
(1, 1, 1) ≺3 (1, 0, 1), to prove uniqueness it suffices to show that player 1 and player 3 can fix
strategies to ensure the secure equilibrium payoff profile (1, 0, 1). Since Z1 = 〈〈1, 3〉〉G↾U1(Ψ1 ∧
Ψ3 ∧ ¬Ψ2), consider the strategy pair (σ1, σ3) such that against all player 2 strategies σ2
and for all states s ∈ Z1, we have ωσ1,σ2,σ3(s) ∈ (Ψ1 ∧ Ψ3 ∧ ¬Ψ2). The secure equilibrium
strategy pair (σ∗1, σ∗3) for player 1 and player 3 (along with any strategy σ2 for player 2) is
constructed as follows.
1. The strategy σ∗1 is as follows: player 1 plays σ1 and if player 3 deviates from σ3, then
player 1 switches to a winning strategy for Ψ3 → Ψ1. Such a strategy exists since
Z1 ⊆ U1 = 〈〈1〉〉G(Ψ3 → Ψ1).
2. The strategy σ∗3 is as follows: player 3 plays σ3 and if player 1 deviates from σ1,
then player 3 switches to a winning strategy for Ψ3. Such a strategy exists since
〈〈3〉〉G(Ψ3) = S.
Hence the objective of player 1 is satisfied whenever the objective of player 3 is satisfied.
Thus player 3 has no incentive to deviate, and similarly, player 1 has no incentive to deviate.
The result follows.
Part (3). By Theorem 48 we have S \ W = 〈〈3〉〉G(Ψ3 ∧ (¬Ψ1 ∨ ¬Ψ2)), and there is a
player 3 strategy σ3 that satisfies Ψ3 ∧ (¬Ψ1 ∨ ¬Ψ2) against all strategies of player 1 and
player 2. Hence the equilibrium payoff profile (1, 1, 1) cannot exist in the complement of W, i.e.,
MS111(G) ⊆ W. We now show that in W there is a secure equilibrium with payoff profile
(1, 1, 1). The following construction completes the proof.
1. In W ∩ U1, player 1 plays a winning strategy for objective Ψ3 → Ψ1, and player 2
plays a winning strategy for objective (Ψ3 ∧ Ψ1) → Ψ2. Observe that S \ Z1 =
〈〈2〉〉G(¬Ψ1 ∨ ¬Ψ3 ∨ Ψ2), and hence such a winning strategy exists for player 2.
2. In W ∩ (U2 \ U1), player 2 plays a winning strategy for objective Ψ3 → Ψ2, and
player 1 plays a winning strategy for objective (Ψ2∧Ψ3) → Ψ1. Observe that S \Z2 =
〈〈1〉〉G(¬Ψ2 ∨ ¬Ψ3 ∨ Ψ1), and hence such a winning strategy exists for player 1.
3. By Theorem 48 we have W \U1 = 〈〈2, 3〉〉G(¬Ψ1∧Ψ3) and W \U2 = 〈〈1, 3〉〉G(¬Ψ2∧Ψ3).
The strategy construction in W \ (U1 ∪ U2) is as follows: player 1 and player 2 play a
strategy profile (σ1, σ2) to satisfy Ψ1 ∧ Ψ2 against all strategies of player 3, and player 3 plays
a winning strategy for Ψ3; if player 1 deviates, then player 2 and player 3 switch to
a strategy profile (σ2, σ3) such that against all strategies of player 1 the objective Ψ3 ∧ ¬Ψ1
is satisfied; and if player 2 deviates, then player 1 and player 3 switch to a strategy profile
(σ1, σ3) such that against all strategies of player 2 the objective Ψ3 ∧ ¬Ψ2 is satisfied.
Hence neither player 1 nor player 2 has an incentive to deviate according to the
preference orders ≼1 and ≼2, respectively.
Alternative characterization of winning secure equilibria. In order to obtain a characterization
of the set MS111(G) in terms of strategies, we extend the definition of retaliation
strategies to the case of three players. Given objectives Ψ1, Ψ2, and Ψ3 for the three
players, and a state s ∈ S, the sets of retaliation strategies for players 1 and 2 at s are

Re1(s) = {σ1 ∈ Σ1 | ∀σ2 ∈ Σ2. ∀σ3 ∈ Σ3. ωσ1,σ2,σ3(s) ∈ ((Ψ3 ∧ Ψ2) → Ψ1)};
Re2(s) = {σ2 ∈ Σ2 | ∀σ1 ∈ Σ1. ∀σ3 ∈ Σ3. ωσ1,σ2,σ3(s) ∈ ((Ψ3 ∧ Ψ1) → Ψ2)}.
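On any single play, membership in these implication objectives reduces to Boolean implications over the three payoff bits. The following minimal Python sketch (the function names are ours, purely illustrative) spells out the three conditions used in the definitions above and in the characterization of U below:

```python
def implies(a: bool, b: bool) -> bool:
    """Boolean implication a -> b."""
    return (not a) or b

def retaliation1_ok(psi1: bool, psi2: bool, psi3: bool) -> bool:
    """Play-level condition for sigma1 in Re1(s): (Psi3 and Psi2) -> Psi1."""
    return implies(psi3 and psi2, psi1)

def retaliation2_ok(psi1: bool, psi2: bool, psi3: bool) -> bool:
    """Play-level condition for sigma2 in Re2(s): (Psi3 and Psi1) -> Psi2."""
    return implies(psi3 and psi1, psi2)

def payoff_111_ok(psi1: bool, psi2: bool, psi3: bool) -> bool:
    """Play-level condition defining U: Psi3 -> (Psi1 and Psi2)."""
    return implies(psi3, psi1 and psi2)
```

For instance, a play satisfying Ψ2 and Ψ3 but not Ψ1 violates the condition for Re1(s), while a play violating Ψ3 vacuously satisfies all three conditions.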
Theorem 50 Let G be a 3-player game graph with objectives Ψ1, Ψ2, and Ψ3 for the
three players such that 〈〈3〉〉G(Ψ3) = S. Let U = {s ∈ S | ∃σ1 ∈ Re1(s). ∃σ2 ∈ Re2(s). ∀σ3 ∈
Σ3. ωσ1,σ2,σ3(s) ∈ (Ψ3 → (Ψ1 ∧ Ψ2))}. Then U = MS111(G).
Proof. We first show that U ⊆ MS111(G). For a state s ∈ U, choose σ1 ∈ Re1(s) and
σ2 ∈ Re2(s) such that for all σ3 ∈ Σ3 we have ωσ1,σ2,σ3(s) ∈ (Ψ3 → (Ψ1 ∧ Ψ2)). Fixing the
strategies σ1 and σ2 for players 1 and 2, and a winning strategy for player 3, we obtain the
secure equilibrium payoff profile (1, 1, 1). We now show that MS111(G) ⊆ U. This follows
from the proof of Theorem 49: there we proved that for all states s ∈ S \ (Z1 ∪ Z2)
we have Re1(s) ≠ ∅ and Re2(s) ≠ ∅, and the winning secure strategies constructed for the
set W = MS111(G) are witness strategies proving that MS111(G) ⊆ U.
Observe that for ω-regular objectives, the winning secure strategies of Theorem 50 are finite-
memory strategies. The existence of finite-memory winning secure strategies, together with an
argument similar to Proposition 15, establishes the following theorem.
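Concretely, a finite-memory strategy is given by a finite memory set with an initial value, a memory-update function, and a next-move function. The sketch below is ours, not the dissertation's formalization; the two-valued memory and the state marker "d" are hypothetical, chosen to illustrate the switch-on-deviation shape of the strategies constructed above:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FiniteMemoryStrategy:
    """A finite-memory strategy: an initial memory value, a memory-update
    function, and a next-move function (the memory set is implicit)."""
    initial_memory: int
    update: Callable[[int, str], int]   # (memory, observed state) -> new memory
    move: Callable[[int, str], str]     # (memory, current state) -> chosen move

    def play(self, states: List[str]) -> List[str]:
        """Return the moves chosen along a finite sequence of observed states."""
        m, moves = self.initial_memory, []
        for s in states:
            moves.append(self.move(m, s))
            m = self.update(m, s)
        return moves

# One bit of memory records whether the (hypothetical) deviation marker "d"
# has been observed; the strategy then switches from cooperation to retaliation.
strategy = FiniteMemoryStrategy(
    initial_memory=0,
    update=lambda m, s: 1 if (m == 1 or s == "d") else 0,
    move=lambda m, s: "retaliate" if m == 1 else "cooperate",
)

print(strategy.play(["a", "b", "d", "a"]))
# -> ['cooperate', 'cooperate', 'cooperate', 'retaliate']
```

The memory update lags the observation by one step, so the switch takes effect on the move after the deviation is observed, mirroring the "if player i deviates, then switch" constructions in the proofs.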
Theorem 51 (Game solution of assume-guarantee synthesis) Given two processes
P1 = (X1, δ1) and P2 = (X2, δ2), two specifications Φ1 for P1 and Φ2 for P2, and a
start valuation θ0 ∈ Θ, the answer to the assume-guarantee synthesis problem is “Yes” iff
(θ0, 3) ∈ MS111(G) for the 3-player game graph G with the objectives Ψ1 = [[Φ1]], Ψ2 = [[Φ2]],
and Ψ3 = Fair.
Example 11 (Assume-guarantee synthesis of mutual-exclusion protocol) We
consider the 8 alternatives C1–C8 of process P1, and the corresponding spoiling strategies
for process P2 and the scheduler to violate P1’s specification. We denote by [→] a switch
between the two processes (decided by the scheduler).
C1 The spoiling strategies for process P2 and the scheduler cause the following sequence of
updates:

P1: flag[1]:=true; turn:=2; [→];
P2: flag[2]:=true; turn:=1;
P2: enters the critical section by passing the guard C8 (since turn ≠ 2).

After exiting its critical section, process P2 chooses the alternative C10
to return to the beginning of the main loop, sets flag[2]:=true; turn:=1;
and then the scheduler assigns the turn to process P1, which cannot enter
its critical section. The scheduler then assigns the turn to P2, and P2
again enters the critical section by passing guard C8; this sequence is
repeated forever.
The same spoiling strategies work for choices C2, C3, C6 and C7.
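The stated update sequence can be replayed directly on the shared variables. The following sketch simulates only the assignments quoted in case C1; the guard tests mirror the conditions stated in the text (turn ≠ 2 for P2, and the symmetric turn ≠ 1 for P1), not the full protocol code:

```python
# Hedged sketch: replay the shared-variable updates of the C1 spoiling sequence.
flag = {1: False, 2: False}
turn = 0

# P1: flag[1]:=true; turn:=2; then the scheduler switches to P2.
flag[1] = True
turn = 2
# P2: flag[2]:=true; turn:=1;
flag[2] = True
turn = 1

# P2 passes its stated guard (turn != 2) and enters the critical section,
p2_may_enter = (turn != 2)
# while the symmetric guard for P1 (turn != 1) is false, so P1 is blocked.
p1_may_enter = (turn != 1)

print(p2_may_enter, p1_may_enter)  # True False
```

Since P2 resets flag[2] and turn the same way on every loop iteration (via alternative C10), the same state recurs and P1 stays blocked forever, which is exactly the violation of P1's specification described above.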
C4 The spoiling strategies cause the following sequence of updates:

P2: flag[2]:=true; turn:=1; [→];
P1: flag[1]:=true; turn:=2; [→];
P2: enters the critical section by passing the guard C3 (since turn ≠ 1).

After exiting its critical section, process P2 continues to choose the
alternative C9 forever, and the scheduler alternates the turn between P1
and P2; process P1 cannot enter its critical section.
The same spoiling strategies work for the choice C5.
C8 The spoiling strategies cause the following sequence of updates:

P2: flag[2]:=true; turn:=1; [→];
P1: flag[1]:=true; turn:=2; [→];
P2: while(flag[2]) nop;

Then process P2 does not enter its critical section, and neither can process P1.
In this case P2 cannot violate P1's specification without violating her own specification.
It follows from this case analysis that no alternative except C8 for process P1 can witness
a solution to the assume-guarantee synthesis problem. The alternative C8 for process P1
and the symmetric alternative C6 for process P2 provide winning secure strategies for both
processes. In this example we considered refinements without additional variables; in
general, refinements may have additional variables.
9.5 Conclusion
We considered non-zero-sum graph games with lexicographically ordered objectives
for the players in order to capture adversarial external choice, where each player tries
to minimize the other player's payoff as long as doing so does not decrease her own payoff.
We showed that these games have a unique maximal equilibrium for all Borel winning
conditions. This confirms that secure equilibria provide a good formalization of rational
behavior in the context of verifying component-based systems. We also showed the relevance
of these equilibria to the co-synthesis problem. Extending the notion of secure equilibria
to stochastic games and to other quantitative settings is an interesting open problem.
Bibliography
[AH99] R. Alur and T.A. Henzinger. Reactive modules. In Formal Methods in System
Design, pages 207–218. IEEE Computer Society Press, 1999.
[AHK02] R. Alur, T.A. Henzinger, and O. Kupferman. Alternating-time temporal logic.
Journal of the ACM, 49:672–713, 2002.
[AHKV98] R. Alur, T.A. Henzinger, O. Kupferman, and M.Y. Vardi. Alternating refine-
ment relations. In CONCUR'98, LNCS 1466, Springer, pages 163–178, 1998.
[AL95] M. Abadi and L. Lamport. Conjoining specifications. ACM Transactions on
Programming Languages and Systems, 17(3):507–534, 1995.
[ALW89] M. Abadi, L. Lamport, and P. Wolper. Realizable and unrealizable specifi-
cations of reactive systems. In ICALP’89, LNCS 372, Springer, pages 1–17,
1989.
[Bas99] S. Basu. New results on quantifier elimination over real closed fields and ap-
plications to constraint databases. Journal of the ACM, 46(4):537–555, 1999.
[Ber95] D.P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scien-
tific, 1995. Volumes I and II.
[BGNV05] A. Blass, Y. Gurevich, L. Nachmanson, and M. Veanes. Play to test. In
FATES’05, 2005.
BIBLIOGRAPHY 237
[Bil95] P. Billingsley. Probability and Measure. Wiley-Interscience, 1995.
[BK76] T. Bewley and E. Kohlberg. The asymptotic theory of stochastic games. Math-
ematics of Operations Research, 1, 1976.
[BL69] J.R. Buchi and L.H. Landweber. Solving sequential conditions by finite-state
strategies. Transactions of the AMS, 138:295–311, 1969.
[BPMF] S. Basu, R. Pollack, and M.-F.Roy. Algorithms in Real Algebraic Geometry.
Springer-Verlag.
[BSV03] H. Bjorklund, S. Sandberg, and S. Vorobyov. A discrete subexponential algo-
rithms for parity games. In STACS’03, pages 663–674. LNCS 2607, Springer,
2003.
[Buc62] J.R. Buchi. On a decision method in restricted second-order arithmetic. In
E. Nagel, P. Suppes, and A. Tarski, editors, Proceedings of the First Interna-
tional Congress on Logic, Methodology, and Philosophy of Science 1960, pages
1–11. Stanford University Press, 1962.
[Can88] J. Canny. Some algebraic and geometric computations in PSPACE. In
STOC’88, pages 460–467. ACM Press, 1988.
[CD06] X. Chen and X. Deng. Settling the complexity of 2-player Nash-equilibrium.
In FOCS’06. IEEE, 2006. ECCC TR05-140.
[CdAH04] K. Chatterjee, L. de Alfaro, and T.A. Henzinger. Trading memory for ran-
domness. In QEST’04, pages 206–217. IEEE, 2004.
[CdAH05] K. Chatterjee, L. de Alfaro, and T.A. Henzinger. The complexity of stochastic
Rabin and Streett games. In ICALP’05, pages 878–890. LNCS 3580, Springer,
2005.
[CdAH06a] K. Chatterjee, L. de Alfaro, and T.A. Henzinger. The complexity of quan-
titative concurrent parity games. In SODA’06, pages 678–687. ACM-SIAM,
2006.
[CdAH06b] K. Chatterjee, L. de Alfaro, and T.A. Henzinger. Strategy improvement in
concurrent reachability games. In QEST’06. IEEE, 2006.
[CDHR06] K. Chatterjee, L. Doyen, T.A. Henzinger, and J.F. Raskin. Algorithms for
omega-regular games with imperfect information. In CSL’06, pages 287–302.
LNCS 4207, Springer, 2006.
[CE81] E.M. Clarke and E.A. Emerson. Design and synthesis of synchronization skele-
ton using branching-time temporal logic. In Logic of Programs’81, pages 52–71,
1981.
[CH05] K. Chatterjee and T.A. Henzinger. Semiperfect-information games. In
FSTTCS’05. LNCS 3821, Springer, 2005.
[CH06a] K. Chatterjee and T.A. Henzinger. Strategy improvement and randomized
subexponential algorithms for stochastic parity games. In STACS’06, LNCS
3884, Springer, pages 512–523, 2006.
[CH06b] K. Chatterjee and T.A. Henzinger. Strategy improvement for stochastic Rabin
and Streett games. In CONCUR’06, LNCS 4137, Springer, pages 375–389,
2006.
[CH07] K. Chatterjee and T.A. Henzinger. Assume guarantee synthesis. In TACAS’07,
LNCS 4424, Springer, pages 261–275, 2007.
[Cha05] K. Chatterjee. Two-player nonzero-sum ω-regular games. In CONCUR’05,
pages 413–427. LNCS 3653, Springer, 2005.
[Cha06] K. Chatterjee. Concurrent games with tail objectives. In CSL’06, LNCS 4207,
Springer, pages 256–270, 2006.
[Cha07a] K. Chatterjee. Concurrent games with tail objectives. Theoretical Computer
Science, 2007. (To Appear).
[Cha07b] K. Chatterjee. Optimal strategy synthesis for stochastic Muller games. In
FoSSaCS’07, LNCS 4423, Springer, pages 138–152, 2007.
[Cha07c] K. Chatterjee. Stochastic Muller games are PSPACE-complete. In FSTTCS’07,
2007. (To Appear).
[CHJ04] K. Chatterjee, T.A. Henzinger, and M. Jurdzinski. Games with secure equi-
libria. In LICS’04, pages 160–169. IEEE, 2004.
[CHJM05] K. Chatterjee, T.A. Henzinger, R. Jhala, and R. Majumdar. Counterexample-
guided planning. In UAI, pages 104–111. AUAI Press, 2005.
[CHP07] K. Chatterjee, T.A. Henzinger, and N. Piterman. Generalized parity games.
In FoSSaCS’07, LNCS 4423, Springer, 2007.
[Chu62] A. Church. Logic, arithmetic, and automata. In Proceedings of the Interna-
tional Congress of Mathematicians, pages 23–35. Institut Mittag-Leffler, 1962.
[CJH03] K. Chatterjee, M. Jurdzinski, and T.A. Henzinger. Simple stochastic parity
games. In CSL’03, LNCS 2803, Springer, pages 100–113, 2003.
[CJH04] K. Chatterjee, M. Jurdzinski, and T.A. Henzinger. Quantitative stochastic
parity games. In SODA’04, ACM-SIAM, pages 114–123, 2004.
[CMH07] K. Chatterjee, R. Majumdar, and T.A. Henzinger. Stochastic limit-average
games are in EXPTIME. International Journal of Game Theory, 2007. (To
Appear).
[Con92] A. Condon. The complexity of stochastic games. Information and Computa-
tion, 96(2):203–224, 1992.
[Con93] A. Condon. On algorithms for simple stochastic games. In Advances in Com-
putational Complexity Theory, volume 13 of DIMACS Series in Discrete Math-
ematics and Theoretical Computer Science, pages 51–73. American Mathemat-
ical Society, 1993.
[CY90] C. Courcoubetis and M. Yannakakis. Markov decision processes and regular
events. In ICALP’90, LNCS 443, Springer, pages 336–349, 1990.
[CY95] C. Courcoubetis and M. Yannakakis. The complexity of probabilistic verifica-
tion. Journal of the ACM, 42(4):857–907, 1995.
[dA97] L. de Alfaro. Formal Verification of Probabilistic Systems. PhD thesis, Stanford
University, 1997.
[dAFH+03] L. de Alfaro, M. Faella, T.A. Henzinger, R. Majumdar, and M. Stoelinga. The
element of surprise in timed games. In CONCUR’03, LNCS 2761, Springer,
pages 144–158. 2003.
[dAH00] L. de Alfaro and T.A. Henzinger. Concurrent omega-regular games. In
LICS’00, pages 141–154. IEEE, 2000.
[dAH01] L. de Alfaro and T.A. Henzinger. Interface theories for component-based de-
sign. In EMSOFT’01, LNCS 2211, Springer, pages 148–165. 2001.
[dAHK98] L. de Alfaro, T.A. Henzinger, and O. Kupferman. Concurrent reachability
games. In FOCS'98, pages 564–575. IEEE, 1998.
[dAHM00a] L. de Alfaro, T.A. Henzinger, and F.Y.C. Mang. The control of synchronous
systems. In CONCUR’00, LNCS 1877, pages 458–473. Springer, 2000.
[dAHM00b] L. de Alfaro, T.A. Henzinger, and F.Y.C. Mang. Detecting errors before reach-
ing them. In CAV’00, LNCS 1855, Springer, pages 186–201, 2000.
[dAHM01] L. de Alfaro, T.A. Henzinger, and F.Y.C. Mang. The control of synchronous
systems, part II. In CONCUR'01, LNCS 2154, pages 566–580. Springer, 2001.
[dAHM03] L. de Alfaro, T.A. Henzinger, and R. Majumdar. Discounting the future in
systems theory. In ICALP’03, LNCS 2719, pages 1022–1037. Springer, 2003.
[dAM01] L. de Alfaro and R. Majumdar. Quantitative solution of omega-regular games.
In STOC'01, pages 675–683. ACM, 2001.
[Der70] C. Derman. Finite State Markovian Decision Processes. Academic Press, 1970.
[DGP06] C. Daskalakis, P.W. Goldberg, and C.H. Papadimitriou. The complexity of
computing a Nash equilibrium. In STOC’06. ACM, 2006. ECCC, TR05-115.
[Dil89] D.L. Dill. Trace Theory for Automatic Hierarchical Verification of Speed-
independent Circuits. The MIT Press, 1989.
[DJW97] S. Dziembowski, M. Jurdzinski, and I. Walukiewicz. How much memory is
needed to win infinite games? In LICS’97, pages 99–110. IEEE, 1997.
[Dur95] R. Durrett. Probability: Theory and Examples. Duxbury Press, 1995.
[EJ88] E.A. Emerson and C. Jutla. The complexity of tree automata and logics of
programs. In FOCS’88, pages 328–337. IEEE, 1988.
[EJ91] E.A. Emerson and C. Jutla. Tree automata, mu-calculus and determinacy. In
FOCS’91, pages 368–377. IEEE, 1991.
[EM79] A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games.
Int. Journal of Game Theory, 8(2):109–113, 1979.
[Eve57] H. Everett. Recursive games. In Contributions to the Theory of Games III,
volume 39 of Annals of Mathematical Studies, pages 47–78, 1957.
[EY05] K. Etessami and M. Yannakakis. Recursive Markov decision processes and
recursive stochastic games. In ICALP’05, LNCS 3580, Springer, pages 891–
903, 2005.
[EY06] K. Etessami and M. Yannakakis. Recursive concurrent stochastic games. In
ICALP’06 (2), LNCS 4052, Springer, pages 324–335, 2006.
[EY07] K. Etessami and M. Yannakakis. On the complexity of Nash equilibria and
other fixed points. In FOCS’07, IEEE, 2007.
[Fin64] A.M. Fink. Equilibrium in a stochastic n-person game. Journal of Science of
Hiroshima University, 28:89–93, 1964.
[FV97] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer-
Verlag, 1997.
[GH82] Y. Gurevich and L. Harrington. Trees, automata, and games. In STOC’82,
pages 60–65. ACM, 1982.
[HD05] P. Hunter and A. Dawar. Complexity bounds for regular games. In MFCS’05,
pages 495–506, 2005.
[HJM03] T.A. Henzinger, R. Jhala, and R. Majumdar. Counterexample-guided control.
In ICALP’03, LNCS 2719, pages 886–902. Springer, 2003.
[HKR02] T.A. Henzinger, O. Kupferman, and S. Rajamani. Fair simulation. Information
and Computation, 173:64–81, 2002.
[HMMR00] T.A. Henzinger, R. Majumdar, F.Y.C. Mang, and J.-F. Raskin. Abstract
interpretation of game properties. In SAS’00, LNCS 1824, pages 220–239.
Springer, 2000.
[Hor05] F. Horn. Streett games on finite graphs. In GDV’05, 2005.
[JPZ06] M. Jurdzinski, M. Paterson, and U. Zwick. A deterministic subexponential
algorithm for solving parity games. In SODA’06, pages 117–123. ACM-SIAM,
2006.
[Jr50] J.F. Nash, Jr. Equilibrium points in n-person games. Proceedings of the Na-
tional Academy of Sciences USA, 36:48–49, 1950.
[Jur00] M. Jurdzinski. Small progress measures for solving parity games. In
STACS’00, LNCS 1770, Springer, pages 290–301, 2000.
[Kem83] J.H. Kemeny. Finite Markov Chains. Springer, 1983.
[Koz83] D. Kozen. Results on the propositional µ-calculus. Theoretical Computer
Science, 27(3):333–354, 1983.
[Kre90] D.M. Kreps. A Course in Microeconomic Theory. Princeton University Press,
1990.
[KV98] O. Kupferman and M.Y. Vardi. Weak alternating automata and tree automata
emptiness. In STOC’98, pages 224–233. ACM, 1998.
[LL69] T. A. Liggett and S. A. Lippman. Stochastic games with perfect information
and time average payoff. Siam Review, 11:604–607, 1969.
[Maj03] R. Majumdar. Symbolic algorithms for verification and control. PhD thesis,
UC Berkeley, 2003.
[Mar75] D.A. Martin. Borel determinacy. Annals of Mathematics, 102(2):363–371,
1975.
[Mar98] D.A. Martin. The determinacy of Blackwell games. The Journal of Symbolic
Logic, 63(4):1565–1581, 1998.
[McN93] R. McNaughton. Infinite games played on finite graphs. Annals of Pure and
Applied Logic, 65:149–184, 1993.
[MM02] A. McIver and C. Morgan. Games, probability, and the quantitative µ-calculus
qµ. In LPAR’02, LNCS 2514, Springer, pages 292–310, 2002.
[MN81] J.F. Mertens and A. Neyman. Stochastic games. International Journal of
Game Theory, 10:53–66, 1981.
[Mos84] A.W. Mostowski. Regular expressions for infinite trees and a standard form
of automata. In 5th Symposium on Computation Theory, LNCS 208, pages
157–168. Springer, 1984.
[MP92] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent
Systems: Specification. Springer-Verlag, 1992.
[MPS95] O. Maler, A. Pnueli, and J. Sifakis. On the synthesis of discrete controllers
for timed systems. In STACS’95, LNCS 900, pages 229–242. Springer-Verlag,
1995.
[NAT03] N. Amla, E.A. Emerson, K. Namjoshi, and R. Trefler. Abstract patterns for
compositional reasoning. In CONCUR'03, 2003.
[Niw97] D. Niwinski. Fixed-point characterization of infinite behavior of finite-state
systems. In Theoretical Computer Science, volume 189(1-2), pages 1–69, 1997.
[NS03] A. Neyman and S. Sorin. Stochastic Games and Applications. Kluwer Academic
Publishers, 2003.
[Owe95] G. Owen. Game Theory. Academic Press, 1995.
[Pap01] C.H. Papadimitriou. Algorithms, games, and the internet. In STOC'01, pages
749–753. ACM Press, 2001.
[PP06] N. Piterman and A. Pnueli. Faster solution of Rabin and Streett games. In
LICS’06, pages 275–284. IEEE, 2006.
[PR89] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In POPL’89,
pages 179–190. ACM, 1989.
[PT87] C. H. Papadimitriou and J. N. Tsitsiklis. The complexity of Markov decision
processes. Mathematics of Operations Research, 12:441–450, 1987.
[Rab69] M.O. Rabin. Automata on Infinite Objects and Church’s Problem. Number 13
in Conference Series in Mathematics. American Mathematical Society, 1969.
[Rei79] J. H. Reif. Universal games of incomplete information. In STOC’79, pages
288–308. ACM, 1979.
[RF91] T.E.S. Raghavan and J.A. Filar. Algorithms for stochastic games — a survey.
ZOR — Methods and Models of Operations Research, 35:437–472, 1991.
[RW87] P.J. Ramadge and W.M. Wonham. Supervisory control of a class of discrete-
event processes. SIAM Journal of Control and Optimization, 25(1):206–230,
1987.
[Saf88] S. Safra. On the complexity of ω-automata. In Proceedings of the 29th An-
nual Symposium on Foundations of Computer Science, pages 319–327. IEEE
Computer Society Press, 1988.
[Sha53] L.S. Shapley. Stochastic games. Proc. Nat. Acad. Sci. USA, 39:1095–1100,
1953.
[SS01] P. Secchi and W.D. Sudderth. Stay-in-a-set games. International Journal of
Game Theory, 30:479–490, 2001.
[Tar51] A. Tarski. A Decision Method for Elementary Algebra and Geometry. Univer-
sity of California Press, Berkeley and Los Angeles, 1951.
[Tho95] W. Thomas. On the synthesis of strategies in infinite games. In STACS'95,
LNCS 900, Springer, pages 1–13, 1995.
[Tho97] W. Thomas. Languages, automata, and logic. In Handbook of Formal Lan-
guages, volume 3, Beyond Words, chapter 7, pages 389–455. Springer, 1997.
[Var85] M.Y. Vardi. Automatic verification of probabilistic concurrent finite-state sys-
tems. In FOCS’85, pages 327–338. IEEE, 1985.
[Vie00a] N. Vieille. Two player stochastic games I: a reduction. Israel Journal of
Mathematics, 119:55–91, 2000.
[Vie00b] N. Vieille. Two player stochastic games II: the case of recursive games. Israel
Journal of Mathematics, 119:93–126, 2000.
[VJ00] J. Voge and M. Jurdzinski. A discrete strategy improvement algorithm for
solving parity games. In CAV’00, LNCS 1855, Springer, pages 202–215, 2000.
[vNM47] J. von Neumann and O. Morgenstern. Theory of games and economic behavior.
Princeton University Press, 1947.
[VW86] M.Y. Vardi and P. Wolper. An automata-theoretic approach to automatic
program verification. In LICS’86, pages 322–331. IEEE, 1986.
[Wad84] W.W. Wadge. Reducibility and Determinateness of Baire Spaces. PhD thesis,
UC Berkeley, 1984.
[Wal96] I. Walukiewicz. Pushdown processes: Games and model checking. In CAV’96,
LNCS 1102, pages 62–74. Springer, 1996.
[Wal04] I. Walukiewicz. A landscape with games in the background. In LICS’04, pages
356–366. IEEE, 2004.
[Zie98] W. Zielonka. Infinite games on finitely coloured graphs with applications to
automata on infinite trees. In Theoretical Computer Science, volume 200(1-2),
pages 135–183, 1998.
[ZP96] U. Zwick and M.S. Paterson. The complexity of mean payoff games on graphs.
Theoretical Computer Science, 158:343–359, 1996.