
Milner Lecture
From bisimulation to representation learning via metrics

Prakash Panangaden
School of Computer Science, McGill University
and Montreal Institute for Learning Algorithms

30 September 2021


Outline

1 Introduction

2 Bisimulation for LTSs

3 Probabilistic bisimulation

4 Continuous state spaces

5 Metrics

6 Representation learning

7 The MICo Distance

8 Experimental results

9 Conclusions


Behavioural equivalence is fundamental

When do two states have exactly the same behaviour?
What can one observe of the behaviour?
What should be guaranteed?
(i) If two states are equivalent we should not be able to “see” any differences in observable behaviour.
(ii) If two states are equivalent they should stay equivalent as they evolve.


Heroes of concurrency theory: Milner and Park


Inspiration for my work I: Dexter Kozen


Inspiration for my work II: Lawvere and Giry


Special thanks I


Special thanks II


A bit of history

Cantor and the back-and-forth argument
Lumpability in queueing theory: 1960s
Bisimulation of nondeterministic automata (1970s) and process algebras (1980s): Milner and Park
Probabilistic bisimulation, discrete systems: Larsen and Skou 1989
Bisimulation of Markov processes on continuous state spaces: Desharnais, Edalat, P. 1997...
Bisimulation metrics for Markov processes: Desharnais, Gupta, Jagadeesan, P. 1999
Fixed-point version: van Breugel and Worrell 2001
Bisimulation for MDPs: Givan and Dean 2003
Bisimulation metrics for MDPs: Ferns, Precup, P. 2004
Representation learning using “metrics”: Castro, Kastner, P., Rowland 2021


The definition

A set of states S,
a set of labels or actions, L or A, and
a transition relation ⊆ S × A × S, usually written →a ⊆ S × S for each action a.
The transitions could be indeterminate (nondeterministic).
We write s →a s′ for (s, s′) ∈ →a.
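To make the definition concrete, here is a minimal Python sketch of a finite LTS. The class, the state names and the vending-machine fragment are illustrative choices, not taken from the lecture.

```python
# A minimal finite LTS: states, actions and a transition relation, with possible
# nondeterminism (several successors for the same state and action).
from collections import defaultdict

class LTS:
    def __init__(self):
        # transitions[(s, a)] = set of states s' with s --a--> s'
        self.transitions = defaultdict(set)

    def add(self, s, a, s_next):
        self.transitions[(s, a)].add(s_next)

    def succ(self, s, a):
        # May be empty or contain several states: that is the indeterminacy.
        return self.transitions[(s, a)]

# Hypothetical fragment of a vending machine in which the tea/coffee choice
# is resolved internally (nondeterministically) by the machine:
m = LTS()
m.add("idle", "Cup", "cup_placed")
m.add("cup_placed", "£1", "paid")
m.add("paid", "Choose", "will_make_tea")
m.add("paid", "Choose", "will_make_coffee")
```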


Vending machine LTSs

[Diagram: a vending machine LTS. Labels visible in the figure: Wait, Cup, £1, Coffee, Tea, Place cup, Insert money, Choose, Dispense coffee, Dispense tea.]


Vending machine LTSs

[Diagram: a second vending machine LTS with the same labels but a different branching structure: the choice between tea and coffee is resolved at a different point.]


Are the two LTSs equivalent?

One gives us the choice whereas the other makes the choice internally.
The sequences that the machines can perform are identical: [Cup; £1; (Cof + Tea)]∗
We need to go beyond language equivalence.


Formal definition

[Diagram: the bisimulation square: s →a s′ along the top, t →a t′ along the bottom, with s ∼ t and s′ ∼ t′ relating the columns.]

Bisimulation definition: if s ∼ t then

∀a ∈ A, ∀s′ ∈ S: s →a s′ ⇒ ∃t′, t →a t′ with s′ ∼ t′

and vice versa with s and t interchanged.
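The definition suggests an algorithm: on a finite LTS, bisimilarity can be computed by partition refinement. The sketch below is a deliberately naive version (real tools use Paige-Tarjan-style algorithms); the function names and the representation of the LTS as a `succ` function are my own choices.

```python
def bisimilarity(states, actions, succ):
    """Coarsest partition of `states` such that related states have matching
    a-transitions into each block, for every action a.
    succ(s, a) returns an iterable of a-successors of s."""
    block = {s: 0 for s in states}              # start with a single block
    while True:
        # signature: which (action, target-block) pairs can this state realise?
        sig = {s: frozenset((a, block[t]) for a in actions for t in succ(s, a))
               for s in states}
        new_ids, new_block = {}, {}
        for s in states:
            key = (block[s], sig[s])            # refine: never merge old blocks
            new_block[s] = new_ids.setdefault(key, len(new_ids))
        if new_block == block:                  # partition is stable: done
            return block                        # block[s] == block[t]  iff  s ~ t
        block = new_block
```

Run on the two vending machines, this would separate their initial states, which is exactly the distinction that language equivalence misses.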


Discrete probabilistic transition systems

Just like a labelled transition system, but with probabilities associated with the transitions:

(S, A, ∀a ∈ A, Ta : S × S −→ [0, 1])

The model is reactive: all probabilistic data is internal - no probabilities associated with environment behaviour.


Probabilistic bisimulation : Larsen and Skou

[Diagram: two processes. On the left, s0 has a-transitions with probability 1/3 each to s1, s2 and s3; s1 then has a b-transition with probability 1, while s2 and s3 each have a c-transition with probability 1. On the right, t0 has a-transitions with probability 1/3 to t1 and 2/3 to t2; t1 has a b-transition with probability 1 and t2 a c-transition with probability 1.]


Are s0 and t0 bisimilar?

Yes, but one needs to add up the probabilities to s2 and s3.

If s is a state, a an action and C a set of states, we write Ta(s, C) = ∑s′∈C Ta(s, s′) for the probability of jumping on an a-action to one of the states in C.

Definition
R is a bisimulation relation if whenever sRt, for every action a and every equivalence class C of R, Ta(s, C) = Ta(t, C).
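As a small illustration of this definition, the sketch below checks the Larsen-Skou condition for a given candidate partition, using exact fractions. The encoding of the slide's example adds an explicit sink state as the target of the b- and c-transitions, since those targets are not visible in the figure; the concrete states and probabilities beyond that are taken from the diagram described above.

```python
# Check T_a(s, C) = T_a(t, C) for all related s, t, all actions a and all classes C.
from collections import defaultdict
from fractions import Fraction

def class_probs(T, s, a, block):
    """T_a(s, C) for every class C of the partition described by `block`."""
    out = defaultdict(Fraction)
    for s_next, p in T.get((s, a), {}).items():
        out[block[s_next]] += p
    return dict(out)

def is_prob_bisimulation(states, actions, T, block):
    members = defaultdict(list)
    for s in states:
        members[block[s]].append(s)
    return all(
        class_probs(T, group[0], a, block) == class_probs(T, s, a, block)
        for group in members.values() for s in group[1:] for a in actions
    )

third = Fraction(1, 3)
T = {
    ("s0", "a"): {"s1": third, "s2": third, "s3": third},
    ("t0", "a"): {"t1": third, "t2": 2 * third},
    ("s1", "b"): {"sink": Fraction(1)}, ("t1", "b"): {"sink": Fraction(1)},
    ("s2", "c"): {"sink": Fraction(1)}, ("s3", "c"): {"sink": Fraction(1)},
    ("t2", "c"): {"sink": Fraction(1)},
}
block = {"s0": 0, "t0": 0, "s1": 1, "t1": 1, "s2": 2, "s3": 2, "t2": 2, "sink": 3}
print(is_prob_bisimulation(list(block), ["a", "b", "c"], T, block))   # True: s0 ~ t0
```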


Markov decision processes?

Markov decision processes are probabilistic versions of labelled transition systems: labelled transition systems where the final state is governed by a probability distribution - no other indeterminacy.
There is a reward associated with each transition.
We observe the interactions and the rewards - not the internal states.


Markov decision processes: formal definition

(S, A, ∀a ∈ A, Pa : S −→ D(S), R : A × S −→ ℝ)

where
S : the state space; we will take it to be a finite set.
A : the actions, a finite set.
Pa : the transition function; D(S) denotes distributions over S.
R : the reward; one could readily make it stochastic.
We will write Pa(s, C) for Pa(s)(C).
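A small illustrative container for the same data; the names, and the keying of P and R by (state, action) pairs, are my own choices (the slide writes R : A × S → ℝ, which carries the same information).

```python
# The data of a finite MDP as on the slide; probabilities and rewards are plain floats.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class MDP:
    states: List[str]
    actions: List[str]
    P: Dict[Tuple[str, str], Dict[str, float]]   # P[(s, a)][s'] = Pa(s)({s'})
    R: Dict[Tuple[str, str], float]              # R[(s, a)] = immediate reward

    def check(self, tol=1e-9):
        # each Pa(s) should be a probability distribution over states
        return all(abs(sum(dist.values()) - 1.0) < tol for dist in self.P.values())
```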


Policies

MDP

(S, A, ∀a ∈ A, Pa : S −→ D(S), R : A × S −→ ℝ)

We control the choice of action; it is not some external scheduler.

Policy

π : S −→ D(A)

The goal is to choose the best policy: there are numerous algorithms to find or approximate the optimal policy; one representative is sketched below.
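One standard choice is value iteration; the lecture does not single out a particular algorithm, so the sketch below, with a discount factor gamma that is not on the slide, is just a representative example. It returns a deterministic greedy policy, a special case of π : S −→ D(A).

```python
# Value iteration: iterate the Bellman optimality update until the values stabilise,
# then read off a greedy policy.  P[(s, a)] is a dict s' -> probability,
# R[(s, a)] is the immediate reward for doing a in s.

def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    def q(s, a, V):
        return R[(s, a)] + gamma * sum(p * V[t] for t, p in P[(s, a)].items())

    V = {s: 0.0 for s in states}
    while True:
        V_new = {s: max(q(s, a, V) for a in actions) for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            break
        V = V_new
    policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
    return V, policy
```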


Bisimulation

Let R be an equivalence relation. R is a bisimulation if, whenever s R t, for all actions a and all equivalence classes C of R:
(i) R(a, s) = R(a, t) (equal immediate rewards)
(ii) Pa(s, C) = Pa(t, C)

s, t are bisimilar if there is a bisimulation relation R with s R t.
Basic pattern: immediate rewards match (initiation), stay related after the transition (coinduction).
Bisimulation can be defined as the greatest fixed point of a relation transformer, spelled out below.
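For concreteness, here is one standard way to write that relation transformer. The notation is mine: E for the equivalence relation, to avoid the clash with the reward function R, and S/E for its set of equivalence classes.

```latex
\mathcal{F}(E) \;=\; \bigl\{\,(s,t) \;\bigm|\; \forall a \in A:\;
    R(a,s) = R(a,t) \ \text{and}\ \forall C \in S/E:\; P_a(s,C) = P_a(t,C) \,\bigr\}
```

On the complete lattice of equivalence relations on S this map is monotone, so it has a greatest fixed point, and bisimilarity is exactly that greatest fixed point.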


Continuous state spaces: why?

Software controllers attached to physical devices or sensors - robots, controllers.
Continuous state space but discrete time.
Applications to control systems.
Applications to probabilistic programming languages.


Some remarks on the use of continuous spaces

Can be used for reasoning - but much better if we could have a finite-state version.
Why not discretize right away and never worry about the continuous case?
How can we say that our discrete approximation is “accurate”?
We lose the ability to refine the model later.


The Need for Measure Theory

Basic fact: there are subsets of ℝ for which no sensible notion of size can be defined.
More precisely, there is no translation-invariant measure defined on all the subsets of the reals.


Logical Characterization

Very austere logic:

L ::= ⊤ | φ1 ∧ φ2 | ⟨a⟩q φ

s |= ⟨a⟩q φ means that if the system is in state s, then after the action a, with probability at least q the new state will satisfy the formula φ.
Two systems are bisimilar iff they obey the same formulas of L. [DEP 1998 LICS, I and C 2002]
No finite branching assumption.
No negation in the logic, so one can obtain a logical characterization result for simulation, but it needs disjunction.
The proof uses tools from descriptive set theory and measure theory.
Such a theorem was originally proved for LTSs with finite-branching restrictions by Hennessy and Milner in 1977 and van Benthem in 1976.
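Spelling out the semantics the slide states in words (Ta(s, B) is the probability of jumping from s under a into the set B, as on the earlier slides; s |= φ means s ∈ ⟦φ⟧):

```latex
\llbracket \top \rrbracket = S, \qquad
\llbracket \phi_1 \wedge \phi_2 \rrbracket = \llbracket \phi_1 \rrbracket \cap \llbracket \phi_2 \rrbracket, \qquad
\llbracket \langle a \rangle_q \phi \rrbracket = \{\, s \mid T_a(s, \llbracket \phi \rrbracket) \geq q \,\}.
```

In the continuous setting one checks by induction on formulas that each ⟦φ⟧ is measurable, so the last clause makes sense.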


The proof “engine”: Josée Desharnais


But...

In the context of probability, is exact equivalence reasonable?
We say “no”. A small change in the probability distributions may result in bisimilar processes no longer being bisimilar, though they may be very “close” in behaviour.
Instead one should have a (pseudo)metric for probabilistic processes.


A metric-based approximate viewpoint

Move from equality between processes to distances between processes (Jou and Smolka 1990).
Quantitative measurement of the distinction between processes.


In lieu of several slides of greek letters and symbols

If two states are not bisimilar there is some observation on which they disagree.
They may disagree on the reward or on the probability distribution that results from a transition.
We need to measure the latter; we use the Wasserstein-Kantorovich metric between probability distributions.
Intuitively, if the difference shows up only after a long and elaborate observation then we should make the states “nearby” in the bisimulation metric.
All this can be formalized; it was originally done by Desharnais et al. and later with a beautiful fixed-point construction by van Breugel and Worrell.
Ferns et al. added rewards and showed that the bisimulation metric bounds the difference in optimal value functions.
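For the reader who wants one slide's worth of Greek letters after all, a representative form of the discounted bisimulation metric is the unique fixed point of the map below (essentially the Ferns et al. construction; the exact weighting constants vary across papers):

```latex
F(d)(s,t) \;=\; \max_{a \in A}\Bigl(\, |R(a,s) - R(a,t)| \;+\; \gamma\, W_d\bigl(P_a(s),\, P_a(t)\bigr) \Bigr),
```

where W_d is the Kantorovich (Wasserstein) lifting of a pseudometric d on states to distributions:

```latex
W_d(\mu,\nu) \;=\; \min_{\lambda \in \Gamma(\mu,\nu)} \sum_{s',t'} \lambda(s',t')\, d(s',t')
\;=\; \max_{f \ 1\text{-Lipschitz w.r.t.}\ d}\ \Bigl|\sum_{s'} f(s')\,\mu(s') - \sum_{s'} f(s')\,\nu(s')\Bigr|.
```

Its kernel (the pairs at distance 0) is exactly bisimilarity, and with this normalisation the value-function bound mentioned above takes the form |V*(s) − V*(t)| ≤ d(s, t).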


Quantitative equational logic

It is possible to generalize the notion of equation to capture approximate equality.
s =ε t means s is within ε of t.
Much of the theory of equational logic carries over to this setting.
Algebras for such equations are naturally equipped with metrics and give a way of reasoning about bisimulation metrics.
Mardare, P., Plotkin LICS 2016, 2017, 2021; Bacci, Mardare, P., Plotkin LICS 2018, CALCO 2021.
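To give the flavour, here are a few of the basic rules, roughly as in Mardare, P., Plotkin (LICS 2016); the full system also has congruence/nonexpansiveness rules for the operations and a continuity rule.

```latex
\vdash t =_0 t \qquad\quad
t =_\varepsilon s \ \vdash\ s =_\varepsilon t \qquad\quad
t =_\varepsilon s,\ s =_{\varepsilon'} u \ \vdash\ t =_{\varepsilon+\varepsilon'} u \qquad\quad
t =_\varepsilon s \ \vdash\ t =_{\varepsilon'} s \ \ (\varepsilon' \geq \varepsilon)
```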


Basic goals in RL

We are often dealing with large or infinite transition systems whose behaviour is probabilistic.
The system responds to stimuli (actions), moves to a new state probabilistically and outputs a (possibly) random reward.
We seek optimal policies for extracting the largest possible reward in expectation.
A plethora of algorithms and techniques exists, but the cost depends on the size of the state space.
Can we learn representations of the state space that accelerate the learning process?


Representation learning

For large state spaces, learning value functions S × A −→ ℝ directly is not feasible.
Instead we define a new space of features M and try to come up with an embedding φ : S −→ ℝ^M.
Then we can try to use this to predict values associated with state-action pairs.
Representation learning means learning such a φ.
The elements of M are the “features” that are chosen. They can be based on any kind of knowledge or experience about the task at hand.
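A minimal sketch of the shape of such a representation, in plain NumPy; the one-layer embedding, the sizes and the linear value heads are illustrative assumptions, not the architecture used in the experiments later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: raw states are vectors in R^8, M = 16 features, 4 actions.
W, b = rng.normal(scale=0.1, size=(16, 8)), np.zeros(16)
heads = rng.normal(scale=0.1, size=(4, 16))     # one linear value head per action

def phi(state):
    """The embedding phi : S -> R^M (here a fixed random one-layer map)."""
    return np.tanh(W @ state + b)

def predicted_values(state):
    """Values for all (state, action) pairs, computed from the features alone."""
    return heads @ phi(state)

print(predicted_values(rng.normal(size=8)).shape)   # (4,)
```

Learning the representation means learning the parameters of phi (here W and b), typically jointly with the value prediction.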


The MICo distance

The Kantorovich metric is expensive to compute and difficult to estimate from samples.
We (Castro et al.) invented a version that is easy to estimate from samples.
In spirit it is closely related to the bisimulation metric, but it is a crude approximation
and is not even technically a metric!


A new type of distance

Diffuse metric

1 d(x, y) ≥ 0
2 d(x, y) = d(y, x)
3 d(x, y) ≤ d(x, z) + d(z, y)
4 We do not require d(x, x) = 0
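One concrete example of such a distance is the Łukaszyk-Karmowski distance between probability distributions, which I believe is the kind of example this definition is designed to accommodate, though the slide does not name one:

```latex
d_{\mathrm{LK}}(\mu, \nu) \;=\; \mathbb{E}_{X \sim \mu,\ Y \sim \nu}\bigl[\,|X - Y|\,\bigr]
\qquad (X, Y \ \text{independent}).
```

It is non-negative, symmetric and satisfies the triangle inequality, yet d_LK(μ, μ) = E|X − X′| > 0 whenever μ is not a point mass.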


MICo loss

Nearly all machine learning algorithms are optimization algorithms.
One often introduces extra terms into the objective function that push the solution in a desired direction.
We defined a loss term based on the MICo distance.
For details read https://psc-g.github.io/posts/research/rl/mico/
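A rough sketch of the general recipe, not the exact loss from the paper (see the link above for that): treat the MICo-style distance as a TD-style fixed point on pairs of states, U(x, y) ≈ |r_x − r_y| + γ·U(x′, y′) with independently sampled successors, and regress a parametric distance between learned representations onto that target. The squared-Euclidean parameterisation below is only an illustrative stand-in; the paper's parameterisation differs (built, as I recall, from norms and angles of the embeddings).

```python
import numpy as np

def rep_distance(phi_x, phi_y):
    # Illustrative parameterisation of the approximate distance between two states,
    # computed from their embeddings (not the paper's parameterisation).
    return float(np.sum((phi_x - phi_y) ** 2))

def mico_style_loss(pair_batch, embed, gamma=0.99):
    """pair_batch: pairs of transitions ((x, r_x, x_next), (y, r_y, y_next)).
    embed(state) -> feature vector.  In practice the target side would use a
    frozen (stop-gradient / target-network) copy of embed."""
    total = 0.0
    for (x, r_x, x_next), (y, r_y, y_next) in pair_batch:
        target = abs(r_x - r_y) + gamma * rep_distance(embed(x_next), embed(y_next))
        online = rep_distance(embed(x), embed(y))
        total += (online - target) ** 2
    return total / len(pair_batch)
```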


Experimental setup


Experiments

Added the MICo loss term to a variety of existing agents: all those available in the Dopamine library; 5 in all.
Ran each game 5 times with new seeds, so 300 runs for each agent.
Each game is run for 200 million environment interactions.
We look at final scores and learning curves.
We tried each agent with and without the MICo loss term on 60 different Atari games.
Every agent performed better on about 2/3 of the games.


Results for Rainbow


Results for DQN


Conclusions

Bisimulation has a rich and venerable history.
The metric analogue holds promise for quantitative reasoning and approximation.
Perhaps a fruitful line of research would be equation solving in quantitative algebras and automating equational reasoning in the quantitative setting.
Research is alive and well and there are new areas where bisimulation is being “discovered”.
