(Co)-Algebraic and Analytical Aspects of Weighted Automata ... · Minimisation vs Approximation •...

Post on 03-Nov-2020

2 views 0 download

Transcript of (Co)-Algebraic and Analytical Aspects of Weighted Automata ... · Minimisation vs Approximation •...

(Co)-Algebraic and Analytical Aspects of Weighted Automata Minimisation and Equivalence

(Part 2)

Borja Balle

[CMCS Tutorial, Apr 2 2016]

Minimisation vs Approximation

16 novática Special English Edition - 2013/2014 Annual Selection of Articles

monograph Process Mining

monograph

Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.

Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.

16 novática Special English Edition - 2013/2014 Annual Selection of Articles

monograph Process Mining

monograph

Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.

Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.

minimisation

Minimisation vs Approximation

16 novática Special English Edition - 2013/2014 Annual Selection of Articles

monograph Process Mining

monograph

Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.

Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.

16 novática Special English Edition - 2013/2014 Annual Selection of Articles

monograph Process Mining

monograph

Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.

Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.

minimisation

q1 q2

q3 q4approximation

Minimisation vs Approximation

• Minimisation is naturally related to (co-)algebraic properties

• Approximation questions need analytical tools (eg. metrics)

16 novática Special English Edition - 2013/2014 Annual Selection of Articles

monograph Process Mining

monograph

Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.

Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.

16 novática Special English Edition - 2013/2014 Annual Selection of Articles

monograph Process Mining

monograph

Figure 6. Process Model Discovered for a Group of 627 Gynecological Oncology Patients.

Figure 5. Conformance Analysis Showing Deviations between Eventlog and Process.

minimisation

q1 q2

q3 q4approximation

Motivations for Approximation

1. Efficient (approximate) computation with WA

• Compute faster and pay a controlled price in terms of accuracy

2. Machine learning for WA

• Understand learning bias, design better algorithms

3. Interesting mathematical questions

WA and Rational Functions

A q1

11

q2

´1´2

a, 1b, 0

a,´1b,´2

a, 3b, 5

a,´2b, 0 A = ��, �, {Ta}a���

�, � � Rn, Ta � Rn�n

� =

1-1

�Tb =

�0 �20 5

�Ta =

�1 �1

�2 3

�� =

�1 �2

WA and Rational Functions

A q1

11

q2

´1´2

a, 1b, 0

a,´1b,´2

a, 3b, 5

a,´2b, 0 A = ��, �, {Ta}a���

�, � � Rn, Ta � Rn�n

fA : ⌃? ! Rsum of weights of all paths labelled by the

input string

fA(x) = �Tx1 · · · Txt�

= �Tx�

(Pseudo-)Metrics for ApproximationHow do we measure the quality of WA approximations?

(Pseudo-)Metrics for Approximation

�f�p =

��

x���

|f(x)|p�1/p

distp(A,B) = ∥fA − fB∥p

How do we measure the quality of WA approximations?

(Pseudo-)Metrics for Approximation

�f�p =

��

x���

|f(x)|p�1/p

distp(A,B) = ∥fA − fB∥p

How do we measure the quality of WA approximations?

pseudo-metric

(Pseudo-)Metrics for Approximation

�f�p =

��

x���

|f(x)|p�1/p

distp(A,B) = ∥fA − fB∥p

How do we measure the quality of WA approximations?

ℓp = {f : Σ⋆ → R | ∥f∥p < ∞}

Rat = {f : Σ⋆ → R | rational}

ℓpRat = Rat ∩ ℓp = {f : Σ⋆ → R | rational, ∥f∥p < ∞}

Important linearsubspaces of R��

(Pseudo-)Metrics for Approximation

�f�p =

��

x���

|f(x)|p�1/p

distp(A,B) = ∥fA − fB∥p

How do we measure the quality of WA approximations?

ℓp = {f : Σ⋆ → R | ∥f∥p < ∞}

Rat = {f : Σ⋆ → R | rational}

ℓpRat = Rat ∩ ℓp = {f : Σ⋆ → R | rational, ∥f∥p < ∞}

Important linearsubspaces of R��

Banach/Hilbert

not complete!

An Approximation Result

There is an algorithm that given a minimal WA A with nstates and k < n, runs in time O(|�|n6) and returns aWA A with k states such that

dist2(A, A) ��

�2k+1 + · · · + �2

n

[Balle, Panangaden, Precup 2015, Kulesza, Jiang, Singh 2015]

An Approximation Result

There is an algorithm that given a minimal WA A with nstates and k < n, runs in time O(|�|n6) and returns aWA A with k states such that

dist2(A, A) ��

�2k+1 + · · · + �2

n

singular values of theHankel matrix of fA

[Balle, Panangaden, Precup 2015, Kulesza, Jiang, Singh 2015]

The Plan

1. Hankel Matrices and Singular Value Automata (SVA)

2. Algorithms for Computing SVA

3. WA Approximation via SVA Truncation

The Hankel Matrix

Hf 2 R�?⇥�?

f : �? ! R

Hf(p, s) = f(ps)

»

——————————–

" a b ¨¨¨ s ¨¨¨

" fp"q fpaq fpbq ...

a fpaq fpaaq fpabq ...

b fpbq fpbaq fpbbq ......p ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ fppsq...

fi

����������fl

The Hankel Matrix

Hf 2 R�?⇥�?

f : �? ! R

Hf(p, s) = f(ps)

»

——————————–

" a b ¨¨¨ s ¨¨¨

" fp"q fpaq fpbq ...

a fpaq fpaaq fpabq ...

b fpbq fpbaq fpbbq ......p ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ fppsq...

fi

����������fl

The Hankel Matrix

[Kronecker 1881, Carlyle & Paz 1971, Fliess 1974]

Theorem: rank(Hf) = n iff f computed by minimal WA with n states

Hf 2 R�?⇥�?

f : �? ! R

Hf(p, s) = f(ps)

Hf(p, s) = f(ps) = αTp1 · · · TptTs1 · · · Tst′β = αp ·βs

Proof (upper bound rank(HfA) � |A|)

Structure of Low-rank Hankel Matrices

HfA R� � P R� n S Rn �

s

...

...

...p

...

p

s

fA p1 pT s1 sT �0 Ap1 ApT

↵A p

As1 AsT �

�A s

�A p P p, ⇥A s S , s

Hf(p, s) = f(ps) = αTp1 · · · TptTs1 · · · Tst′β = αp ·βs

Proof (upper bound rank(HfA) � |A|)

Structure of Low-rank Hankel Matrices

HfA R� � P R� n S Rn �

s

...

...

...p

...

p

s

fA p1 pT s1 sT �0 Ap1 ApT

↵A p

As1 AsT �

�A s

�A p P p, ⇥A s S , s

Hf = P · S

Hf(p, s) = f(ps) = αTp1 · · · TptTs1 · · · Tst′β = αp ·βs

Proof (upper bound rank(HfA) � |A|)

Proof (equality rank(Hf) = minfA=f |A|)1. Suppose Hf = P · S is rank factorisation

2. For a � � define Haf (p, s) = f(pas) and note

Haf = RaHf where (Raf)(x) = f(xa)

3. Observe that because S has full row rank colspan(RaP) =colspan(Ha

f ) � colspan(Hf) = colspan(P)

4. Hence there exists Ta such that RaP = PTa

5. Let A = �� = P(�, �), � = S(�, �), {Ta}�

6. Note fA = f since P(�, �)TxS(�, �) = RxP(�, �)S(�, �) =P(x, �)S(�, �) = Hf(x, �)

A Useful Correspondence

AQ = ��Q, Q�1�, {Q�1TaQ}a���

A = ��, �, {Ta}a��� Q � Rn�n invertible

A Useful Correspondence

(�Q)(Q�1Tx1Q) · · · (Q�1TxtQ)(Q�1�) = �Tx1 · · · Txt�

AQ = ��Q, Q�1�, {Q�1TaQ}a���

A = ��, �, {Ta}a��� Q � Rn�n invertible

A Useful Correspondence

(�Q)(Q�1Tx1Q) · · · (Q�1TxtQ)(Q�1�) = �Tx1 · · · Txt�

AQ = ��Q, Q�1�, {Q�1TaQ}a���

A = ��, �, {Ta}a��� Q � Rn�n invertible

Theorem:

bijection

For any rational f : �� � R

Minimal WA Acomputing f

Rank factorizationsof the form Hf = P · S

[Balle, Carreras, Luque, Quattoni 2014]

SVD of Hankel Matrices

Hf = UDV⊤ =

⎢⎢⎣

......

u1 · · · un

......

⎥⎥⎦

⎢⎣σ1

. . .σn

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · v⊤n · · ·

⎥⎦

SVD of Hankel Matrices

Hf = UDV⊤ =

⎢⎢⎣

......

u1 · · · un

......

⎥⎥⎦

⎢⎣σ1

. . .σn

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · v⊤n · · ·

⎥⎦

Theorem:

Hf with f rational admitsan SVD iff �f�2 < �

[Balle, Panangaden, Precup 2015, 201?]

SVD of Hankel Matrices

Hf = UDV⊤ =

⎢⎢⎣

......

u1 · · · un

......

⎥⎥⎦

⎢⎣σ1

. . .σn

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · v⊤n · · ·

⎥⎦

Hf : �2 � �2

g �� Hfg

(Hfg)(x) =�

y���

f(xy)g(y)

Key Idea: see Hankel matrix as an operator in a Hilbert spaceTheorem:

Hf with f rational admitsan SVD iff �f�2 < �

[Balle, Panangaden, Precup 2015, 201?]

The Singular Value AutomatonFrom Correspondence Theorem:

If Hf admits an SVD Hf = UDV�, then there exists a WAA for f inducing P = UD1/2 and S = D1/2V�

[Balle, Panangaden, Precup 2015]

The Singular Value AutomatonFrom Correspondence Theorem:

If Hf admits an SVD Hf = UDV�, then there exists a WAA for f inducing P = UD1/2 and S = D1/2V�

singular value automaton

[Balle, Panangaden, Precup 2015]

The Singular Value AutomatonFrom Correspondence Theorem:

If Hf admits an SVD Hf = UDV�, then there exists a WAA for f inducing P = UD1/2 and S = D1/2V�

WA computing sing. vect. can be obtained from SVA:

Ui = ⟨α, ei/√σi, {Ta}⟩

Vi = ⟨ei/√σi,β, {Ta}⟩

fUi= ui

fVi= vi

A = ��, �, {Ta}a���

singular value automaton

[Balle, Panangaden, Precup 2015]

Consequences of SVA

1. For every f � �2Rat the SVA provides a “unique”canonical WA representation

2. The vector subspace �2Rat � R��is closed under

taking left/right singular vectors of Hankel matrices

Consequences of SVA

1. For every f � �2Rat the SVA provides a “unique”canonical WA representation

2. The vector subspace �2Rat � R��is closed under

taking left/right singular vectors of Hankel matrices

The Plan

1. Hankel Matrices and Singular Value Automata (SVA)

2. Algorithms for Computing SVA

3. WA Approximation via SVA Truncation

Two Important Matrices

Reachability Gramian: GP = P> · P

Observability Gramian: GS = S · S>

Two Important Matrices

Reachability Gramian: GP = P> · P

Observability Gramian: GS = S · S>

Theorem:Assuming A is minimal, GP and GS are well-defined iff �fA�2 < �

[Balle, Panangaden, Precup 2015, 201?]

Two Important Matrices

Reachability Gramian: GP = P> · P

Observability Gramian: GS = S · S>

P = UD1/2 ) GP = D

S = D1/2V> ) GS = D

For SVA

Two Important Matrices

Reachability Gramian: GP = P> · P

Observability Gramian: GS = S · S>

GQP = Q>GPQ

GQS = Q-1GSQ

->

AQFor

P = UD1/2 ) GP = D

S = D1/2V> ) GS = D

For SVA

Computing the SVAGiven minimal WA A computing f � �2Rat do:

1. Compute Gramians GP and GS

2. Diagonalize M = GSGP

(i.e. find Q s.t. Q�1MQ = D2)

3. Obtain SVA as AQ

Computing the SVA

note:Step 2 is a finite eigenvalue problem :-)

Step 1 involves products of infinite matrices :-o

Given minimal WA A computing f � �2Rat do:

1. Compute Gramians GP and GS

2. Diagonalize M = GSGP

(i.e. find Q s.t. Q�1MQ = D2)

3. Obtain SVA as AQ

Computing the Gramians

Fixed-point Equations for Gramians

GS = β · β⊤ +∑

a∈Σ

Ta ·GS · T⊤a

GP = α⊤ · α+∑

a∈Σ

T⊤a ·GP · Ta

Computing the Gramians

Closed-form Algorithm

Total time O(n6 + |�|n4)vec(GS) =

I ��

a��

Ta � Ta

��1

(� � �)

Fixed-point Equations for Gramians

GS = β · β⊤ +∑

a∈Σ

Ta ·GS · T⊤a

GP = α⊤ · α+∑

a∈Σ

T⊤a ·GP · Ta

Computing the Gramians

Closed-form Algorithm

Total time O(n6 + |�|n4)vec(GS) =

I ��

a��

Ta � Ta

��1

(� � �)

Fixed-point Equations for Gramians

GS = β · β⊤ +∑

a∈Σ

Ta ·GS · T⊤a

GP = α⊤ · α+∑

a∈Σ

T⊤a ·GP · Ta

note: there is also an iterative algorithm with O(|�|n2.4) flops per iteration

The Plan

1. Hankel Matrices and Singular Value Automata (SVA)

2. Algorithms for Computing SVA

3. WA Approximation via SVA Truncation

Approximation (Attempt 1)⎡

⎢⎢⎣

......

u1 · · · un

......

⎥⎥⎦

⎢⎣σ1

. . .σn

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · v⊤n · · ·

⎥⎦ ≈

⎢⎢⎣

......

u1 · · · uk

......

⎥⎥⎦

⎢⎣σ1

. . .σk

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · vk

⊤ · · ·

⎥⎦

Hf H

Approximation (Attempt 1)⎡

⎢⎢⎣

......

u1 · · · un

......

⎥⎥⎦

⎢⎣σ1

. . .σn

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · v⊤n · · ·

⎥⎦ ≈

⎢⎢⎣

......

u1 · · · uk

......

⎥⎥⎦

⎢⎣σ1

. . .σk

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · vk

⊤ · · ·

⎥⎦

Hf H best rank k approximation

Claim:

If f : �� � R is such that Hf = H, then rank(f) = k

and �f � f�2 � �Hf � H�op = �k+1

Approximation (Attempt 1)⎡

⎢⎢⎣

......

u1 · · · un

......

⎥⎥⎦

⎢⎣σ1

. . .σn

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · v⊤n · · ·

⎥⎦ ≈

⎢⎢⎣

......

u1 · · · uk

......

⎥⎥⎦

⎢⎣σ1

. . .σk

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · vk

⊤ · · ·

⎥⎦

Hf H best rank k approximation

Claim:

If f : �� � R is such that Hf = H, then rank(f) = k

and �f � f�2 � �Hf � H�op = �k+1

Approximation (Attempt 1)⎡

⎢⎢⎣

......

u1 · · · un

......

⎥⎥⎦

⎢⎣σ1

. . .σn

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · v⊤n · · ·

⎥⎦ ≈

⎢⎢⎣

......

u1 · · · uk

......

⎥⎥⎦

⎢⎣σ1

. . .σk

⎥⎦

⎢⎣· · · v⊤1 · · ·

...· · · vk

⊤ · · ·

⎥⎦

Hf H best rank k approximation

not Hankel!

Approximation (Attempt 2)

Hf =

⎢⎢⎣

...f...

⎥⎥⎦ =

⎢⎢⎣

......√

σ1u1 · · · √σnun

......

⎥⎥⎦

⎢⎣β1...

βn

⎥⎦

UD1/2 D1/2V⊤

Approximation (Attempt 2)

f =n∑

i=1

βi√σiuiHf =

⎢⎢⎣

...f...

⎥⎥⎦ =

⎢⎢⎣

......√

σ1u1 · · · √σnun

......

⎥⎥⎦

⎢⎣β1...

βn

⎥⎦

UD1/2 D1/2V⊤

Claim:

Taking f =�k

i=1 �i�

�iui we have a rational functionsum of k orthogonal rational functions and �f � f�2

2 =�ni=k+1 �2

i�i � �ni=k+1 �2

i

Approximation (Attempt 2)

f =n∑

i=1

βi√σiuiHf =

⎢⎢⎣

...f...

⎥⎥⎦ =

⎢⎢⎣

......√

σ1u1 · · · √σnun

......

⎥⎥⎦

⎢⎣β1...

βn

⎥⎦

UD1/2 D1/2V⊤

Claim:

Taking f =�k

i=1 �i�

�iui we have a rational functionsum of k orthogonal rational functions and �f � f�2

2 =�ni=k+1 �2

i�i � �ni=k+1 �2

i

Approximation (Attempt 2)

f =n∑

i=1

βi√σiui

has rank n, but want k!

Hf =

⎢⎢⎣

...f...

⎥⎥⎦ =

⎢⎢⎣

......√

σ1u1 · · · √σnun

......

⎥⎥⎦

⎢⎣β1...

βn

⎥⎦

UD1/2 D1/2V⊤

SVA Truncation

q1 q2

q3 q4

Keep

Remove

SVA Truncation

q1 q2

q3 q4

Keep

Remove

corresponding to k largest singular values

SVA Truncation

q1 q2

q3 q4

Keep

Remove

corresponding to k largest singular values

A� =

2

664

• • • •• • • •• • • •• • • •

3

775

keep to keep

remove to removeremove to keep

keep to remove

Error in SVA TruncationOriginal Truncated

A� =

2

664

• • • •• • • •• • • •• • • •

3

775 A� =

�• •• •

�Tσ Tσ

A, f, n states A, f, k states

Error in SVA TruncationOriginal Truncated

A� =

2

664

• • • •• • • •• • • •• • • •

3

775 A� =

�• •• •

�Tσ Tσ

A, f, n states A, f, k states

Error Bounds:

�f � f�2 = �

��

Cf(�k+1 + · · · + �n)1/4

(�2k+1 + · · · + �2

n)1/2

[Balle, Panangaden, Precup 2015, 201?, Kulesza, Jiang, Singh 2015]

Conclusion

Conclusion

• Tools from functional analysis must play a role in a theory of automata approximation

Conclusion

• Tools from functional analysis must play a role in a theory of automata approximation

• Approximation theories can lead to useful algorithms for large-scale problems

Conclusion

• Tools from functional analysis must play a role in a theory of automata approximation

• Approximation theories can lead to useful algorithms for large-scale problems

• Change of basis in WA doesn’t change the function but yields automata with different properties (eg. for truncation)

Open Problems

Open Problems• How can this be extended to other types of automata, state

machines, co-algebras, etc?

Open Problems• How can this be extended to other types of automata, state

machines, co-algebras, etc?

• How important is it that WA are closed under reversal for the error analysis? [see Rabusseau, Balle, Cohen 2016]

Open Problems• How can this be extended to other types of automata, state

machines, co-algebras, etc?

• How important is it that WA are closed under reversal for the error analysis? [see Rabusseau, Balle, Cohen 2016]

• Can the AAK theorems in control theory be generalised to WA?

Open Problems• How can this be extended to other types of automata, state

machines, co-algebras, etc?

• How important is it that WA are closed under reversal for the error analysis? [see Rabusseau, Balle, Cohen 2016]

• Can the AAK theorems in control theory be generalised to WA?

• Can we obtain efficient approximation algorithms with extra constraints (eg. sparsity, positivity)?

(Co)-Algebraic and Analytical Aspects of Weighted Automata Minimisation and Equivalence

(Part 2)

Borja Balle

[CMCS Tutorial, Apr 2 2016]