Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

120
Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera

Transcript of Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Page 1: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 1

Active Learning

2010Colin de la Higuera

Zadar August 2010 2

Acknowledgements

Laurent Miclet Jose Oncina and Tim Oates for collaboration previous versions of these slides

Rafael Carrasco Paco Casacuberta Reacutemi Eyraud Philippe Ezequel Henning Fernau Thierry Murgue Franck Thollard Enrique Vidal Freacutedeacuteric Tantini

List is necessarily incomplete Excuses to those that have been forgotten

httppagespersolinauniv-nantesfr~cdlhslides

Book chapters 9 and13

Zadar August 2010 3

Outline

1 Motivations and applications2 The learning model3 Some negative results4 Algorithm L5 Some implementation issues6 Extensions7 Conclusion

Zadar August 2010 4

0 General idea

The learning algorithm (he) is allowed to interact with its environment through queries

The environment is formalised by an oracle (she)

Also called learning from queries or oracle learning

Zadar August 2010 5

1 Motivations

Zadar August 2010 6

Goals

define a credible learning modelmake use of additional information that

can be measured explain thus the difficulty of learning

certain classes solve real life problems

Zadar August 2010 7

Application robotics

A robot has to find a route in a maze The maze is represented as a graph The robot finds his way and can experiment The robot gets feedback

Dean T Basye K Kaelbling L Kokkevis E Maron O Angluin D Engelson S Inferring finite automata with stochastic output functions and an application to map learning In Swartout W ed Proceedings of the 10th National Conference on Artificial Intelligence San Jose CA Mit Press (1992) 208ndash214

Rivest RL Schapire RE Inference of finite automata using homing sequences Information and Computation 103 (1993) 299ndash347

Zadar August 2010 8

Application web wrapper induction

System SQUIRREL learns tree automataGoal is to learn a tree automaton which

when run on XML returns selected items

Carme J Gilleron R Lemay A Niehren J Interactive learning of node selecting tree transducer Machine Learning Journal 66(1) (2007) 33ndash67

Zadar August 2010 9

Applications under resourced languages

When a language does not have enough data for statistical methods to be of interest use of human expert for labelling

This is the case for most languages Examples Interactive predictive parsing Computer aided translation

Zadar August 2010 10

Checking models

An electronic system can be modelled by a finite graph (a DFA)

Checking if a chip meets its specification can be done by testing or by trying to learn the specification with queries

Breacuteheacutelin L Gascuel O Caraux G Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits Pattern Analysis and Machine Intelligence 23(9) (2001) 997ndash1008

Berg T Grinchtein O Jonsson B Leucker M Raffelt H Steffen B On the correspondence between conformance testing and regular inference In Proceedings of Fundamental Approaches to Software Engineering 8th International Conference FASE 2005 Volume 3442 of Lncs Springer-Verlag (2005) 175ndash189

Raffelt H Steffen B Learnlib A library for automata learning and experimentation In Proceedings of Fase 2006 Volume 3922 of Lncs Springer-Verlag (2006) 377ndash380

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 2: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 2

Acknowledgements

Laurent Miclet Jose Oncina and Tim Oates for collaboration previous versions of these slides

Rafael Carrasco Paco Casacuberta Reacutemi Eyraud Philippe Ezequel Henning Fernau Thierry Murgue Franck Thollard Enrique Vidal Freacutedeacuteric Tantini

List is necessarily incomplete Excuses to those that have been forgotten

httppagespersolinauniv-nantesfr~cdlhslides

Book chapters 9 and13

Zadar August 2010 3

Outline

1 Motivations and applications2 The learning model3 Some negative results4 Algorithm L5 Some implementation issues6 Extensions7 Conclusion

Zadar August 2010 4

0 General idea

The learning algorithm (he) is allowed to interact with its environment through queries

The environment is formalised by an oracle (she)

Also called learning from queries or oracle learning

Zadar August 2010 5

1 Motivations

Zadar August 2010 6

Goals

define a credible learning modelmake use of additional information that

can be measured explain thus the difficulty of learning

certain classes solve real life problems

Zadar August 2010 7

Application robotics

A robot has to find a route in a maze The maze is represented as a graph The robot finds his way and can experiment The robot gets feedback

Dean T Basye K Kaelbling L Kokkevis E Maron O Angluin D Engelson S Inferring finite automata with stochastic output functions and an application to map learning In Swartout W ed Proceedings of the 10th National Conference on Artificial Intelligence San Jose CA Mit Press (1992) 208ndash214

Rivest RL Schapire RE Inference of finite automata using homing sequences Information and Computation 103 (1993) 299ndash347

Zadar August 2010 8

Application web wrapper induction

System SQUIRREL learns tree automataGoal is to learn a tree automaton which

when run on XML returns selected items

Carme J Gilleron R Lemay A Niehren J Interactive learning of node selecting tree transducer Machine Learning Journal 66(1) (2007) 33ndash67

Zadar August 2010 9

Applications under resourced languages

When a language does not have enough data for statistical methods to be of interest use of human expert for labelling

This is the case for most languages Examples Interactive predictive parsing Computer aided translation

Zadar August 2010 10

Checking models

An electronic system can be modelled by a finite graph (a DFA)

Checking if a chip meets its specification can be done by testing or by trying to learn the specification with queries

Breacuteheacutelin L Gascuel O Caraux G Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits Pattern Analysis and Machine Intelligence 23(9) (2001) 997ndash1008

Berg T Grinchtein O Jonsson B Leucker M Raffelt H Steffen B On the correspondence between conformance testing and regular inference In Proceedings of Fundamental Approaches to Software Engineering 8th International Conference FASE 2005 Volume 3442 of Lncs Springer-Verlag (2005) 175ndash189

Raffelt H Steffen B Learnlib A library for automata learning and experimentation In Proceedings of Fase 2006 Volume 3922 of Lncs Springer-Verlag (2006) 377ndash380

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 3: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 3

Outline

1 Motivations and applications2 The learning model3 Some negative results4 Algorithm L5 Some implementation issues6 Extensions7 Conclusion

Zadar August 2010 4

0 General idea

The learning algorithm (he) is allowed to interact with its environment through queries

The environment is formalised by an oracle (she)

Also called learning from queries or oracle learning

Zadar August 2010 5

1 Motivations

Zadar August 2010 6

Goals

define a credible learning modelmake use of additional information that

can be measured explain thus the difficulty of learning

certain classes solve real life problems

Zadar August 2010 7

Application robotics

A robot has to find a route in a maze The maze is represented as a graph The robot finds his way and can experiment The robot gets feedback

Dean T Basye K Kaelbling L Kokkevis E Maron O Angluin D Engelson S Inferring finite automata with stochastic output functions and an application to map learning In Swartout W ed Proceedings of the 10th National Conference on Artificial Intelligence San Jose CA Mit Press (1992) 208ndash214

Rivest RL Schapire RE Inference of finite automata using homing sequences Information and Computation 103 (1993) 299ndash347

Zadar August 2010 8

Application web wrapper induction

System SQUIRREL learns tree automataGoal is to learn a tree automaton which

when run on XML returns selected items

Carme J Gilleron R Lemay A Niehren J Interactive learning of node selecting tree transducer Machine Learning Journal 66(1) (2007) 33ndash67

Zadar August 2010 9

Applications under resourced languages

When a language does not have enough data for statistical methods to be of interest use of human expert for labelling

This is the case for most languages Examples Interactive predictive parsing Computer aided translation

Zadar August 2010 10

Checking models

An electronic system can be modelled by a finite graph (a DFA)

Checking if a chip meets its specification can be done by testing or by trying to learn the specification with queries

Breacuteheacutelin L Gascuel O Caraux G Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits Pattern Analysis and Machine Intelligence 23(9) (2001) 997ndash1008

Berg T Grinchtein O Jonsson B Leucker M Raffelt H Steffen B On the correspondence between conformance testing and regular inference In Proceedings of Fundamental Approaches to Software Engineering 8th International Conference FASE 2005 Volume 3442 of Lncs Springer-Verlag (2005) 175ndash189

Raffelt H Steffen B Learnlib A library for automata learning and experimentation In Proceedings of Fase 2006 Volume 3922 of Lncs Springer-Verlag (2006) 377ndash380

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 4: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 4

0 General idea

The learning algorithm (he) is allowed to interact with its environment through queries

The environment is formalised by an oracle (she)

Also called learning from queries or oracle learning

Zadar August 2010 5

1 Motivations

Zadar August 2010 6

Goals

define a credible learning modelmake use of additional information that

can be measured explain thus the difficulty of learning

certain classes solve real life problems

Zadar August 2010 7

Application robotics

A robot has to find a route in a maze The maze is represented as a graph The robot finds his way and can experiment The robot gets feedback

Dean T Basye K Kaelbling L Kokkevis E Maron O Angluin D Engelson S Inferring finite automata with stochastic output functions and an application to map learning In Swartout W ed Proceedings of the 10th National Conference on Artificial Intelligence San Jose CA Mit Press (1992) 208ndash214

Rivest RL Schapire RE Inference of finite automata using homing sequences Information and Computation 103 (1993) 299ndash347

Zadar August 2010 8

Application web wrapper induction

System SQUIRREL learns tree automataGoal is to learn a tree automaton which

when run on XML returns selected items

Carme J Gilleron R Lemay A Niehren J Interactive learning of node selecting tree transducer Machine Learning Journal 66(1) (2007) 33ndash67

Zadar August 2010 9

Applications under resourced languages

When a language does not have enough data for statistical methods to be of interest use of human expert for labelling

This is the case for most languages Examples Interactive predictive parsing Computer aided translation

Zadar August 2010 10

Checking models

An electronic system can be modelled by a finite graph (a DFA)

Checking if a chip meets its specification can be done by testing or by trying to learn the specification with queries

Breacuteheacutelin L Gascuel O Caraux G Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits Pattern Analysis and Machine Intelligence 23(9) (2001) 997ndash1008

Berg T Grinchtein O Jonsson B Leucker M Raffelt H Steffen B On the correspondence between conformance testing and regular inference In Proceedings of Fundamental Approaches to Software Engineering 8th International Conference FASE 2005 Volume 3442 of Lncs Springer-Verlag (2005) 175ndash189

Raffelt H Steffen B Learnlib A library for automata learning and experimentation In Proceedings of Fase 2006 Volume 3922 of Lncs Springer-Verlag (2006) 377ndash380

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 5: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 5

1 Motivations

Zadar August 2010 6

Goals

define a credible learning modelmake use of additional information that

can be measured explain thus the difficulty of learning

certain classes solve real life problems

Zadar August 2010 7

Application robotics

A robot has to find a route in a maze The maze is represented as a graph The robot finds his way and can experiment The robot gets feedback

Dean T Basye K Kaelbling L Kokkevis E Maron O Angluin D Engelson S Inferring finite automata with stochastic output functions and an application to map learning In Swartout W ed Proceedings of the 10th National Conference on Artificial Intelligence San Jose CA Mit Press (1992) 208ndash214

Rivest RL Schapire RE Inference of finite automata using homing sequences Information and Computation 103 (1993) 299ndash347

Zadar August 2010 8

Application web wrapper induction

System SQUIRREL learns tree automataGoal is to learn a tree automaton which

when run on XML returns selected items

Carme J Gilleron R Lemay A Niehren J Interactive learning of node selecting tree transducer Machine Learning Journal 66(1) (2007) 33ndash67

Zadar August 2010 9

Applications under resourced languages

When a language does not have enough data for statistical methods to be of interest use of human expert for labelling

This is the case for most languages Examples Interactive predictive parsing Computer aided translation

Zadar August 2010 10

Checking models

An electronic system can be modelled by a finite graph (a DFA)

Checking if a chip meets its specification can be done by testing or by trying to learn the specification with queries

Breacuteheacutelin L Gascuel O Caraux G Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits Pattern Analysis and Machine Intelligence 23(9) (2001) 997ndash1008

Berg T Grinchtein O Jonsson B Leucker M Raffelt H Steffen B On the correspondence between conformance testing and regular inference In Proceedings of Fundamental Approaches to Software Engineering 8th International Conference FASE 2005 Volume 3442 of Lncs Springer-Verlag (2005) 175ndash189

Raffelt H Steffen B Learnlib A library for automata learning and experimentation In Proceedings of Fase 2006 Volume 3922 of Lncs Springer-Verlag (2006) 377ndash380

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 6: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 6

Goals

define a credible learning modelmake use of additional information that

can be measured explain thus the difficulty of learning

certain classes solve real life problems

Zadar August 2010 7

Application robotics

A robot has to find a route in a maze The maze is represented as a graph The robot finds his way and can experiment The robot gets feedback

Dean T Basye K Kaelbling L Kokkevis E Maron O Angluin D Engelson S Inferring finite automata with stochastic output functions and an application to map learning In Swartout W ed Proceedings of the 10th National Conference on Artificial Intelligence San Jose CA Mit Press (1992) 208ndash214

Rivest RL Schapire RE Inference of finite automata using homing sequences Information and Computation 103 (1993) 299ndash347

Zadar August 2010 8

Application web wrapper induction

System SQUIRREL learns tree automataGoal is to learn a tree automaton which

when run on XML returns selected items

Carme J Gilleron R Lemay A Niehren J Interactive learning of node selecting tree transducer Machine Learning Journal 66(1) (2007) 33ndash67

Zadar August 2010 9

Applications under resourced languages

When a language does not have enough data for statistical methods to be of interest use of human expert for labelling

This is the case for most languages Examples Interactive predictive parsing Computer aided translation

Zadar August 2010 10

Checking models

An electronic system can be modelled by a finite graph (a DFA)

Checking if a chip meets its specification can be done by testing or by trying to learn the specification with queries

Breacuteheacutelin L Gascuel O Caraux G Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits Pattern Analysis and Machine Intelligence 23(9) (2001) 997ndash1008

Berg T Grinchtein O Jonsson B Leucker M Raffelt H Steffen B On the correspondence between conformance testing and regular inference In Proceedings of Fundamental Approaches to Software Engineering 8th International Conference FASE 2005 Volume 3442 of Lncs Springer-Verlag (2005) 175ndash189

Raffelt H Steffen B Learnlib A library for automata learning and experimentation In Proceedings of Fase 2006 Volume 3922 of Lncs Springer-Verlag (2006) 377ndash380

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 7: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 7

Application robotics

A robot has to find a route in a maze The maze is represented as a graph The robot finds his way and can experiment The robot gets feedback

Dean T Basye K Kaelbling L Kokkevis E Maron O Angluin D Engelson S Inferring finite automata with stochastic output functions and an application to map learning In Swartout W ed Proceedings of the 10th National Conference on Artificial Intelligence San Jose CA Mit Press (1992) 208ndash214

Rivest RL Schapire RE Inference of finite automata using homing sequences Information and Computation 103 (1993) 299ndash347

Zadar August 2010 8

Application web wrapper induction

System SQUIRREL learns tree automataGoal is to learn a tree automaton which

when run on XML returns selected items

Carme J Gilleron R Lemay A Niehren J Interactive learning of node selecting tree transducer Machine Learning Journal 66(1) (2007) 33ndash67

Zadar August 2010 9

Applications under resourced languages

When a language does not have enough data for statistical methods to be of interest use of human expert for labelling

This is the case for most languages Examples Interactive predictive parsing Computer aided translation

Zadar August 2010 10

Checking models

An electronic system can be modelled by a finite graph (a DFA)

Checking if a chip meets its specification can be done by testing or by trying to learn the specification with queries

Breacuteheacutelin L Gascuel O Caraux G Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits Pattern Analysis and Machine Intelligence 23(9) (2001) 997ndash1008

Berg T Grinchtein O Jonsson B Leucker M Raffelt H Steffen B On the correspondence between conformance testing and regular inference In Proceedings of Fundamental Approaches to Software Engineering 8th International Conference FASE 2005 Volume 3442 of Lncs Springer-Verlag (2005) 175ndash189

Raffelt H Steffen B Learnlib A library for automata learning and experimentation In Proceedings of Fase 2006 Volume 3922 of Lncs Springer-Verlag (2006) 377ndash380

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 8: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 8

Application web wrapper induction

System SQUIRREL learns tree automataGoal is to learn a tree automaton which

when run on XML returns selected items

Carme J Gilleron R Lemay A Niehren J Interactive learning of node selecting tree transducer Machine Learning Journal 66(1) (2007) 33ndash67

Zadar August 2010 9

Applications under resourced languages

When a language does not have enough data for statistical methods to be of interest use of human expert for labelling

This is the case for most languages Examples Interactive predictive parsing Computer aided translation

Zadar August 2010 10

Checking models

An electronic system can be modelled by a finite graph (a DFA)

Checking if a chip meets its specification can be done by testing or by trying to learn the specification with queries

Breacuteheacutelin L Gascuel O Caraux G Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits Pattern Analysis and Machine Intelligence 23(9) (2001) 997ndash1008

Berg T Grinchtein O Jonsson B Leucker M Raffelt H Steffen B On the correspondence between conformance testing and regular inference In Proceedings of Fundamental Approaches to Software Engineering 8th International Conference FASE 2005 Volume 3442 of Lncs Springer-Verlag (2005) 175ndash189

Raffelt H Steffen B Learnlib A library for automata learning and experimentation In Proceedings of Fase 2006 Volume 3922 of Lncs Springer-Verlag (2006) 377ndash380

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 9: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 9

Applications under resourced languages

When a language does not have enough data for statistical methods to be of interest use of human expert for labelling

This is the case for most languages Examples Interactive predictive parsing Computer aided translation

Zadar August 2010 10

Checking models

An electronic system can be modelled by a finite graph (a DFA)

Checking if a chip meets its specification can be done by testing or by trying to learn the specification with queries

Breacuteheacutelin L Gascuel O Caraux G Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits Pattern Analysis and Machine Intelligence 23(9) (2001) 997ndash1008

Berg T Grinchtein O Jonsson B Leucker M Raffelt H Steffen B On the correspondence between conformance testing and regular inference In Proceedings of Fundamental Approaches to Software Engineering 8th International Conference FASE 2005 Volume 3442 of Lncs Springer-Verlag (2005) 175ndash189

Raffelt H Steffen B Learnlib A library for automata learning and experimentation In Proceedings of Fase 2006 Volume 3922 of Lncs Springer-Verlag (2006) 377ndash380

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 10: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 10

Checking models

An electronic system can be modelled by a finite graph (a DFA)

Checking if a chip meets its specification can be done by testing or by trying to learn the specification with queries

Breacuteheacutelin L Gascuel O Caraux G Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits Pattern Analysis and Machine Intelligence 23(9) (2001) 997ndash1008

Berg T Grinchtein O Jonsson B Leucker M Raffelt H Steffen B On the correspondence between conformance testing and regular inference In Proceedings of Fundamental Approaches to Software Engineering 8th International Conference FASE 2005 Volume 3442 of Lncs Springer-Verlag (2005) 175ndash189

Raffelt H Steffen B Learnlib A library for automata learning and experimentation In Proceedings of Fase 2006 Volume 3922 of Lncs Springer-Verlag (2006) 377ndash380

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 11: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 11

Playing games

D Carmel and S Markovitch Model-based learning of interaction strategies in multi-agent systems Journal of Experimental and Theoretical Artificial Intelligence 10(3)309ndash332 1998

D Carmel and S Markovitch Exploration strategies for model-based learning in multiagent systems Autonomous Agents and Multi-agent Systems 2(2)141ndash172 1999

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 12: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 12

2 The model

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 13: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 13

Notations

We denote by T the target grammar or automaton

We denote by H the current hypothesis We denote by L(H) and L(T) the

corresponding languages Examples are x y z with labels lT(x) lT(y)

lT(z) for the real labels and lH(x) lH(y) lH(z) for the hypothesized ones

The computation of lI(x) must take place in time polynomial in x

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 14: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 14

Running example

Suppose we are learning DFA The running example target is

b b

aa

a b

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 15: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 15

The Oracle

knows the language and has to answer correctly

no probabilities unless statedworse case policy the Oracle does not

want to help

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 16: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 16

Some queries

1 sampling queries2 presentation queries3 membership queries4 equivalence queries (weak or strong)5 inclusion queries6 correction queries7 specific sampling queries8 translation queries9 probability queries

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 17: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 17

21 Sampling queries (Ex)

(w lT(w))

w is drawn following some unknown distribution

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 18: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 18

Sampling queries (Pos)

x

x is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 19: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 19

Sampling queries (Neg)

X

X is drawn following some unknown distribution restricted to L(T)

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 20: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 20

Example

Ex() might return (aabab1) or (0) Pos() might return ababNeg() might return aa

b b

aa

a b

Needs a distribution over L(T) or L(T)

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 21: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 21

22 Presentation queries

A presentation of a language is an enumeration of all the strings in with a label indicating if a string belongs or not to L(T) (informed presentation)

or an enumeration of all the strings in L(T) (text presentation)

There can be repetitions

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 22: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 22

Presentation queries

w=f(i)

f is a valid (unknown) presentationSub-cases can be text or informed presentations

iℕ

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 23: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 23

Example

Prestext(3) could be bba

Prestext(17) could be abbaba (the laquo selected raquo presentation being b

ab aab bba aaab abab abba bbabhellip)

b b

aa

a b

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 24: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 24

Example

Presinformed(3) could be (aab1)

Presinformed(1) could be (a0) (the laquo selected raquo presentation being

(b1)(a0)(aaa0)(aab1)(bba1)(a0)hellip)

b b

aa

a b

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 25: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 25

23 Membership queries

x x L(T)

L(T) is the target language

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 26: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 26

Example

MQ(aab) returns 1 (or true)MQ(bbb) returns 0 (or false)

b b

aa

a b

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 27: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 27

24 Equivalence (weak) queries

H Yes if L(T) = L(H)No if xxL(H)L(T)

AB is the symmetric difference

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 28: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 28

Equivalence (strong) queries

H Yes if TH x xL(H)L(T) if not

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 29: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 29

Example

EQ(H) returns abbb (or abbahellip)WEQ(H) returns false

b b

aa

a bb

a

a

H

b

T

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 30: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 30

25 Subset queries

H Yes if L(H) L(T) x xL(H) xL(T)

if not

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 31: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 31

Example

SSQ(H1) returns true

SSQ(H2) returns abbb

b b

aa

a b

b

a

a

H1

b

a

a H2

b

T

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 32: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 32

26 Correction queries

x Yes if x L(T) y L(T) y is a correction of xif not

Becerra-Bonache L de la Higuera C Janodet JC Tantini F Learning ballsof strings from edit corrections Journal of Machine Learning Research 9 (2008)1841ndash1870Kinber EB On learning regular expressions and patterns via membership andcorrection queries [33] 125ndash138

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 33: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 33

Example

CQsuff(bb) returns bba

CQedit(bb) returns any string in babbba

CQedit(bba) and CQsuff(bba) return true

b b

aa

a b

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 34: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 34

27 Specific sampling queries

Submit a grammar GOracle draws a string from L(G) and

labels it according to TRequires an unknown distribution

Allows for example to sample string starting with some specific prefix

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 35: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 35

28 Probability queries

Target is a PFA Submit w Oracle returns PrT(w)

4

13

12

1

2

1

2

13

2

a b

a

b

a

b

4

3

2

1

String ba should have a relative frequency of 116

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 36: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 36

29 Translation queries

Target is a transducer Submit a string Oracle returns its

translation

0 1b 00 b 0

a 1a 1

a 1 b

Tr(ab) returns 100Tr(bb) returns 0001

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 37: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 37

Learning setting

Two things have to be decided

The exact queries the learner is allowed to use

The conditions to be met to say that learning has been achieved

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 38: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 38

What queries are we allowed

The combination of queries is declared Examples Q=MQ Q=MQEQ (this is an MAT)

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 39: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 39

Defining learnablility

Can be in terms of classes of languages or in terms of classes of grammars

The size of a language is the size of the smallest grammar for that language

Important issue when does the learner stop asking questions

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 40: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 40

You canrsquot learn DFA with membership queries

Indeed suppose the target is a finite language

Membership queries just add strings to the language

But you canrsquot stop and be sure of success

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 41: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 41

Correct learning

A class C is learnable with queries from Q if there exists an algorithm a such thatLC a makes a finite number of queries from Q halts and returns a grammar G such that L(G)=L

We say that a learns C with queries from Q

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 42: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 42

Received information

Suppose that during a run the information received from the Oracle is stocked in a table Info()

Infon() is the information received from the first n queries

We denote by mInfon() (resp mInfo()) the size of the longest information received from the first n queries (resp during the entire run )

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 43: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 43

Polynomial update

A polynomial p() is givenAfter the nth query in any run the

runtime before the next query is in O(p(m)) where m = mInfon()

We say that a makes polynomial updates

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 44: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 44

Polynomial update

q q q q q

x1 x4x2 x3 x5

ltp(max(|x1| |x2| |x3| |x4|))

ltp(|T|)

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 45: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 45

Correct learning

A class C is learnable with a polynomial number of queries from Q if there exists an algorithm a and a polynomial q( ) such that

1) LC a learns L with queries from Q

2) a makes polynomial updates

3) a uses during a run at most q(m|L|) queries where m = mInfo()

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 46: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 46

Comment

In the previous definitions the queries receive deterministic answers

When the queries have a probabilistic nature things still have to be discussed as identification cannot be sure

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 47: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 47

PAC learning(Valiant 84 Pitt 89)

C a set of languagesG a set of grammars and m a maximal length over the strings n a maximal size of grammars

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 48: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 48

H is -AC (approximately correct)

ifPrD[lH(x)lT(x)]lt

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 49: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 49

L(T) L(H)

Errors we want PrD[lH(x)lT(x)]lt

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 50: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 50

3 Negative results

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 51: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 51

31 Learning from membership queries alone

Actually we can use subset queries and weak equivalence queries also without doing much better

Intuition keep in mind lock automata

0 1 1 0 1

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 52: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 52

Lemma (Angluin 88)

If a class C contains a set C and n

sets C1Cn such that i j[n] Ci Cj

= C any algorithm using

membership weak equivalence and

subset queries needs in the worse

case to make n-1 queries

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 53: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 53

C

WEQ(C)

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 54: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 54

Equivalence query on this Ci

WEQ(Cj)

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 55: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 55

Is C includedYES

(so what)

Sub(C)

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 56: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 56

Subset query on this Ci

Sub(Cj)

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 57: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 57

Does x belongYES(so what)

x

MQ(x)

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 58: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 58

Does x belongxNohellip

of course

MQ(x)

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 59: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 59

Proof (summarised)

Query Answer Action

WEQ(Ci) No eliminates Ci

SSQ(C) Yes eliminates nothing

SSQ(Ci) No eliminates nothing

MQ(x) ( C) Yes eliminates nothing

MQ(x) ( C ) Noeliminates Ci such that x Ci

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 60: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 60

Corollary

Let DFAn be the class of DFA with at most n statesDFAn cannot be identified by a polynomial number of membership weak equivalence and inclusion queries

L= Li=wi where wi is i written in base 2

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 61: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 61

32 What about equivalence queries

Negative results for Equivalence Queries D Angluin Machine Learning 5 121-150 1990

Equivalence queries measure also the number of implicit prediction errors a learning algorithm might make

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 62: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 62

The halving algorithm (1)

Suppose language consists of any subset of set of n strings w1hellipwn

There are 2n possible languages After k queries candidate languages

belong to set Ck

wi is a consensus string if it belongs to a majority of languages in Ck

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 63: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 63

The halving algorithm (2)

Then consider consensus language Hk=wi wi is a consensus string for Ck

Submit EQ(Hk)A counterexample is therefore not a

consensus string so less than half the languages in C will agree with the counterexample Therefore |Ck+1| |Ck|

Hence in n steps convergence is ensured

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 64: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 64

Why is this relevant

The halving algorithm can work when |C0|=2p(n)

This is always the case (because grammars are written with p(n) characters

That is why it is important that the equivalence queries are proper

Proper EQ(L) with L in C0

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 65: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 65

33 Learning from equivalence queries alone

Theorem (Angluin 88)DFA cannot be identified by a polynomial number of strong equivalence queries

(Polynomial in the size of the target)

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 66: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 66

Proof (approximate fingerprints)

n let Hn be a class of [size n] DFAA (perhaps not in Hn) and q() and n

sufficiently large wA wA ltq(n) such thatFHn wALA wALF lt Hn q(n)

Answering no to A and returning wA as a counter example only eliminates a sub-polynomial fraction of DFA

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 67: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 67

4 Algorithm L

Learning regular sets from queries and counter-examples D Angluin

Information and computation 75 87-106 1987

Queries and Concept learning D Angluin Machine Learning 2 319-342

1988

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 68: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 68

41 The Minimal Adequate Teacher

Learner is allowed strong equivalence queries membership queries

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 69: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 69

General idea of L

find a good table (representing a DFA) submit it as an equivalence query use counterexample to update the table submit membership queries to make the

table good iterate

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 70: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 70

42 An observation table

a

a

abaab

1 000

010001

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 71: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 71

The states (RED)

The transitions (BLUE)

The experiments (EXP)

a

a

abaab

1 000

010001

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 72: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 72

Meaning

(q0 )FL

a

a

abaab

1 000

010001

Must have identical label (redundancy)

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 73: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 73

(q0 aba) F

aba L

a

a

abaab

1 000

010001

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 74: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 74

Equivalent prefixes

These two rows are equal

hence

(q0)= (q0ab)

a

a

abaab

1 000

010001

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 75: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 75

Equivalent prefixes are states

a

a

abaab

1 000

010001

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 76: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 76

Building a DFA from a table

a

a

abaab

aa

1 000

010001

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 77: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 77

a

a

abaab

aa

b

a

b1 0

00

010001

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 78: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 78

a

a

abaab

1 000

010001

aa

b

a

b

Some rules

This set is prefix-closed

This set is suffix-closed

REDRED=BLUE

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 79: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 79

An incomplete table

a

a

abaab

1 00

01001

aa

b

a

b

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 80: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 80

Good idea

We can complete the table by submitting membership queries

u

v

uvL

Membership query

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 81: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 81

A table is

closed if any row of BLUE corresponds to some row in RED

a

a

abaab

1 000

0110

01Not closed

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 82: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 82

And a table that is not closed

a

a

abaab

1 000

011001

aa

b

a

b

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 83: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 83

What do we do when we have a table that is not closed

Let s be the row (of BLUE) that does not appear in RED

Add s to RED and a add sa to BLUE

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 84: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 84

An inconsistent table

a

abaa

1 0ab

00000101

bbba 01

00

Are a and b equivalent

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 85: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 85

A table is consistent if

Every equivalent pair of rows in RED remains equivalent in RED BLUE after appending any symbol

OT[s1] =OT[s2]

a OT[s1a] =OT[s2a]

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 86: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 86

What do we do when we have an inconsistent table

Let a be such that OT[s1]=row(s2] but OT[s1a] OT[s2a]

If OT[s1a] OT[s2a] it is so for experiment e

Then add experiment ae to the table

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 87: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 87

What do we do when we have a closed and consistent table

We build the corresponding DFA

We make an equivalence query

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 88: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 88

What do we do if we get a counter-example

Let u be this counter-example

wPref(u) doadd w to REDa such that waRED add wa

to BLUE

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 89: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 89

43 Run of the algorithm

a

b

1

1

1 Table is now closed

and consistent

b

a

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 90: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 90

An equivalence query is made

b

a

Counter example baa is returned

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 91: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 91

a

b1

1

0baaba

baaa

bbbab

baab

1

01

1

11

Not consistent

Because of

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 92: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 92

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

Table is now closed and consistent

ba

baa

a

b

a

b b

a

0

0 0

1 1

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 93: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 93

The algorithm

while not A dowhile OT is not complete consistent or

closed doif OT is not complete then make MQif OT is not consistent then add

experimentif OT is not closed then promote

AEQ(OT)

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 94: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 94

44 Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 95: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 95

Termination Correctness

For every regular language there is a unique minimal DFA that recognizes it

Given a closed and consistent table one can generate a consistent DFA

A DFA consistent with a table has at least as many states as different rows in H

If the algorithm has built a table with n different rows in H then it is the target

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 96: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 96

Finiteness

Each closure failure adds one different row to RED

Each inconsistency failure adds one experiment which also creates a new row in RED

Each counterexample adds one different row to RED

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 97: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 97

Polynomial

|EXP| n at most n-1 equivalence queries |membership queries| n(n-1)m where

m is the length of the longest counter-example returned by the oracle

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 98: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 98

Conclusion

With an MAT you can learn DFA but also a variety of other classes of

grammars it is difficult to see how powerful is really an

MAT probably as much as PAC learning Easy to find a class a set of queries and

provide and algorithm that learns with them more difficult for it to be meaningful

Discussion why are these queries meaningful

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 99: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 99

Discussion

Are membership and equivalence queries realistic

Membership queries are plausible in a number of applications

Equivalence queries are notA way around this it to do sampling

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 100: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 100

Good idea

If we sample following D 100 strings and we coincide in labelling with the Oracle then how bad are we

Formula suppose the error is more than e then coinciding 100 times has probability at least (1- )100 The chance this happens is less than 06 for =5

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 101: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 101

About PAC learning and

equivalence queries

To be convinced that equivalence queries

can exist replace them by the following

test

draw m random examples x1hellipxm

if i lT(xi)=lH(xi) then the error is most likely

small

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 102: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 102

How small

Let us suppose that the true error is more than

Then the probability of selecting randomly one

example where T and H coincide is at most 1-

the probability of selecting randomly m examples where T and H coincide (all the time) is at most (1-)m

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 103: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 103

And

(1-)m e-m

So by making this we have e-m

log -m log(1) m m 1 log(1)

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 104: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 104

Conclusion

If we can draw according to the true distribution one can learn a approximately correct DFA from membership queries only

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 105: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 105

5 Implementation issues

How to implement the tableAbout Zulu

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 106: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 106

Instead of memorising a table of 0s and 1s data structure should be a table of laquoknowledgeraquo

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 107: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 107

Use pointers

a

a

b1 1

1

0 0

baaba

baaa

bbbab

baab

1 1

0 1

1 0

1 1

0

0 0

1 1

a

a

b

baaba

baaa

bbbab

baab

1a 1b 1aa 0ba 1bb 1baa 0bab 1baaa 0baab 0baaaa 0baaba 0

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 108: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 108

Zulu competition

httplabh-curienuniv-st-etiennefrzulu 23 competing algorithms 11 players End of the competition a week ago Tasks Learn a DFA be as precise as possible

with n queries

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 109: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 109

Results

Task

queries

alphabet

Best

states

Task

queries

alphabet

states Best

1 304 3 10000

8 13 725 15 10 10000

2 199 3 10000

16 14 1365 15 17 10000

3 1197 3 9650 81 15 5266 15 60 10000

4 1384 3 9322 100 16 7570 15 71 10000

5 1971 3 8589 151 17 17034 15 147 10000

6 3625 3 10000

176 18 16914 15 143 8794

7 429 5 10000

15 19 1970 5 93 8167

8 375 5 10000

18 20 1329 5 61 7000

9 2524 5 9644 84 21 571 5 40 6922

10 3021 5 10000

90 22 735 5 57 6511

11 5428 5 9994 153 23 483 5 73 8661

12 4616 5 10000

123 24 632 5 78 10000

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 110: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 110

6 Further challenges

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 111: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 111

Transducer learning

bbb

c

a bb

b b

a a

a a

a ca

b c

abaabacbbbbb

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 112: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 112

Transducer learning

101

1 10

00

1 1

0 1

1 0 00

Multiplication by 3

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 113: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 113

Typical queries

Translation queries Tr(w) Oracle answers with the

translation of w

Vilar JM Query learning of subsequential transducers In Miclet L de la Higuera C eds Proceedings of ICGI rsquo96 Number 1147 in LNAI Springer-Verlag (1996) 72ndash83

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 114: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 114

PFA learning

Probabilistic finite automata can be Deterministic Non deterministic

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 115: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 115

4

1

3

1

2

1

2

1

2

13

2

a

b

a

b

a

b

4

3

2

1

PrA(abab)=24

1

4

3

3

2

3

1

2

1

2

1

A deterministic PFA

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 116: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 116

Pr(aba) = 0704011 +070404502 = 0028+00252=00532

02

01

1

a

b

a

a a

b

045

03504

07

03

01

b 04

A nondeterministic PFA

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 117: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 117

What queries should we consider

Probability queries PQ(w) Oracle returns PrD(w)

EX() Oracle returns a string w randomly drawn according to D

Specific sampling query SSQ(L) Oracle returns a string belonging to L

sampled according to D Distribution is PrD L(w)=PrD(w)PrD(L)

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 118: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 118

Context-free grammar learning

Typical query corresponds to using the grammar (structural query)

In which case the goal is to identify the grammar not the language

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 119: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 119

7 General conclusion

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120
Page 120: Zadar, August 2010 1 Active Learning 2010 Colin de la Higuera.

Zadar August 2010 120

Some open problems

Find better definitionsDo these definitions give a general

framework for grammatical inferenceHow can we use resource bounded

queriesUse Zulu or develop new tools for Zulu

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Slide 73
  • Slide 74
  • Slide 75
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Slide 80
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Slide 85
  • Slide 86
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Slide 94
  • Slide 95
  • Slide 96
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Slide 105
  • Slide 106
  • Slide 107
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Slide 112
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Slide 117
  • Slide 118
  • Slide 119
  • Slide 120