Attempts to extend correction queries


Cristina Bibire

Research Group on Mathematical Linguistics, Rovira i Virgili University

Pl. Imperial Tarraco 1, 43005, Tarragona, Spain

E-mail: [email protected]

31st of October 2005

Seminar IV

Correction queries

PAC learning of DFA

Learning CFL

Learning WFA

Redefining the correcting string

References

Learning from corrections

The correcting string of s in the language L is the smallest string s' (in lex-length order) such that s.s' belongs to L.

The answer to a correction query for a string consists of its correcting string.
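A minimal sketch of how a Teacher could answer a correction query when the target language is given as a complete DFA. The dictionary encoding of the transition function and the names `correcting_string` and `PARITY_DELTA` are assumptions made for illustration; the example automaton (strings with an even number of 1s) is hypothetical and does not come from the slides. The function returns `None` when no continuation of s is accepted.

```python
from collections import deque

def correcting_string(delta, start, accepting, alphabet, s):
    """Smallest s' in lex-length order with s + s' accepted, or None."""
    # Run the DFA on s to find the state it reaches.
    state = start
    for ch in s:
        state = delta[(state, ch)]
    # Breadth-first search from that state. Expanding symbols in sorted
    # order makes the first accepting state found carry the shortest,
    # then lexicographically least, suffix.
    queue = deque([(state, "")])
    seen = {state}
    while queue:
        q, suffix = queue.popleft()
        if q in accepting:
            return suffix
        for a in sorted(alphabet):
            nxt = delta[(q, a)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, suffix + a))
    return None

# Hypothetical target: strings over {0, 1} with an even number of 1s.
PARITY_DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}
```

With this encoding, a string already in the language gets the empty correction, which is how the slides distinguish members (C(s) = λ) from non-members.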

Myhill-Nerode theorem:

The number of states in the smallest DFA accepting L is equal to the number of equivalence classes of $\equiv_L$, where

$x \equiv_L y$ iff $\forall z \in \Sigma^*:\ xz \in L \Leftrightarrow yz \in L$

Learning from corrections

PAC learning of DFA with CQ

Learning CFL with CQ

Learning WFA with CQ


How can we extend CQ?

PAC learning of DFA with CQ

We assume that there is some probability distribution Pr on the set of all strings over the alphabet Σ, and let L be an unknown regular set.

The Learner has access to information about L by means of two oracles:

• C(x) returns the correcting string for x

• Ex( ) is a random sampling oracle that selects a string x from Σ* according to the distribution Pr and returns the pair (x, C(x)).

In addition, the Learner is given the accuracy ε and the confidence δ.

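Definition: We say that the language L1 is an ε-approximation of the language L2 provided that:

$\sum_{x \in L_1 \triangle L_2} \Pr(x) \le \varepsilon$

where $L_1 \triangle L_2$ is the symmetric difference of L1 and L2.

If A is a DFA, it is said to be an ε-approximation of the set L if L(A) is an ε-approximation of L.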

PAC learning of DFA with CQ

If A is an ε-approximation of L, then the probability of finding a discrepancy between L(A) and L with one call of the random sampling oracle Ex( ) is at most ε.

The approximate learner LCAapprox is obtained by modifying LCA. A correction query of the string x is satisfied by a call to C(x). Each conjecture is tested by a number of calls to Ex( ).

• If any of the calls to Ex( ) returns a pair (t, C(t)) such that:

- C(t)=λ but A(S,E,C) rejects it or

- C(t)≠λ but A(S,E,C) accepts it

then t is said to be a counterexample and LCAapprox proceeds as LCA

• If none of the calls to Ex( ) returns a counterexample, then LCAapprox halts and outputs A(S,E,C)

PAC learning of DFA with CQ

How many calls to Ex( ) does LCAapprox make to test a given conjecture? The number depends on:

• the accuracy and confidence parameters, ε and δ

• how many previous conjectures have been tested

Let

$r_i = \frac{1}{\varepsilon}\left(\ln\frac{1}{\delta} + (i+1)\ln 2\right)$

If i previous conjectures have been tested, then LCAapprox makes $\lceil r_i \rceil$ calls to Ex( ).

Theorem. If n is the number of states in the minimum DFA for the target language L, then LCAapprox terminates after $O\left(n + \frac{1}{\varepsilon}\left(n\ln\frac{1}{\delta} + n^2\right)\right)$ calls to the Ex( ) oracle. Moreover, the probability that the automaton output by LCAapprox is an ε-approximation of L is at least 1-δ.
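The sampling schedule can be sketched as follows, assuming the usual form $r_i = \frac{1}{\varepsilon}(\ln\frac{1}{\delta} + (i+1)\ln 2)$ for the number of samples used on the conjecture tested after i earlier ones. The function names and the callables `ex` (standing in for Ex( )) and `accepts` (membership in the conjectured automaton) are hypothetical and only illustrate the test described above.

```python
import math

def calls_for_conjecture(eps, delta, i):
    """ceil(r_i), with r_i = (1/eps) * (ln(1/delta) + (i + 1) * ln 2)."""
    r_i = (math.log(1.0 / delta) + (i + 1) * math.log(2.0)) / eps
    return math.ceil(r_i)

def check_conjecture(ex, accepts, eps, delta, i):
    """Return a counterexample t, or None if every sample agrees."""
    for _ in range(calls_for_conjecture(eps, delta, i)):
        t, correction = ex()              # one call to the sampling oracle
        in_target = (correction == "")    # C(t) = lambda means t is in L
        if in_target != accepts(t):
            return t                      # conjecture and target disagree on t
    return None

def failure_bound(eps, delta, n):
    """Sum over conjectures i = 0..n-2 of (1 - eps) ** ceil(r_i)."""
    return sum((1.0 - eps) ** calls_for_conjecture(eps, delta, i)
               for i in range(n - 1))
```

For ε = 0.1 and δ = 0.05 the first conjecture is tested with 37 samples, and the summed failure probability stays below δ for any number of conjectures.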

PAC learning of DFA with CQ

Sketch of the proof:

• the total number of counterexamples is at most n-1, so the total number of calls to Ex( ) is at most

$\sum_{i=0}^{n-2} (r_i + 1) = O\left(n + \frac{1}{\varepsilon}\left(n\ln\frac{1}{\delta} + n^2\right)\right)$

• the probability that LCAapprox will terminate with an automaton that is not an ε-approximation of L is at most

$\sum_{i=0}^{n-2} (1-\varepsilon)^{r_i} \le \sum_{i=0}^{n-2} e^{-\varepsilon r_i} \le \sum_{i=0}^{n-2} \frac{\delta}{2^{i+1}} \le \delta$
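The steps behind the probability bound can be spelled out as below, assuming $r_i = \frac{1}{\varepsilon}\left(\ln\frac{1}{\delta} + (i+1)\ln 2\right)$ as in Angluin's analysis. This is a worked restatement of the standard argument, not additional material from the slides.

```latex
\begin{align*}
(1-\varepsilon)^{r_i} &\le e^{-\varepsilon r_i}
  && \text{since } 1-\varepsilon \le e^{-\varepsilon} \\
e^{-\varepsilon r_i} &= e^{-\ln\frac{1}{\delta} - (i+1)\ln 2}
  = \frac{\delta}{2^{i+1}}
  && \text{by the definition of } r_i \\
\sum_{i=0}^{n-2} \frac{\delta}{2^{i+1}} &\le \delta \sum_{i=0}^{\infty} \frac{1}{2^{i+1}} = \delta
  && \text{geometric series}
\end{align*}
```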

Learning CFL

The setting

There is an unknown CFG G in Chomsky normal form. The Learner knows the set T of terminal symbols, the set N of nonterminal symbols and the start symbol S of G. The Teacher is assumed to answer two types of questions:

• MEMBER(x,A) – if the string x can be derived from the non-terminal A in the grammar G, the answer is yes; otherwise, it is no

• EQUIV(H) – if H is equivalent to G, the answer is yes; otherwise, it replies with a counterexample t.

Learning CFL

The Learner LCF

LCF can explicitly enumerate all the possible productions of G in polynomial time (in |T| and |N|). Initially LCF places all possible productions of G in the hypothesized set of productions P.

The main loop of LCF asks an EQUIV(H) question for the grammar H=(T,N,S,P).

• if H is equivalent to G, then LCF halts and outputs H

• otherwise, it “diagnoses” the counterexample t returned, which results in removing at least one production from P; the main loop is then repeated.
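The enumeration and the main loop can be sketched as follows. The diagnosis procedure is not described on the slide, so it appears here only as a hypothetical callable `diagnose`; `equiv` is likewise a stand-in for the EQUIV oracle that returns `None` for "yes" and a counterexample otherwise. Productions are encoded as `(head, body)` tuples, which is an assumption of this sketch, and only the Chomsky normal form shapes A → B C and A → a are enumerated.

```python
from itertools import product

def all_cnf_productions(terminals, nonterminals):
    """All candidate productions A -> B C and A -> a."""
    binary = [(a, (b, c)) for a, b, c in product(nonterminals, repeat=3)]
    unary = [(a, (t,)) for a in nonterminals for t in terminals]
    return binary + unary          # |N|^3 + |N|*|T| candidates

def lcf_main_loop(terminals, nonterminals, equiv, diagnose):
    """Shrink the hypothesis P until the EQUIV oracle accepts it."""
    P = set(all_cnf_productions(terminals, nonterminals))
    while True:
        t = equiv(P)               # None stands for the answer "yes"
        if t is None:
            return P
        bad = diagnose(t, P)       # productions blamed for counterexample t
        if not bad:
            raise ValueError("diagnosis must remove at least one production")
        P -= bad
```

Because every pass removes at least one of the finitely many candidates, the loop runs at most |N|^3 + |N||T| times.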

Learning WFA

Let K be a field and $f: \Sigma^* \to K$ be a function. Associate with f an infinite matrix F with rows indexed by strings $x \in \Sigma^*$ and columns indexed by strings $y \in \Sigma^*$. The (x, y) entry of F contains the value f(x.y). The function f is called a power series and F its Hankel matrix.

If we have a WFA A, we can associate with it a function $f_A$ and, vice versa, for every function f there exists a smallest WFA A such that $f_A \equiv f$.

Theorem [Carlyle, Paz 1971]. Let $f: \Sigma^* \to K$ be such that $f \not\equiv 0$ and let F be the corresponding Hankel matrix. Then the size r of the smallest WFA A such that $f_A \equiv f$ satisfies r = rank(F).
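To make the Hankel matrix concrete, the sketch below builds a finite block of it (rows and columns indexed by all strings up to a given length) and computes its rank over the rationals. The function names and the example function `count_ones` are assumptions for illustration. A finite block only gives a lower bound on rank(F), although for this example two rows already span everything.

```python
from fractions import Fraction
from itertools import product

def strings_up_to(alphabet, max_len):
    """All strings over the alphabet of length 0..max_len."""
    return ["".join(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)]

def hankel_block(f, alphabet, max_len):
    """Finite block of the Hankel matrix: entry (x, y) is f(x + y)."""
    words = strings_up_to(alphabet, max_len)
    return [[Fraction(f(x + y)) for y in words] for x in words]

def rank(matrix):
    """Rank by Gauss-Jordan elimination with exact rational arithmetic."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def count_ones(w):
    """Hypothetical target function: the number of 1s in w."""
    return w.count("1")
```

Since count_ones(x.y) = count_ones(x) + count_ones(y), every row is a combination of the all-ones vector and one fixed vector, so the block has rank 2.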

Learning WFA

Let f be a target function. The learning algorithm may ask the oracle two types of query:

• EQ(h): if h is equivalent to f on all input assignments, then the answer to the query is yes; otherwise, the answer is no and the algorithm receives a counterexample z ($f(z) \neq h(z)$).

• MQ(z): the oracle returns f(z).

The algorithm learns a function f using its Hankel matrix F. By the theorem of Carlyle and Paz, it is enough to keep an $r \times r$ sub-matrix of F of full rank. The learning algorithm can therefore be viewed as a search for appropriate r rows and r columns.

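Learning WFA

The algorithm

(1) Initialize: $x_1 \leftarrow \lambda$, $y_1 \leftarrow \lambda$, $X \leftarrow \{x_1\}$, $Y \leftarrow \{y_1\}$, and $l \leftarrow 1$

(2) Define a hypothesis h

Let $\hat\gamma = (f(x_1), \ldots, f(x_l))$

For every $\sigma \in \Sigma$, define a matrix $\hat\mu_\sigma$ such that $\hat F_{x_i.\sigma} = \sum_{j=1}^{l} [\hat\mu_\sigma]_{i,j}\, \hat F_{x_j}$ for $i = 1, \ldots, l$, where $F_x(y) = f(x.y)$ and $\hat F_x$ denotes the row $F_x$ restricted to the columns in Y

For every $w = \sigma_1 \sigma_2 \cdots \sigma_k \in \Sigma^*$, define $\hat\mu_w = \hat\mu_{\sigma_1} \hat\mu_{\sigma_2} \cdots \hat\mu_{\sigma_k}$ and $h(w) = [\hat\mu_w]_1 \cdot \hat\gamma$

(3) Ask an equivalence query EQ(h)

• If the answer is yes, halt and output h

• Otherwise, the answer is no and we receive a counterexample z

Using MQ, find a string w.σ, prefix of z, such that

(a) $\hat F_w = \sum_{i=1}^{l} [\hat\mu_w]_{1,i}\, \hat F_{x_i}$

(b) $\exists y \in Y$ s.t. $F_{w.\sigma}(y) \neq \sum_{i=1}^{l} [\hat\mu_w]_{1,i}\, F_{x_i.\sigma}(y)$

Set $x_{l+1} \leftarrow w$, $y_{l+1} \leftarrow \sigma.y$, $X \leftarrow X \cup \{x_{l+1}\}$, $Y \leftarrow Y \cup \{y_{l+1}\}$, and $l \leftarrow l+1$

Go to (2)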
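Step (2) can be illustrated by evaluating a hypothesis once its matrices are known. The sketch assumes the form used by Beimel et al., where h(w) is the first row of $\hat\mu_{\sigma_1} \cdots \hat\mu_{\sigma_k}$ multiplied by the vector $\hat\gamma$. The function name, the dictionary encoding, and the example values are all assumptions: they were worked out by hand for the hypothetical target "number of 1s in w" with X = Y = {λ, 1}, and are not taken from the slides.

```python
def evaluate_hypothesis(mu, gamma, w):
    """h(w) = (first row of mu[s1] * mu[s2] * ... * mu[sk]) . gamma"""
    l = len(gamma)
    # First row of the identity matrix; it becomes the first row of the product.
    row = [1 if j == 0 else 0 for j in range(l)]
    for sigma in w:
        m = mu[sigma]
        row = [sum(row[i] * m[i][j] for i in range(l)) for j in range(l)]
    return sum(row[j] * gamma[j] for j in range(l))

# Hypothetical hypothesis for f(w) = number of 1s, with x1 = "", x2 = "1":
# reading 0 leaves both rows unchanged, reading 1 maps row(x1) to row(x2)
# and row(x2) to -row(x1) + 2*row(x2).
MU = {
    "0": [[1, 0], [0, 1]],
    "1": [[0, 1], [-1, 2]],
}
GAMMA = [0, 1]   # (f(x1), f(x2))
```

Keeping only a row vector instead of multiplying full matrices makes each evaluation cost O(|w| l^2).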




Redefining the CQ


Redefining the correcting string

• Hamming distance (only for strings of the same length). For two strings s and t, H(s, t) is the number of places in which the two strings differ, i.e., have different characters.

• $C(s) = s'$, s.t. $s' \in L$ and $H(s, s')$ is minimum
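As a small illustration of the definition, a direct implementation of H might look like this. The function name and the choice to raise an error for strings of different lengths are assumptions of the sketch.

```python
def hamming(s, t):
    """Number of positions at which s and t carry different characters."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s, t))
```

For example, `hamming("0101", "0011")` counts the two middle positions.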




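[Figure: the target DFA, with states q0, q1, q2, q3 and transitions labelled 0 and 1]

Observation table (rows above the rule: S; rows below it: SΣ − S; column: E = {λ}):

        λ     States
λ       λ     q0
0       φ     q1
00      00    q2
--------------------
1       φ     q1
01      00    q2
000     φ     q1
001     φ     q1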





[Figure: the conjectured automaton built from the table, with states q0, q1, q2 and transitions labelled 0, 1]



• $C(s) = \min\{H(s, s') \mid s' \in L\}$
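When L is given by a complete DFA, this numeric correction can be computed without enumerating L, by tracking for each state the fewest substitutions needed to reach it on a string of the same length. The sketch below is one possible way to do that, not the method of the slides; the function names and the example automaton (strings with an even number of 0s and an even number of 1s) are assumptions. The result is infinity when L has no string of the same length as s.

```python
import math

def min_hamming_to_language(delta, start, accepting, alphabet, s):
    """min H(s, s') over accepted s' with len(s') == len(s), else inf."""
    # cost[q]: fewest substitutions so that some string of the current
    # length, read from the start state, ends in q.
    cost = {start: 0}
    for ch in s:
        nxt = {}
        for q, c in cost.items():
            for a in alphabet:
                p = delta[(q, a)]
                cand = c + (a != ch)     # reading a instead of ch costs 1
                if cand < nxt.get(p, math.inf):
                    nxt[p] = cand
        cost = nxt
    return min((c for q, c in cost.items() if q in accepting), default=math.inf)

def even_even_dfa():
    """Hypothetical DFA: even number of 0s and even number of 1s."""
    delta = {}
    for zeros in (0, 1):
        for ones in (0, 1):
            delta[((zeros, ones), "0")] = (1 - zeros, ones)
            delta[((zeros, ones), "1")] = (zeros, 1 - ones)
    return delta, (0, 0), {(0, 0)}
```

With this automaton, every string of odd length gets the answer infinity, because the language contains only strings of even length.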




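[Figure: the target DFA, with states q0, q1, q2, q3 and transitions labelled 0 and 1]

Observation table (rows above the rule: S; rows below it: SΣ − S; column: E = {λ}):

        λ     States
λ       0     q0
0       ∞     q1
01      1     q2
--------------------
1       ∞     q1
00      0     q0
010     ∞     q1
011     ∞     q1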





[Figure: the conjectured automaton built from the table, with states q0, q1, q2 and transitions labelled 0 and 1]

Redefining the correcting string

• Hamming distance

$C(s) = \min\{H(s, s') \mid s' \in L\}$

[Figure: the target DFA, with states q0, q1, q2, q3 and transitions labelled 0 and 1]

Observation table (rows above the rule: S; rows below it: SΣ − S; columns: E = {λ, 0}):

        λ     0     States
λ       0     ∞     q0
0       ∞     0     q1
1       ∞     1     q2
01      1     ∞     q3
10      1     ∞     q3
--------------------------
00      0     ∞     q0
11      0     ∞     q0
010     ∞     1     q2
011     ∞     0     q1
100     ∞     1     q2
101     ∞     0     q1

Redefining the correcting string

• Hamming distance

$C(s) = \min\{H(s, s') \mid s' \in L\}$

[Figure: the target DFA, with states q0, q1, q2 and transitions labelled 0 and 1]

Observation table (rows from S and SΣ − S; columns: E = {λ, 0}):

        λ     0     States
λ       ∞     ∞     q0
0       ∞     0     q1
00      0     1     q2
01      1     0     q3
1       ∞     1     q4
000     1     2     q5
001     0     1     q2
010     0     1     q2
011     1     0     q2

Redefining the correcting string

• Levenshtein (or edit) distance. It also counts the positions where one string has a character and the other does not.

For two characters a and b, define:

$r(a, b) = \begin{cases} 0, & a = b \\ 1, & a \neq b \end{cases}$

Assume we are given two strings s and t of length n and m, respectively. We are going to fill an (n+1)×(m+1) array d, indexed from 0, with integers such that the lower right corner element d(n, m) will furnish the required value of the Levenshtein distance Lev(s, t).

The definition of the entries of d is recursive.

First set $d(i, 0) = i$, $i = 0, \ldots, n$, and $d(0, j) = j$, $j = 0, \ldots, m$.

For other pairs i, j use

$d(i, j) = \min\big(d(i-1, j) + 1,\ d(i, j-1) + 1,\ d(i-1, j-1) + r(s(i), t(j))\big)$
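The recursive definition translates almost line by line into code. The following is a minimal sketch with a 0-indexed (n+1)×(m+1) array, so the answer is read from d[n][m]; the function name is an assumption.

```python
def levenshtein(s, t):
    """Edit distance between s and t via the standard dynamic program."""
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # delete the first i characters of s
    for j in range(m + 1):
        d[0][j] = j                      # insert the first j characters of t
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + r)   # match or substitution
    return d[n][m]
```

Python strings are 0-indexed, which is why the i-th character of s in the formula appears as `s[i - 1]` here.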

Redefining the correcting string

• Levenshtein distance

$C(s) = \min\{Lev(s, s') \mid s' \in L\}$
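For a regular L given by a complete DFA, the minimum can be found with a shortest-path search over pairs (position in s, DFA state), where deleting, inserting, and substituting a character each cost 1. This is one possible way to answer such a query, offered as a sketch rather than as the method behind the slides. The function name is an assumption, and the example DFA (strings whose number of 0s is congruent to 2 modulo 3) is hypothetical, picked because it seems to match the distances tabulated on these slides.

```python
import heapq
import math

def min_levenshtein_to_language(delta, start, accepting, alphabet, s):
    """min Lev(s, s') over accepted s', or inf if nothing is accepted."""
    n = len(s)
    dist = {(0, start): 0}
    heap = [(0, 0, start)]               # (cost, position in s, DFA state)
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > dist.get((i, q), math.inf):
            continue                     # stale heap entry
        if i == n and q in accepting:
            return d                     # Dijkstra: first goal popped is optimal
        moves = []
        if i < n:
            moves.append((d + 1, i + 1, q))                 # delete s[i]
        for a in alphabet:
            p = delta[(q, a)]
            moves.append((d + 1, i, p))                     # insert a
            if i < n:
                moves.append((d + (a != s[i]), i + 1, p))   # match or substitute
        for nd, ni, nq in moves:
            if nd < dist.get((ni, nq), math.inf):
                dist[(ni, nq)] = nd
                heapq.heappush(heap, (nd, ni, nq))
    return math.inf

# Hypothetical DFA over {0, 1}: state = number of 0s read so far, modulo 3.
MOD3_DELTA = {(q, a): ((q + 1) % 3 if a == "0" else q)
              for q in range(3) for a in "01"}
MOD3_ACCEPTING = {2}
```

Unlike the Hamming variant, the answer is finite for every s as long as L is not empty, since insertions and deletions can change the length.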

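[Figure: the target DFA, with states q0, q1, q2 and transitions labelled 0 and 1]

Observation table (rows above the rule: S; rows below it: SΣ − S; column: E = {λ}):

        λ     States
λ       2     q0
0       1     q1
00      0     q2
--------------------
1       2     q0
01      1     q1
000     1     q1
001     0     q2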




[Figure: the conjectured automaton built from the table, with states q0, q1, q2 and transitions labelled 0 and 1]



[Figure: the target DFA, with states q0, q1, q2 and transitions labelled 0 and 1]

Observation table (rows above the rule: S; rows below it: SΣ − S; columns: E = {λ, 0}):

        λ     0     States
λ       2     1     q0
0       1     0     q1
00      0     1     q2
000     1     1     q3
0000    1     0     q1
--------------------------
1       2     1     q0
01      1     0     q1
001     0     1     q2
0001    1     1     q3
00000   0     1     q2
00001   1     0     q1

References

D. Angluin. Learning Regular Sets from Queries and Counterexamples. Information and Computation 75, 87-106 (1987)

L. Lee. Learning of Context-Free Languages: A Survey of the Literature. Harvard University Technical Report TR-12-1996 (written in 1994)

C. de la Higuera. Learning Stochastic Finite Automata from Experts. In Proceedings of the 4th International Colloquium on Grammatical Inference, Lecture Notes In Computer Science 1433, 79-89 (1998)

F. Bergadano, N. Bshouty, A. Beimel, E. Kushilevitz and S. Varricchio. Learning Functions Represented as Multiplicity Automata. Journal of the ACM 47, 506-530 (2000)

http://www.cut-the-knot.org/do_you_know/Strings.shtml
