Attempts to extend correction queries
Cristina Bibire
Research Group on Mathematical Linguistics, Rovira i Virgili University
Pl. Imperial Tarraco 1, 43005, Tarragona, Spain
E-mail: [email protected]
31st of October 2005, Seminar IV
Correction queries
PAC learning of DFA
Learning CFL
Learning WFA
Redefining the correcting string
References
Learning from corrections
The correcting string of s in the language L is the smallest string s' (in lex-length order) such that s.s' belongs to L.
The answer to a correction query for a string consists of its correcting string.
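The definition above can be illustrated with a minimal sketch, assuming the target language is given as a complete DFA. The tuple layout of `dfa`, the function name `correcting_string` and the example language (even number of 0s and even number of 1s) are assumptions made for illustration only; they do not come from the slides.

```python
from collections import deque

def correcting_string(dfa, s):
    """Return the lex-length smallest s' with s + s' in L(dfa), or None."""
    delta, start, accepting, alphabet = dfa
    state = start
    for ch in s:                      # run the DFA on s
        state = delta[(state, ch)]
    queue = deque([(state, "")])      # breadth-first search over suffixes
    seen = {state}
    while queue:
        q, suffix = queue.popleft()
        if q in accepting:
            return suffix             # first hit is shortest, then lex smallest
        for a in sorted(alphabet):    # expand symbols in lexicographic order
            nxt = delta[(q, a)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, suffix + a))
    return None                       # no accepting state is reachable

# Hypothetical example: states are (parity of 0s, parity of 1s).
STATES = [(a, b) for a in (0, 1) for b in (0, 1)]
DELTA = {}
for (a, b) in STATES:
    DELTA[((a, b), "0")] = (1 - a, b)
    DELTA[((a, b), "1")] = (a, 1 - b)
EVEN_EVEN = (DELTA, (0, 0), {(0, 0)}, "01")
```

With this example DFA, the correcting string of a member such as `"00"` is the empty string, while `"01"` is corrected by `"01"`.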
Myhill-Nerode theorem:
The number of states in the smallest DFA accepting L is equal to the number of equivalence classes in $\Sigma^*/\equiv_L$, where
$x \equiv_L y$ iff $\forall z \in \Sigma^*: xz \in L \Leftrightarrow yz \in L$
Learning from corrections
PAC learning of DFA with CQ
Learning CFL with CQ
Learning WFA with CQ
Redefining the correcting string
How can we extend CQ?
PAC learning of DFA with CQ
We assume that there is some probability distribution Pr on the set of all strings over the alphabet Σ, and let L be an unknown regular set.
The Learner has access to information about L by means of two oracles:
• C(x) returns the correcting string for x
• Ex( ) is a random sampling oracle that selects a string x from Σ* according to the distribution Pr and returns the pair (x, C(x)).
In addition, the Learner is given the accuracy ε and the confidence δ.
Definition: We say that the language L1 is an ε-approximation of the language L2 provided that:
$\sum_{x \in L_1 \oplus L_2} \Pr(x) \le \varepsilon$, where $L_1 \oplus L_2$ is the symmetric difference of L1 and L2.
If A is a DFA, it is said to be an ε-approximation of the set L if L(A) is an ε-approximation of L.
PAC learning of DFA with CQ
If A is an ε-approximation of L, then the probability of finding a discrepancy between L(A) and L with one call of the random sampling oracle Ex( ) is at most ε.
The approximate learner LCAapprox is obtained by modifying LCA. A correction query of the string x is satisfied by a call to C(x). Each conjecture is tested by a number of calls to Ex( ).
• If any of the calls to Ex( ) returns a pair (t, C(t)) such that:
- C(t)=λ but A(S,E,C) rejects it or
- C(t)≠λ but A(S,E,C) accepts it
then t is said to be a counterexample and LCAapprox proceeds as LCA
• If none of the calls to Ex( ) returns a counterexample, then LCAapprox halts and outputs A(S,E,C)
PAC learning of DFA with CQ
How many calls to Ex( ) does LCAapprox make to test a given conjecture? The number depends on:
• the accuracy and confidence parameters, ε and δ
• how many previous conjectures have been tested
Let $r_i = \frac{1}{\varepsilon}\left(\ln\frac{1}{\delta} + (\ln 2)(i+1)\right)$
If i previous conjectures have been tested, then LCAapprox makes $\lceil r_i \rceil$ calls to Ex( ).
Theorem. If n is the number of states in the minimum DFA for the target language L, then LCAapprox terminates after $O\left(n + \frac{1}{\varepsilon}\left(n \ln\frac{1}{\delta} + n^2\right)\right)$ calls to the Ex( ) oracle. Moreover, the probability that the automaton output by LCAapprox is an ε-approximation of L is at least 1-δ.
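As a small numeric illustration, the number of calls per conjecture can be computed directly, assuming the formula $r_i = \frac{1}{\varepsilon}(\ln\frac{1}{\delta} + (\ln 2)(i+1))$ rounded up to the next integer. The function name `calls_for_conjecture` is chosen here for illustration.

```python
import math

def calls_for_conjecture(i, eps, delta):
    """Number of Ex() calls used to test the conjecture after i earlier ones."""
    r_i = (1.0 / eps) * (math.log(1.0 / delta) + math.log(2.0) * (i + 1))
    return math.ceil(r_i)

# With eps = 0.1 and delta = 0.05 the first two conjectures need 37 and 44 calls.
first = calls_for_conjecture(0, 0.1, 0.05)
second = calls_for_conjecture(1, 0.1, 0.05)
```

Each additional conjecture costs roughly (ln 2)/ε more samples than the previous one, which is where the n² term in the theorem comes from.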
PAC learning of DFA with CQ
Sketch of the proof:
• the total number of counterexamples is at most n-1, so the total number of calls to Ex( ) is at most
$\sum_{i=0}^{n-2} (r_i + 1) = O\left(n + \frac{1}{\varepsilon}\left(n \ln\frac{1}{\delta} + n^2\right)\right)$
• the probability that LCAapprox will terminate with an automaton that is not an ε-approximation of L is at most:
$\sum_{i=0}^{n-2} (1-\varepsilon)^{r_i} \le \sum_{i=0}^{n-2} e^{-\varepsilon r_i} \le \sum_{i=0}^{n-2} \frac{\delta}{2^{i+1}} \le \delta$
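The steps behind the probability bound can be spelled out as a short worked derivation, assuming $r_i = \frac{1}{\varepsilon}\left(\ln\frac{1}{\delta} + (\ln 2)(i+1)\right)$ and a sum over the conjectures $i = 0, \ldots, n-2$:

```latex
\begin{align*}
(1-\varepsilon)^{r_i} &\le e^{-\varepsilon r_i}
  && \text{since } 1-\varepsilon \le e^{-\varepsilon} \\
e^{-\varepsilon r_i} &= e^{-\ln(1/\delta) - (i+1)\ln 2}
  = \frac{\delta}{2^{i+1}}
  && \text{by the definition of } r_i \\
\sum_{i=0}^{n-2} \frac{\delta}{2^{i+1}}
  &\le \delta \sum_{i \ge 0} 2^{-(i+1)} = \delta
  && \text{geometric series}
\end{align*}
```

The factor $(\ln 2)(i+1)$ in $r_i$ is what makes the failure probabilities of successive conjectures shrink geometrically, so that they sum to at most δ.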
Learning CFL
The setting
There is an unknown CFG G in Chomsky normal form. The Learner knows the set T of terminal symbols, the set N of nonterminal symbols and the start symbol S of G. The Teacher is assumed to answer two types of questions:
• MEMBER(x,A) – if the string x can be derived from the non-terminal A in the grammar G, the answer is yes; otherwise, it is no
• EQUIV(H) – if H is equivalent to G, the answer is yes; otherwise, it replies with a counterexample t.
Learning CFL
The Learner LCF
LCF can explicitly enumerate all the possible productions of G in polynomial time (in |T| and |N|). Initially LCF places all possible productions of G in the hypothesized set of productions P.
The main loop of LCF asks an EQUIV(H) question for the grammar H=(T,N,S,P).
• if H is equivalent to G, then LCF halts and outputs H
• otherwise, it “diagnoses” the counterexample t returned, which results in removing at least one production from P; the main loop is then repeated.
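The initial step of LCF, placing every possible production in P, can be sketched as below. It assumes that productions in Chomsky normal form are either A → BC or A → a and ignores a possible S → λ rule; the representation of a production as a `(head, body)` pair and the function name are illustrative choices only.

```python
from itertools import product

def all_cnf_productions(terminals, nonterminals):
    """Enumerate every production A -> B C and A -> a over the given symbols."""
    binary = [(a, (b, c)) for a, b, c in product(nonterminals, repeat=3)]
    unary = [(a, (t,)) for a in nonterminals for t in terminals]
    return binary + unary

# Example with |N| = 2 and |T| = 2: 2**3 + 2*2 = 12 candidate productions.
P = all_cnf_productions(["a", "b"], ["S", "A"])
```

The size of the initial set is |N|³ + |N|·|T|, which is polynomial in |T| and |N|. Since every diagnosed counterexample removes at least one production, the main loop runs at most that many times.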
Learning WFA
Let $K$ be a field and $f: \Sigma^* \to K$ be a function. Associate with $f$ an infinite matrix $F$ with rows indexed by strings $x \in \Sigma^*$ and columns indexed by strings $y \in \Sigma^*$. The $(x, y)$ entry of $F$ contains the value f(x.y). The function $f$ is called a power series and $F$ its Hankel matrix.
If we have a WFA A we can associate with it a function $f_A$ and vice versa: for every function $f$ there exists a smallest WFA A such that $f_A \equiv f$.
Theorem [Carlyle, Paz 1971] Let $f: \Sigma^* \to K$ such that $f \not\equiv 0$ and let F be the corresponding Hankel matrix. Then, the size r of the smallest WFA A such that $f_A \equiv f$ satisfies r=rank(F).
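To make the Hankel matrix concrete, the following sketch builds a finite block of F and computes its rank over the rationals. The example function f(w) = number of 1s in w, the helper names and the choice of strings up to length 2 are assumptions made for illustration; the rank of a finite block is only a lower bound on rank(F).

```python
from fractions import Fraction
from itertools import product

def strings_up_to(alphabet, max_len):
    """All strings over the alphabet of length 0..max_len."""
    return ["".join(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)]

def hankel_block(f, rows, cols):
    """Finite block of the Hankel matrix: entry (x, y) holds f(x.y)."""
    return [[Fraction(f(x + y)) for y in cols] for x in rows]

def rank(matrix):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    m = [row[:] for row in matrix]
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

words = strings_up_to("01", 2)
block = hankel_block(lambda w: w.count("1"), words, words)
block_rank = rank(block)
```

For this example every entry is count(x) + count(y), so the block has rank 2, matching a two-state WFA that counts 1s.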
Learning WFA
Let f be a target function. The learning algorithm may ask the oracle two types of query:
• EQ(h): if h is equivalent to f on all input assignments then the answer to the query is yes; otherwise, the answer is no and the algorithm receives a counterexample z ($f(z) \neq h(z)$).
• MQ(z): the oracle has to return f(z)
The algorithm learns a function f using its Hankel matrix, F. By the theorem of Carlyle and Paz, it is enough to keep an $r \times r$ sub-matrix of F of full rank. Therefore the learning algorithm can be viewed as a search for appropriate r rows and r columns.
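Learning WFA
The algorithm
(1) Initialize the strings $x_1$ and $y_1$, and set $X \leftarrow \{x_1\}$, $Y \leftarrow \{y_1\}$ and $l \leftarrow 1$
(2) Define a hypothesis h
Let $\hat\gamma = (f(x_1), \ldots, f(x_l))$
For every $\sigma \in \Sigma$, define a matrix $\hat\mu_\sigma$ such that $\hat F_{x_i.\sigma} = \sum_{j=1}^{l} [\hat\mu_\sigma]_{i,j} \hat F_{x_j}$, where $\hat F_x$ denotes the row of F indexed by x, restricted to the columns in Y
For every $w = \sigma_1 \sigma_2 \ldots \sigma_k \in \Sigma^*$, define $\hat\mu(w) = \hat\mu_{\sigma_1} \hat\mu_{\sigma_2} \cdots \hat\mu_{\sigma_k}$ and $h(w) = [\hat\mu(w)]_1 \cdot \hat\gamma$
(3) Ask an equivalence query EQ(h)
• If the answer is yes, halt and output h
• Otherwise, the answer is no and we receive a counterexample z
Using MQ find a string w.σ, prefix of z such that
(a) $\hat F_w = \sum_{i=1}^{l} [\hat\mu(w)]_{1,i} \hat F_{x_i}$
(b) $\exists y \in Y$ s.t. $\hat F_{w.\sigma}(y) \neq \sum_{i=1}^{l} [\hat\mu(w)]_{1,i} \hat F_{x_i.\sigma}(y)$
Set $x_{l+1} \leftarrow w$, $y_{l+1} \leftarrow \sigma.y$, $X \leftarrow X \cup \{x_{l+1}\}$, $Y \leftarrow Y \cup \{y_{l+1}\}$ and $l \leftarrow l+1$
Go to (2)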
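Step (2) evaluates a hypothesis as the first row of a product of matrices times a vector of function values. A minimal sketch of that evaluation is given below, assuming h(w) is the first row of $\hat\mu_{\sigma_1} \cdots \hat\mu_{\sigma_k}$ multiplied by $\hat\gamma$. The matrices in `MU` and the vector `GAMMA` are worked out by hand here for the illustrative function f(w) = number of 1s in w with x1 = λ and x2 = 1; they are not taken from the slides.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def evaluate(mu, gamma, w):
    """h(w): first row of mu[s1] * ... * mu[sk], dotted with gamma."""
    size = len(gamma)
    acc = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    for sigma in w:
        acc = mat_mul(acc, mu[sigma])
    return sum(acc[0][i] * gamma[i] for i in range(size))

# Hand-built hypothesis for f(w) = number of 1s (illustrative assumption).
# Rows express F_{x_i.sigma} in terms of F_λ and F_1:
#   F_0 = F_λ, F_10 = F_1, F_1 = F_1, F_11 = -F_λ + 2 F_1.
MU = {"0": [[1, 0], [0, 1]],
      "1": [[0, 1], [-1, 2]]}
GAMMA = [0, 1]          # (f(λ), f(1))
```

With these values `evaluate(MU, GAMMA, "101")` gives 2, the number of 1s in the string.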
Redefining the correcting string
• Hamming distance (only for strings of the same length). For two strings s and t, H(s, t) is the number of places in which the two strings differ, i.e., have different characters.
• $C(s) = s'$, s.t. $s' \in L$ and $H(s, s')$ is minimum
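A brute-force sketch of this redefined correction is shown below. It enumerates all strings of the same length as s, so it is exponential and meant only to illustrate the definition. Ties are broken in favour of the lexicographically smallest candidate, and the example language (even number of 0s and even number of 1s) is an assumption chosen for illustration.

```python
from itertools import product

def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s, t))

def correct_hamming(s, in_language, alphabet="01"):
    """Closest string of length |s| in the language, or None if there is none."""
    best = None
    for chars in product(sorted(alphabet), repeat=len(s)):  # lexicographic order
        cand = "".join(chars)
        if in_language(cand) and (
                best is None or hamming(s, cand) < hamming(s, best)):
            best = cand                                     # strict '<' keeps ties
    return best

def even_even(w):
    # Hypothetical target language used only for this example.
    return w.count("0") % 2 == 0 and w.count("1") % 2 == 0
```

For this language `correct_hamming("01", even_even)` returns `"00"`, and every odd-length string has no correction at all.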
[Figure: target DFA with states q0, q1, q2, q3 and transitions labelled 0 and 1]
Observation table:
           λ      States
S
  λ        λ      q0
  0        φ      q1
  00       00     q2
SΣ - S
  1        φ      q1
  01       00     q2
  000      φ      q1
  001      φ      q1
[Figure: automaton built from the observation table, with states q0, q1, q2 and transitions labelled 0, 1]
Redefining the correcting string
• Hamming distance
• $C(s) = \min\{H(s, s') \mid s' \in L\}$
[Figure: target DFA with states q0, q1, q2, q3 and transitions labelled 0 and 1]
Observation table:
           λ      States
S
  λ        0      q0
  0        ∞      q1
  01       1      q2
SΣ - S
  1        ∞      q1
  00       0      q0
  010      ∞      q1
  011      ∞      q1
[Figure: automaton built from the observation table, with states q0, q1, q2 and transitions labelled 0, 1; 1; 0, 1; 0]
Redefining the correcting string
• Hamming distance
• $C(s) = \min\{H(s, s') \mid s' \in L\}$
[Figure: target DFA with states q0, q1, q2, q3 and transitions labelled 0 and 1]
Observation table:
           λ      0      States
S
  λ        0      ∞      q0
  0        ∞      0      q1
  1        ∞      1      q2
  01       1      ∞      q3
  10       1      ∞      q3
SΣ - S
  00       0      ∞      q0
  11       0      ∞      q0
  010      ∞      1      q2
  011      ∞      0      q1
  100      ∞      1      q2
  101      ∞      0      q1
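The way such an observation table is filled and its rows grouped into states can be sketched as follows. The target language used here (even number of 0s and even number of 1s) is an assumption: it is one language consistent with the values in the table above, not something the slides state. The helper names are illustrative, and the brute-force distance is exponential in the string length.

```python
import math
from itertools import product

def in_language(w):
    # Assumed target language, chosen to agree with the table values.
    return w.count("0") % 2 == 0 and w.count("1") % 2 == 0

def min_hamming(s, alphabet="01"):
    """min H(s, s') over s' in L with |s'| = |s|, or infinity if none exists."""
    dists = [sum(a != b for a, b in zip(s, cand))
             for cand in ("".join(p) for p in product(alphabet, repeat=len(s)))
             if in_language(cand)]
    return min(dists, default=math.inf)

def build_table(prefixes, experiments):
    """Row of s holds C(s.e) for every experiment e."""
    return {s: tuple(min_hamming(s + e) for e in experiments) for s in prefixes}

def group_rows(table):
    """Strings with identical rows are mapped to the same state."""
    states = {}
    for s, row in table.items():
        states.setdefault(row, []).append(s)
    return states

ROWS = ["", "0", "1", "01", "10", "00", "11", "010", "011", "100", "101"]
TABLE = build_table(ROWS, ["", "0"])
STATES = group_rows(TABLE)
```

With the experiments λ and 0, the eleven rows collapse into four distinct rows, one per state q0 to q3.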
Redefining the correcting string
• Hamming distance
• $C(s) = \min\{H(s, s') \mid s' \in L\}$
[Figure: target DFA with states q0, q1, q2 and transitions labelled 0 and 1]
Observation table:
           λ      0      States
S
  λ        ∞      ∞      q0
  0        ∞      0      q1
  00       0      1      q2
  01       1      0      q3
SΣ - S
  1        ∞      1      q4
  000      1      2      q5
  001      0      1      q2
  010      0      1      q2
  011      1      0      q2
Redefining the correcting string
• Levenshtein (or edit) distance. It also counts positions where one string has a character and the other does not.
For two characters a and b, define:
$r(a, b) = 0$ if $a = b$, and $r(a, b) = 1$ if $a \neq b$
Assume we are given two strings s and t of length n and m, respectively. We fill an (n+1)×(m+1) array d, indexed from 0, with integers such that the lower right corner element d(n, m) gives the Levenshtein distance Lev(s, t).
The definition of the entries of d is recursive.
First set $d(i, 0) = i$, $i = 0, \ldots, n$, and $d(0, j) = j$, $j = 0, \ldots, m$
For other pairs i, j use
$d(i, j) = \min\{d(i-1, j) + 1,\; d(i, j-1) + 1,\; d(i-1, j-1) + r(s(i), t(j))\}$
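The recursive definition of d translates directly into a dynamic-programming table. The sketch below assumes 0-based Python strings, so the characters s(i) and t(j) of the definition become `s[i - 1]` and `t[j - 1]`; the function name is illustrative.

```python
def levenshtein(s, t):
    """Edit distance between s and t via the (n+1) x (m+1) table d."""
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # delete the first i characters of s
    for j in range(m + 1):
        d[0][j] = j                      # insert the first j characters of t
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + r)    # match or substitution
    return d[n][m]                       # lower right corner
```

Unlike the Hamming distance, this is defined for strings of different lengths, for example `levenshtein("", "00")` is 2.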
Redefining the correcting string
• Levenshtein distance
• $C(s) = \min\{Lev(s, s') \mid s' \in L\}$
[Figure: target DFA with states q0, q1, q2 and transitions labelled 0 and 1]
Observation table:
           λ      States
S
  λ        2      q0
  0        1      q1
  00       0      q2
SΣ - S
  1        2      q0
  01       1      q1
  000      1      q1
  001      0      q2
[Figure: automaton built from the observation table, with states q0, q1, q2 and transitions labelled 0 and 1]
Redefining the correcting string
• Levenshtein distance
• $C(s) = \min\{Lev(s, s') \mid s' \in L\}$
[Figure: target DFA with states q0, q1, q2 and transitions labelled 0 and 1]
Observation table:
           λ      0      States
S
  λ        2      1      q0
  0        1      0      q1
  00       0      1      q2
  000      1      1      q3
  0000     1      0      q1
SΣ - S
  1        2      1      q0
  01       1      0      q1
  001      0      1      q2
  0001     1      1      q3
  00000    0      1      q2
  00001    1      0      q1
References
D. Angluin. Learning Regular Sets from Queries and Counterexamples. Information and Computation 75, 87-106 (1987)
L. Lee. Learning of Context-Free Languages: A Survey of the Literature. Harvard University Technical Report TR-12-1996 (written in 1994)
C. de la Higuera. Learning Stochastic Finite Automata from Experts. In Proceedings of the 4th International Colloquium on Grammatical Inference, Lecture Notes In Computer Science 1433, 79-89 (1998)
F. Bergadano, N. Bshouty, A. Beimel, E. Kushilevitz and S. Varricchio. Learning Functions Represented as Multiplicity Automata. Journal of the ACM 47, 506-530 (2000)
http://www.cut-the-knot.org/do_you_know/Strings.shtml