Structured Kernels for Natural Language Processing
Transcript of Structured Kernels for Natural Language Processing
[Page 1]
Fabio Massimo Zanzotto
ART Group - Department of Computer Science, Systems and Production
University of Rome “Tor Vergata”
Italy
Structured Kernels
for Natural Language Processing
[Page 2]
• What a kernel function is in kernel machines
• Linguistic interpretations can be represented as graphs
• Generally, the most widely used class of graphs in NLP is trees
What you should already know
[Page 3]
• A kernel function KT(t1,t2) for comparing trees that determines a particular feature space
• A specific NLP classification task: Textual Entailment Recognition
• The first-order rule (FOR) feature space and a related kernel function K(G1,G2) for sentence pair classification
What we will see
[Page 4]
Kernel functions
for tree fragment feature spaces
[Page 5]
• Named Entity Recognition: assign a class (e.g.,
Person, Organization) to target words or word
sequences in a particular text
Given the training examples (among others):
• John/Person delivers a talk
• Bob/Person delivers a tutorial
assign a class to Maggie in:
• Maggie delivers a speech
A sample task: Named Entity Recognition
[Page 6]
Idea 1: the context is relevant
• John/Person delivers a talk
• Bob/Person delivers a tutorial
• Maggie delivers a speech
Idea 2: the syntactic roles of all the words are relevant
• We want to describe the syntactic tree fragment feature space and write a kernel for it
A sample task: Named Entity Recognition
[Page 7]
A syntactic tree fragment feature space: the SubSet Tree (SST) space of Collins and Duffy (2002)

[Figure: the parse tree of "delivers a talk" (VP → V NP, V → delivers, NP → D N, D → a, N → talk) together with some of its SST fragments, e.g. the same tree with the lexical leaves progressively removed, (VP V NP), and (VP V)]
[Page 8]
A syntactic tree fragment feature space: the overall SST fragment set

[Figure: the complete set of SST fragments of the parse tree of "delivers a talk", from single productions such as (NP D N), (VP V NP), and (V delivers) up to the fully lexicalized tree]
[Page 9]
A syntactic tree fragment feature space: explicit kernel space

x = (0, …, 1, …, 1, …, 1, …, 0, …)    z = (0, …, 0, …, 1, …, 1, …, 0, …)

x · z counts the number of common substructures

[Figure: the SST fragments, such as (NP (D a) (N talk)) and (NP D N), that index the dimensions of x and z]
[Page 10]
Given the function T(t) that computes the set of subtrees of a given tree t, the kernel may be written as:

KT(t1,t2) = |T(t1) ∩ T(t2)|

The kernel KT of the SST feature space
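As a toy sketch of this definition (the bracketed-string encoding of fragments and the fragment sets below are illustrative assumptions, not from the slides):

```python
# KT(t1, t2) = |T(t1) ∩ T(t2)|, computed explicitly on toy fragment sets.
t1_frags = {"(NP (D a) (N talk))", "(NP D N)", "(D a)", "(N talk)"}
t2_frags = {"(NP (D a) (N tutorial))", "(NP D N)", "(D a)", "(N tutorial)"}

def K_T(frags1, frags2):
    """Count the subtrees shared by the two trees."""
    return len(frags1 & frags2)

print(K_T(t1_frags, t2_frags))  # the shared fragments are (NP D N) and (D a)
```

Enumerating T(t) explicitly is feasible only for tiny trees, since the number of fragments grows exponentially; that is exactly what the implicit computation a few slides below avoids.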
[Page 11]
Given a tree t, how many subtrees does it have? Let us focus on the subtrees rooted at the root R

A step back…

R
ch1 ch2 … chn

D(R) = Πi (1 + D(chi))   if R is not a terminal
D(R) = 0                 if R is a terminal
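The recursion above can be sketched directly in Python (a minimal sketch; the (label, children) tuple encoding of trees is an assumption):

```python
def D(tree):
    """Number of subtrees rooted at this node:
    D(R) = prod_i (1 + D(ch_i)) if R is not a terminal, else 0."""
    if isinstance(tree, str):   # R is a terminal (a bare word)
        return 0
    _label, children = tree
    count = 1
    for ch in children:
        count *= 1 + D(ch)
    return count

# VP -> V NP, NP -> D N, with lexical leaves
vp = ("VP", [("V", ["delivers"]),
             ("NP", [("D", ["a"]), ("N", ["talk"])])])
print(D(vp))  # (1 + D(V)) * (1 + D(NP)) = 2 * 5 = 10
```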
[Page 12]
KT: kernels for structural representations (implicit representation)

[Collins and Duffy, ACL 2002] evaluate KT in O(n²) through the recursive function Δ:

Δ(nx, nz) = 0, if the productions at nx and nz are different
Δ(nx, nz) = 1, if nx and nz are pre-terminals with the same production
Δ(nx, nz) = Π_{j=1}^{nc(nx)} (1 + Δ(ch(nx, j), ch(nz, j))), otherwise

KT(Tx, Tz) = Σ_{nx ∈ N_Tx} Σ_{nz ∈ N_Tz} Δ(nx, nz)
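A runnable sketch of the Δ recursion and the double summation (Python; the (label, children) tuple encoding is an assumption, and no decay factor λ is included):

```python
def production(node):
    """A node's production: its label and the sequence of child labels."""
    label, children = node
    return (label, tuple(ch if isinstance(ch, str) else ch[0] for ch in children))

def is_preterminal(node):
    return all(isinstance(ch, str) for ch in node[1])

def delta(nx, nz):
    """Number of common SST fragments rooted at the pair (nx, nz)."""
    if isinstance(nx, str) or isinstance(nz, str):
        return 0                       # terminal children add no fragments
    if production(nx) != production(nz):
        return 0
    if is_preterminal(nx):
        return 1
    d = 1
    for cx, cz in zip(nx[1], nz[1]):   # same production => aligned children
        d *= 1 + delta(cx, cz)
    return d

def nodes(tree):
    if isinstance(tree, str):
        return []
    result = [tree]
    for ch in tree[1]:
        result += nodes(ch)
    return result

def tree_kernel(t1, t2):
    """KT(T1, T2) = sum over all node pairs of delta(nx, nz)."""
    return sum(delta(nx, nz) for nx in nodes(t1) for nz in nodes(t2))

np_tree = ("NP", [("D", ["a"]), ("N", ["talk"])])
print(tree_kernel(np_tree, np_tree))  # 6 common fragments
```

The double loop over node pairs gives the quadratic cost; each Δ call only recurses when the two productions match, which is what makes the computation cheap in practice.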
[Page 13]
Textual Entailment Recognition
[Page 14]
Why is textual entailment recognition useful?
Information Retrieval (IR)
Question Answering (QA)
Multi Document Summarization (MDS)
Information Extraction (IE)
(from Dagan, Roth, Zanzotto, 2007)
[Page 15]
Why is textual entailment recognition useful?

Question: Who bought Overture?  >>  Expected answer form: X bought Overture
text: "Overture's acquisition by Yahoo"  >>  hypothesized answer: "Yahoo bought Overture"

• Similar for IE: X buy Y
• Similar for "semantic" IR: t: Overture was bought …
• Summarization (multi-document): identify redundant info

(from Dagan, Roth, Zanzotto, 2007)
[Page 16]
Textual entailment recognition as classification

T1: "At the end of the year, all solid companies pay dividends."
H1: "At the end of the year, all solid insurance companies pay dividends."
T1 → H1

… the textual entailment recognition task: determine whether or not a text T implies a hypothesis H

Target problem: learning textual entailment recognition rules from annotated examples
[Page 17]
Recognizing Textual Entailment is a classification task: given a pair (T, H), we need to decide whether T implies H or T does not imply H.

We can learn a classifier from annotated examples.

What we need:
• A learning algorithm
• A suitable feature space

Initial hypothesis
[Page 18]
First-order rule (FOR) feature space

Training examples:
T2: "Wanadoo bought KStone"    H2: "Wanadoo owns KStone"    T2 → H2
T3: "Google bought Microsoft"    H3: "Microsoft owns Google"    T3 ↛ H3

Application:
T4: "Fiat bought Chrysler"    H4: "Fiat owns Chrysler"    T4 → H4?

Relevant features: first-order rules such as "X bought Y → X owns Y"
[Page 19]
Kernels for First-Order Rule (FOR)
Feature Spaces
[Page 20]
We want to write the kernel
K(P1,P2)
that computes how many common first-order rules are activated.
• For this formulation, we want to write syntactic first-order rules

Kernels in FOR spaces
[Page 21]
• We need:
– A representation of the sentence pairs: G
– A function that generates the subgraphs of G
• We can then define the kernel function as the number of common subgraphs

Kernels in FOR spaces
[Page 22]
Sentence pair representation

[Figure: the parse trees of the sentence pair T and H ("At the end of the year, all solid (insurance) companies pay dividends"), with placeholder labels 0, 1, 2', 2'', 3, 4 co-indexing the corresponding constituents of the two trees]
[Page 23]
Sentence pair representation
and subgraph function
[Page 24]
Kernels in FOR spaces: critical problems
[Page 25]
Kernels in FOR spaces: critical problems
[Page 26]
• FOR feature spaces can be modelled with particular graphs
• We call these graphs tripartite directed acyclic graphs (tDAGs)
• Observations:
– tDAGs are not trees
– tDAGs can be used to model both rules and sentence pairs

A step back…
[Page 27]
As in feature structures…
Rules and sentence pairs can be seen as graphs.
Unifying a rule and a sentence pair is a graph matching problem.

Tripartite Directed Acyclic Graphs (tDAGs)
[Page 28]
A tripartite directed acyclic graph is a graph G = (N,E) where:
• the set of nodes N is partitioned into three sets Nt, Ng, and A
• the set of edges is partitioned into four sets Et, Eg, EAt, and EAg
where
t = (Nt, Et) and g = (Ng, Eg) are two trees
EAt = {(x, y) | x ∈ Nt and y ∈ A}
EAg = {(x, y) | x ∈ Ng and y ∈ A}

Tripartite Directed Acyclic Graphs (tDAGs)
[Page 29]
Alternative definition
A tripartite directed acyclic graph is a pair of extended trees G = (t, g) where:
t = (Nt ∪ A, Et ∪ EAt) and
g = (Ng ∪ A, Eg ∪ EAg).

Tripartite Directed Acyclic Graphs (tDAGs)
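The alternative definition can be mirrored directly in a data structure (a minimal sketch; all class and field names are illustrative assumptions, not from the slides):

```python
from dataclasses import dataclass, field

@dataclass
class ExtendedTree:
    """An extended tree, e.g. t = (Nt ∪ A, Et ∪ EAt): ordinary tree
    nodes and edges, plus anchor edges into the shared placeholder set A."""
    nodes: dict                                   # node id -> label
    edges: set                                    # (parent id, child id) pairs
    anchors: dict = field(default_factory=dict)   # node id -> placeholder in A

@dataclass
class TDag:
    """A tDAG as a pair of extended trees G = (t, g)."""
    t: ExtendedTree
    g: ExtendedTree

    def placeholders(self):
        # The placeholder set A is shared by both extended trees.
        return set(self.t.anchors.values()) | set(self.g.anchors.values())

t = ExtendedTree(nodes={0: "NP", 1: "NNP"}, edges={(0, 1)}, anchors={1: "X"})
g = ExtendedTree(nodes={0: "NP", 1: "NNP"}, edges={(0, 1)}, anchors={1: "X"})
print(TDag(t, g).placeholders())  # {'X'}
```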
[Page 30]
Kernels in FOR spaces: critical problems
[Page 31]
• Isomorphism between graphs
• G1 = (N1,E1) and G2 = (N2,E2) are isomorphic if:
– |N1| = |N2| and |E1| = |E2|
– among all the bijective functions relating N1 and N2, there exists f : N1 → N2
– such that:
• for each n1 in N1, Label(n1) = Label(f(n1))
• for each (na, nb) in E1, (f(na), f(nb)) is in E2

Isomorphism between tDAGs
[Page 32]
Isomorphism adapted to tDAGs
G1 = (t1,g1) and G2 = (t2,g2) are isomorphic if these two properties hold:
– Partial isomorphism
• g1 and g2 are isomorphic
• t1 and t2 are isomorphic
• this property generates two mapping functions fg and ft
– Constraint compatibility
• fg and ft are compatible on the sets of placeholder nodes A1 and A2: for each n ∈ A1, fg(n) = ft(n)

Isomorphism between tDAGs
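The compatibility check itself is one line once fg and ft are represented, say, as dictionaries over placeholder nodes (an illustrative encoding, not from the slides):

```python
def compatible(f_g, f_t, A):
    """fg and ft are constraint-compatible if they agree on every
    placeholder node n in A: fg(n) == ft(n)."""
    return all(f_g[n] == f_t[n] for n in A)

f_g = {"1": "a", "2": "b"}   # mapping induced by the g-side isomorphism
f_t = {"1": "a", "2": "c"}   # mapping induced by the t-side isomorphism
print(compatible(f_g, f_t, {"1"}))        # True
print(compatible(f_g, f_t, {"1", "2"}))   # False: the mappings disagree on "2"
```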
[Page 33]
Isomorphism between tDAGs
Constraint compatibility
Partial isomorphism
[Page 34]
• Making the isomorphism explicit in the kernel: it is better expressed in this way:

Ideas for building the kernel
[Page 35]
• First, use constraint compatibility: build a set C of all the relevant alternative constraints
• Then, detect the partial isomorphism between tDAGs

Ideas for building the kernel
[Page 36]
Ideas for building the kernel
Possible alternative constraints
[Page 37]
[Page 38]
Ideas for building the kernel
[Page 39]
General equation
We compute it using:
1) the kernel KT introduced at the beginning
2) the inclusion-exclusion principle

Kernel on FOR feature spaces
[Page 40]
Tree kernels
• M. Collins, N. Duffy, Convolution Kernels for Natural Language, NIPS 2001
Tripartite directed acyclic graph kernels
• F.M. Zanzotto, L. Dell'Arciprete, Efficient Kernels for Sentence Pair Classification, EMNLP 2009
Textual Entailment Recognition
• I. Dagan, D. Roth, F.M. Zanzotto, Tutorial on Recognizing Textual Entailment, ACL 2007

References