Structured Kernels for Natural Language Processing


Transcript of Structured Kernels for Natural Language Processing

Page 1: Structured Kernels for Natural Language Processing

Fabio Massimo Zanzotto

ART Group - Department of Computer Science, Systems and Production

University of Rome “Tor Vergata”

Italy

Structured Kernels for Natural Language Processing

Page 2: Structured Kernels for Natural Language Processing

• What a kernel function is in kernel machines

• Linguistic interpretations can be represented as graphs

• Generally, the most widely used class of graphs in NLP is trees.

What you should already know

Page 3: Structured Kernels for Natural Language Processing

• A kernel function KT(t1,t2) for comparing trees, which determines a particular feature space

• A specific NLP classification task: Textual Entailment Recognition

• The first-order rule (FOR) feature space and a related kernel function K(G1,G2) for sentence pair classification

What we will see

Page 4: Structured Kernels for Natural Language Processing

Kernel functions

for tree fragment feature spaces

Page 5: Structured Kernels for Natural Language Processing

• Named Entity Recognition: assign a class (e.g., Person, Organization) to target words or word sequences in a particular text

Given the training examples (among others):

• John/Person delivers a talk
• Bob/Person delivers a tutorial

assign a class to Maggie in:

• Maggie delivers a speech

A sample task: Named Entity Recognition

Page 6: Structured Kernels for Natural Language Processing

Idea 1: the context is relevant

• John/Person delivers a talk
• Bob/Person delivers a tutorial
• Maggie delivers a speech

Idea 2: the syntactic roles of all the words are relevant

• We want to describe and write a kernel for the syntactic tree fragment feature space

A sample task: Named Entity Recognition

Page 7: Structured Kernels for Natural Language Processing

A syntactic tree fragment feature space: the SubSet Tree (SST) space of Collins and Duffy (2002)

[Figure: the parse tree VP → V (delivers) NP → D (a) N (talk), together with some of its SST fragments, obtained by optionally removing the words and truncating subtrees at non-terminal nodes]

Page 8: Structured Kernels for Natural Language Processing

A syntactic tree fragment feature space: the overall SST fragment set

[Figure: all SST fragments of the tree for “delivers a talk”, from single productions such as NP → D N up to the fully lexicalized tree]

Page 9: Structured Kernels for Natural Language Processing

A syntactic tree fragment feature space: explicit kernel space

Each tree is mapped to a 0/1 vector indicating which fragments it contains; the dot product x · z counts the number of common substructures.

[Figure: the fragment-indicator vectors x and z and some of the SST fragments they index]

Page 10: Structured Kernels for Natural Language Processing

Given the function T(t) that computes the subtrees of a given tree t, the kernel may be written as:

KT(t1,t2) = |T(t1) ∩ T(t2)|

The kernel KT of the SST feature space
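For small trees the explicit space can be enumerated directly. The sketch below is an illustration, not code from the slides: trees are (label, children) tuples, and every non-terminal node yields fragments by either stopping at each child's label or expanding it further, which is exactly the SST choice.

```python
from itertools import product

# Trees are (label, children) tuples; terminals have empty children.
def fragments_at(node):
    """All SST fragments rooted at a non-terminal node."""
    label, children = node
    if not children:
        return []
    options = []
    for child in children:
        # keep the child as a bare label, or expand it with any
        # fragment rooted at the child
        options.append([(child[0], ())] + fragments_at(child))
    return [(label, combo) for combo in product(*options)]

def fragments(tree):
    """All SST fragments of the whole tree (rooted at any node)."""
    result = set(fragments_at(tree))
    for child in tree[1]:
        result |= fragments(child)
    return result

def KT(t1, t2):
    # the kernel as the size of the fragment-set intersection
    return len(fragments(t1) & fragments(t2))

NP1 = ("NP", (("D", (("a", ()),)), ("N", (("talk", ()),))))
NP2 = ("NP", (("D", (("a", ()),)), ("N", (("tutorial", ()),))))
print(KT(NP1, NP2))  # 3: NP -> D N, NP -> D(a) N, D -> a
```

Enumerating fragments explicitly is exponential in general, which is why the implicit computation on the next slides matters.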

Page 11: Structured Kernels for Natural Language Processing

Given a tree t, how many subtrees does it have? Let us focus on the subtrees originating at the root.

A step back…

For a node R with children ch1, ch2, …, chn:

D(R) = ∏i (1 + D(chi))   if R is not a terminal
D(R) = 0                  if R is a terminal
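This recursion transcribes directly into code (a sketch, with trees as (label, children) tuples):

```python
# D(R) counts the SST fragments rooted at R:
#   D(R) = prod_i (1 + D(ch_i))  if R is not a terminal
#   D(R) = 0                     if R is a terminal
# Each child contributes (1 + D(ch_i)) choices: stop at the child's
# bare label, or expand it with any of the D(ch_i) fragments below it.

def D(node):
    label, children = node
    if not children:          # terminal symbol
        return 0
    result = 1
    for child in children:
        result *= 1 + D(child)
    return result

# NP -> D N, with D -> a and N -> talk
np = ("NP", [("D", [("a", [])]), ("N", [("talk", [])])])
print(D(np))  # 4 fragments rooted at NP
```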

Page 12: Structured Kernels for Natural Language Processing

KT: kernels for structural representation (implicit representation)

[Collins and Duffy, ACL 2002] evaluate KT in O(n²) through the recursive function Δ:

KT(T1,T2) = Σn1∈T1 Σn2∈T2 Δ(n1,n2)

where:

Δ(n1,n2) = 0, if the productions at n1 and n2 are different

Δ(n1,n2) = 1, if n1 and n2 are pre-terminals (with the same production)

Δ(n1,n2) = ∏j=1..nc(n1) (1 + Δ(ch(n1,j), ch(n2,j))), otherwise

with nc(n1) the number of children of n1 and ch(n,j) the j-th child of node n.
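A direct transcription of the recursion (an illustrative sketch: trees are (label, children) tuples, and a node's production is taken to be its label plus its children's labels; memoizing delta over node pairs yields the O(n²) behaviour):

```python
# KT(T1, T2) = sum over all node pairs of Delta(n1, n2),
# where Delta follows the three cases of Collins and Duffy.

def nodes(tree):
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def production(n):
    return (n[0], tuple(c[0] for c in n[1]))

def is_preterminal(n):
    return bool(n[1]) and all(not c[1] for c in n[1])

def delta(n1, n2):
    if not n1[1] or not n2[1]:            # terminals carry no production
        return 0
    if production(n1) != production(n2):  # case 1: different productions
        return 0
    if is_preterminal(n1):                # case 2: matching pre-terminals
        return 1
    result = 1                            # case 3: recurse on the children
    for c1, c2 in zip(n1[1], n2[1]):
        result *= 1 + delta(c1, c2)
    return result

def KT(t1, t2):
    return sum(delta(n1, n2) for n1 in nodes(t1) for n2 in nodes(t2))

t1 = ("NP", [("D", [("a", [])]), ("N", [("talk", [])])])
t2 = ("NP", [("D", [("a", [])]), ("N", [("tutorial", [])])])
print(KT(t1, t2))  # 3 common fragments: NP->D N, NP->D(a) N, D->a
```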

Page 13: Structured Kernels for Natural Language Processing

Textual Entailment Recognition

Page 14: Structured Kernels for Natural Language Processing

Why is textual entailment recognition useful?

Information Retrieval (IR)

Question Answering (QA)

Multi Document Summarization (MDS)

Information Extraction (IE)

(from Dagan, Roth, Zanzotto, 2007)

Page 15: Structured Kernels for Natural Language Processing

Why textual entailment recognition is useful?

text: “Overture’s acquisition by Yahoo”   hypothesized answer: “Yahoo bought Overture”

Question: Who bought Overture?  >>  Expected answer form: X bought Overture

• Similar for IE: X buy Y

• Similar for “semantic” IR: t: Overture was bought …

• Summarization (multi-document): identify redundant info

(from Dagan, Roth, Zanzotto, 2007)

Page 16: Structured Kernels for Natural Language Processing

Textual entailment recognition as classification

T1: “At the end of the year, all solid companies pay dividends.”

H1: “At the end of the year, all solid insurance companies pay dividends.”

T1 ⇒ H1

… the textual entailment recognition task: determine whether or not a text T implies a hypothesis H

Target problem: learning textual entailment recognition rules from annotated examples

Page 17: Structured Kernels for Natural Language Processing

Recognizing Textual Entailment is a classification task: given a pair (T, H), we need to decide whether T implies H or T does not imply H.

We can learn a classifier from annotated examples.

What we need:

• A learning algorithm
• A suitable feature space

Initial hypothesis

Page 18: Structured Kernels for Natural Language Processing

First-order rule (FOR) feature space

Training examples:

T2: “Wanadoo bought KStone”   H2: “Wanadoo owns KStone”   (T2 ⇒ H2)

T3: “Google bought Microsoft”   H3: “Microsoft owns Google”   (T3 ⇏ H3)

Application:

T4: “Fiat bought Chrysler”   H4: “Fiat owns Chrysler”   (T4 ⇒ H4?)

Relevant features
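To see how such pair features feed a learner, here is a toy kernel perceptron. This is entirely illustrative: the word-alignment kernel below, which counts positions where T and H share a word, is a crude stand-in for the structural FOR kernel of the following pages, and all names in it are made up.

```python
# A toy pair kernel: count the (i, j) position pairs where word i of T
# equals word j of H. This captures argument placement, so it separates
# "X bought Y => X owns Y" pairs from the reversed, non-entailing pair.

def features(pair):
    t, h = pair
    tw, hw = t.split(), h.split()
    return {(i, j) for i, wt in enumerate(tw)
                   for j, wh in enumerate(hw) if wt == wh}

def K(p1, p2):
    return len(features(p1) & features(p2))

def train(examples, epochs=10):
    # kernel perceptron: alpha[i] counts the mistakes made on example i
    alpha = [0] * len(examples)
    for _ in range(epochs):
        for i, (p, y) in enumerate(examples):
            score = sum(a * yj * K(pj, p)
                        for a, (pj, yj) in zip(alpha, examples))
            if y * score <= 0:
                alpha[i] += 1
    return alpha

def predict(alpha, examples, pair):
    score = sum(a * y * K(p, pair) for a, (p, y) in zip(alpha, examples))
    return 1 if score > 0 else -1

examples = [
    (("Wanadoo bought KStone", "Wanadoo owns KStone"), 1),      # entails
    (("Google bought Microsoft", "Microsoft owns Google"), -1),  # does not
]
alpha = train(examples)
print(predict(alpha, examples, ("Fiat bought Chrysler", "Fiat owns Chrysler")))
# prints 1: the application pair aligns like the positive example
```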

Page 19: Structured Kernels for Natural Language Processing

Kernels for First-Order Rule (FOR)

Feature Spaces

Page 20: Structured Kernels for Natural Language Processing

We want to write the kernel

K(P1,P2)

that computes how many common first-order rules are activated.

• For this formulation, we want to write syntactic first-order rules

Kernels in FOR spaces

Page 21: Structured Kernels for Natural Language Processing

• We need:

– A representation of the sentence pairs: G

– A function S(G) that generates the subgraphs of G

• We can then define the kernel function, by analogy with KT, as K(G1,G2) = |S(G1) ∩ S(G2)|

Kernels in FOR spaces

Page 22: Structured Kernels for Natural Language Processing

Sentence pair representation

[Figure: the pair (T, H) for “At the end of the year, all solid (insurance) companies pay dividends”: the parse trees of T and H, whose corresponding constituents are co-indexed through the placeholders 0, 1, 2', 2'', 3, and 4]

Page 23: Structured Kernels for Natural Language Processing

Sentence pair representation

and subgraph function

Page 24: Structured Kernels for Natural Language Processing

Kernels in FOR spaces: critical problems

Page 25: Structured Kernels for Natural Language Processing

Kernels in FOR spaces: critical problems

Page 26: Structured Kernels for Natural Language Processing

• FOR feature spaces can be modelled with particular graphs

• We call these graphs tripartite directed acyclic graphs (tDAGs)

• Observations:

– tDAGs are not trees

– tDAGs can be used to model both rules and sentence pairs

A step back…

Page 27: Structured Kernels for Natural Language Processing

As in feature structures…

Rules and sentence pairs can be seen as graphs

Unifying a rule and a sentence pair is a graph matching problem

Tripartite Directed Acyclic Graphs (tDAGs)

Page 28: Structured Kernels for Natural Language Processing

A tripartite directed acyclic graph is a graph G = (N,E) where:

• the set of nodes N is partitioned into three sets Nt, Ng, and A

• the set of edges is partitioned into four sets Et, Eg, EAt, and EAg

where

t = (Nt,Et) and g = (Ng,Eg) are two trees

EAt = {(x, y) | x ∈ Nt and y ∈ A}

EAg = {(x, y) | x ∈ Ng and y ∈ A}

Tripartite Directed Acyclic Graphs (tDAGs)

Page 29: Structured Kernels for Natural Language Processing

Alternative definition

A tripartite directed acyclic graph is a pair of extended trees G = (t,g) where:

t = (Nt ∪ A, Et ∪ EAt) and

g = (Ng ∪ A, Eg ∪ EAg).

Tripartite Directed Acyclic Graphs (tDAGs)

Page 30: Structured Kernels for Natural Language Processing

Kernels in FOR spaces: critical problems

Page 31: Structured Kernels for Natural Language Processing

• Isomorphism between graphs

• G1=(N1,E1) and G2=(N2,E2) are isomorphic if:

– |N1|=|N2| and |E1|=|E2|

– among all the bijective functions relating N1 and N2, there exists f : N1 → N2

– such that:

• for each n1 in N1, Label(n1) = Label(f(n1))

• for each (na,nb) in E1, (f(na),f(nb)) is in E2

Isomorphism between tDAGs

Page 32: Structured Kernels for Natural Language Processing

Isomorphism adapted to tDAGs

G1 = (t1,g1) and G2 = (t2,g2) are isomorphic if these two properties hold:

– Partial isomorphism

• g1 and g2 are isomorphic

• t1 and t2 are isomorphic

• This property generates two functions fg and ft

– Constraint compatibility

• fg and ft are compatible on the sets of nodes A1 and A2 if, for each n ∈ A1, it happens that fg(n) = ft(n).

Isomorphism between tDAGs
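Under the pair-of-extended-trees view, this check can be sketched in a few lines. The sketch below is an illustration only: placeholders are modelled as leaf labels starting with "#" (an assumption of this sketch), and each tree match records the placeholder renaming it induces.

```python
# Partial isomorphism: t1 ~ t2 and g1 ~ g2 as labelled trees, allowing
# placeholder labels ("#...") to be renamed; each match collects the
# renaming it induces. Constraint compatibility: the renaming ft found
# on the t-side must agree with the renaming fg found on the g-side.

def match(n1, n2, mapping):
    (l1, c1), (l2, c2) = n1, n2
    if l1.startswith("#") and l2.startswith("#"):
        if mapping.setdefault(l1, l2) != l2:
            return False      # placeholder already mapped elsewhere
    elif l1 != l2:
        return False
    if len(c1) != len(c2):
        return False
    return all(match(a, b, mapping) for a, b in zip(c1, c2))

def tdag_isomorphic(G1, G2):
    (t1, g1), (t2, g2) = G1, G2
    ft, fg = {}, {}
    if not (match(t1, t2, ft) and match(g1, g2, fg)):
        return False          # partial isomorphism fails
    # constraint compatibility: ft and fg agree on shared placeholders
    return all(fg.get(p, v) == v for p, v in ft.items())

# "X bought Y => X owns Y" with two different placeholder namings
rule = lambda a, b, v: ("S", [(a, []), (v, []), (b, [])])
G1 = (rule("#1", "#2", "bought"), rule("#1", "#2", "owns"))
G2 = (rule("#a", "#b", "bought"), rule("#a", "#b", "owns"))
G3 = (rule("#a", "#b", "bought"), rule("#b", "#a", "owns"))  # swapped

print(tdag_isomorphic(G1, G2))  # True: compatible renaming #1->#a, #2->#b
print(tdag_isomorphic(G1, G3))  # False: ft maps #1->#a but fg maps #1->#b
```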

Page 33: Structured Kernels for Natural Language Processing

Isomorphism between tDAGs

Constraint compatibility

Partial isomorphism

Page 34: Structured Kernels for Natural Language Processing

• Making explicit the isomorphism in the kernel:

It is better expressed in this way:

Ideas for building the kernel

Page 35: Structured Kernels for Natural Language Processing

• First, using constraint compatibility:

building a set C of all the relevant alternative constraints

• Then, detecting the partial isomorphism between tDAGs

Ideas for building the kernel

Page 36: Structured Kernels for Natural Language Processing

Ideas for building the kernel

Possible alternative constraints

Page 37: Structured Kernels for Natural Language Processing

Page 38: Structured Kernels for Natural Language Processing

Ideas for building the kernel

Page 39: Structured Kernels for Natural Language Processing

General equation

We compute it using:

1) the kernel KT introduced at the beginning

2) the inclusion-exclusion principle

Kernel on FOR feature spaces
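The second ingredient counts the elements of a union of fragment sets (one per alternative constraint) without double counting. A generic sketch of the inclusion-exclusion principle, not the optimized procedure of the EMNLP 2009 paper:

```python
from itertools import combinations

# |S1 u ... u Sk| = sum over non-empty subfamilies of
#                   (-1)^(r+1) * |intersection of the subfamily|
def union_size(sets):
    total = 0
    for r in range(1, len(sets) + 1):
        for family in combinations(sets, r):
            total += (-1) ** (r + 1) * len(set.intersection(*family))
    return total

A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
print(union_size([A, B, C]), len(A | B | C))  # both are 5
```

In the FOR kernel the intersection sizes are exactly what KT computes on the constrained tree pairs, so the union size never has to be enumerated explicitly.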

Page 40: Structured Kernels for Natural Language Processing

Tree kernels

• M. Collins, N. Duffy, Convolution Kernels for Natural Language, NIPS 2001

Tripartite directed acyclic graph kernels

• F.M. Zanzotto, L. Dell’Arciprete, Efficient Kernels for Sentence Pair Classification, EMNLP 2009

Textual Entailment Recognition

• I. Dagan, D. Roth, F.M. Zanzotto, Tutorial on Recognizing Textual Entailment, ACL 2007

References