
Introduction Difficulties Joint model Results and discussion Future work

Joint Learning of Syntactic and Semantic Dependencies

Xavier Lluís and Lluís Màrquez

Barcelona, September 10, 2008


Introduction

The CoNLL-2008 shared task

Syntactic dependency parsing

Predicate identification and disambiguation

Semantic role labeling


1 Introduction

2 Difficulties

3 Joint model

4 Results and discussion


Talk Overview

1 Introduction
  Syntactic and semantic parsing
  Design a joint model

2 Difficulties

3 Joint model

4 Results and discussion


Syntactic and semantic parsing

A sample sentence, shown with its syntactic dependencies, then the semantic dependencies for each of its predicates (completed, acquisition, announced), and finally the semantic dependencies for all predicates together.


Mainstream approach

The pipeline approach

1 Syntactic parsing

A parser (Eisner, Shift-reduce)

2 Semantic role labeling

A simpler (non-structured) classifier



Pipeline strategy

The pipeline approach

1 Propagation or amplification of errors

2 Assumes an order of increasing difficulty

3 Dependencies between layers are hard to capture


The joint approach

Joint approach

Design a joint model

One single step for syntax + semantics

Musillo and Merlo (2006) used extended tags (e.g., "NMODA0")

Henderson et al. (2008) extended shift-reduce parsers


1 Can a joint system outperform the pipeline approach?

2 Is it feasible to build a joint parsing system?


Design a joint model

A joint approach

Extend a syntactic parsing model to jointly parse semantics

1 Syntactic parsing

A parser (Eisner, Shift-reduce)

2 Semantic role labeling

A simpler (non-structured) classifier


Extend the Eisner algorithm to jointly parse semantics

O(n³) algorithm

Based on CKY algorithm

Bottom-up parser


The Eisner algorithm

Bottom-up dependency parsing


Score of a dependency

A dependency d = ⟨h, m, l⟩ of a sentence x is scored by:

score(d, x) = φ(⟨h, m, l⟩, x) · w

where φ is a feature extraction function and w is a weight vector.


Score of a tree

A syntactic tree y for a sentence x is scored by:

score_tree(y, x) = ∑_{⟨h,m,l⟩ ∈ y} score(⟨h, m, l⟩, x)

Arc-factorization

The sum of independent scores for each dependency of the tree is called the first-order factorization.
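As a sketch of this factorization, the tree score is just a sum of independent per-arc dot products; the sparse feature function `phi` and the weight dictionary `w` below are hypothetical toy templates, not the system's actual features.

```python
def dep_score(h, m, l, x, phi, w):
    # score(d, x) = phi(<h, m, l>, x) . w, as a sparse dot product
    return sum(w.get(f, 0.0) for f in phi(h, m, l, x))

def tree_score(tree, x, phi, w):
    # first-order factorization: independent scores, one per dependency
    return sum(dep_score(h, m, l, x, phi, w) for (h, m, l) in tree)

# toy example: features are label/word pairs (hypothetical templates)
def phi(h, m, l, x):
    return [f"{l}:{x[h]}->{x[m]}", f"{l}:{x[m]}"]

x = ["<root>", "John", "sleeps"]
w = {"SBJ:sleeps->John": 2.0, "ROOT:<root>->sleeps": 1.0}
tree = [(0, 2, "ROOT"), (2, 1, "SBJ")]
print(tree_score(tree, x, phi, w))  # 3.0
```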


Best tree

We are interested in the best-scoring tree among all trees Y(x):

best_tree(x) = argmax_{y ∈ Y(x)} score_tree(y, x)

Eisner algorithm

The Eisner algorithm is an exact search algorithm that computes the best first-order factorized tree.
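A minimal sketch of the Eisner O(n³) dynamic program, assuming an arc-score table `scores[h][m]` with token 0 as an artificial root; it returns only the best score (no backpointers, and no single-root constraint is enforced).

```python
NEG = float("-inf")

def eisner_best_score(scores):
    """Best projective tree score under a first-order (arc-factored) model.

    scores[h][m] is the score of the arc h -> m; token 0 is the artificial
    root, so arcs into 0 should be -inf.
    """
    n = len(scores)
    # C/I[s][t][d]: complete/incomplete spans; d=0 head on the right (t),
    # d=1 head on the left (s)
    C = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for k in range(1, n):
        for s in range(n - k):
            t = s + k
            # attach an arc between s and t over the best split point r
            best = max(C[s][r][1] + C[r + 1][t][0] for r in range(s, t))
            I[s][t][0] = best + scores[t][s]   # arc t -> s
            I[s][t][1] = best + scores[s][t]   # arc s -> t
            # extend incomplete spans to complete spans
            C[s][t][0] = max(C[s][r][0] + I[r][t][0] for r in range(s, t))
            C[s][t][1] = max(I[s][r][1] + C[r][t][1] for r in range(s + 1, t + 1))
    return C[0][n - 1][1]

# toy 3-token sentence (root + 2 words): the best tree scores 6
sc = [[NEG, 1.0, 1.0],
      [NEG, NEG, 5.0],
      [NEG, 5.0, NEG]]
print(eisner_best_score(sc))  # 6.0
```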


A non-projective dependency graph


Joint Approach Difficulties

Joint parsing point of view

Joint parsing implies predicting the semantic label simultaneously with the syntactic label


Talk Overview

1 Introduction

2 Difficulties
  Syntactic and semantic overlap
  Unavailable features

3 Joint model

4 Results and discussion


Syntactic and semantic overlap

Joint Approach Difficulties: an example

Syntactic and semantic overlap


Syntactic and semantic overlap

1. Do syntax and semantics overlap?

36.4% of argument-predicate relations do not exactly overlap with modifier-head syntactic relations.

Proposed solution

Bound the semantic label to the syntactic dependency


Difficulties: non-overlapping semantics

Solution to overlapping dependencies


Solution

An extended dependency is:

d = ⟨h, m, l_syn, l_sem_p1, …, l_sem_pq⟩

where h is the head, m the modifier, l_syn the syntactic label, and l_sem_pi one semantic label for each sentence predicate pi.
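One possible representation (the class and field names are hypothetical) is a record carrying one semantic slot per sentence predicate, matching the slides' "SBJ, A0, _, A0" example:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ExtendedDependency:
    head: int                               # h
    modifier: int                           # m
    syn_label: str                          # l_syn
    sem_labels: Tuple[Optional[str], ...]   # l_sem_p1 .. l_sem_pq; None = no role

# e.g., an SBJ dependency whose modifier is A0 of the 1st and 3rd predicates
d = ExtendedDependency(head=2, modifier=1, syn_label="SBJ",
                       sem_labels=("A0", None, "A0"))
print(d.sem_labels)  # ('A0', None, 'A0')
```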


Proposed solution

OBJ, _, _, Su
OBJ, A1, A1, _
AMOD, _, AM-TMP, _
NMOD, _, _, _
NMOD, _, _, _
SBJ, A0, _, A0

A dependency has its syntactic and semantic labels


Unavailable features


Proposed solution


Unavailable features for distant arguments


Problems inherited from traditional pipeline design

2. Problems not appearing in pipeline systems

State-of-the-art SRL systems strongly rely on syntactic path features.

There is only partial visibility of the syntax, restricted to the current sentence span.

A distant argument-predicate relation can occur.

Proposed solution

Pre-parse and extract predicate-modifier syntactic paths.


Talk Overview

1 Introduction

2 Difficulties

3 Joint model
  Formalization
  Learning

4 Results and discussion


Joint Model

The joint model extends the first-order syntactic model

Best joint tree

best_tree(x, w, y′) = argmax_{y ∈ Y(x)} score_tree(y, x, w, y′)

The argmax is computed using the Eisner algorithm, where x is the input sentence, y the syntactic-semantic tree, y′ the pre-parsed syntactic tree, and w the weight vector.


First-order factorization

score_tree(y, x, w, y′) = ∑_{⟨h,m,l_syn,l⟩ ∈ y} score(⟨h, m, l_syn, l⟩, x, w, y′)

where x is the input sentence, y the syntactic-semantic tree, y′ the pre-parsed syntactic tree, w the weight vector, and l = l_sem_p1, …, l_sem_pq the semantic labels for predicates pi.


Scoring

score(⟨h, m, l_syn, l⟩, x, w, y′) =
  syntactic_score(h, m, l_syn, x, w) + semantic_score(h, m, l_sem_p1, …, l_sem_pq, x, w, y′)

The score of a dependency is the syntactic score (as usual) plus the semantic score of the assigned semantic label (if any) for each predicate, where l = l_sem_p1, …, l_sem_pq.


Semantic Scoring

Semantic scoring function

semantic_score(h, m, l_sem_p1, …, l_sem_pq, x, w, y′) = ∑_{i=1}^{q} φ_sem(⟨h, m, l_sem_pi⟩, pi, x, y′) · w^(l_sem_pi)

where y′ is the precomputed syntax tree used for feature extraction and l_sem_pi is the semantic label of m for predicate pi.
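A sketch of this sum over predicates, assuming per-label weight dictionaries `w[l]` and a made-up feature function `phi_sem`; `None` marks a predicate for which m fills no role.

```python
def semantic_score(h, m, sem_labels, predicates, x, w, y_pre, phi_sem):
    # sum over predicates p_i of phi_sem(<h, m, l_sem_pi>, p_i, x, y') . w^(l_sem_pi)
    total = 0.0
    for p, l in zip(predicates, sem_labels):
        if l is None:          # m fills no role for this predicate
            continue
        feats = phi_sem(h, m, p, x, y_pre)
        total += sum(w.get(l, {}).get(f, 0.0) for f in feats)
    return total

# toy example with made-up features and weights
def phi_sem(h, m, p, x, y_pre):
    return [f"mw={x[m]}", f"pw={x[p]}"]

x = ["<root>", "John", "sleeps"]
w = {"A0": {"mw=John": 1.5, "pw=sleeps": 0.5}}
print(semantic_score(2, 1, ("A0",), (2,), x, w, None, phi_sem))  # 2.0
```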


Learning

The structured perceptron algorithm (Collins, 2002)

w ← 0
for t ← 1 to numEpochs do
  for all (x, y) ∈ training set do
    ŷ ← best_tree(x, w)
    for all factor ∈ y \ ŷ do
      w^(l) ← w^(l) + φ(factor, x)
    end for
    for all factor ∈ ŷ \ y do
      w^(l) ← w^(l) − φ(factor, x)
    end for
  end for
end for
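The pseudocode above can be sketched as runnable Python; a toy `decode` stands in for best_tree, factors are hashable tuples, and this is the plain (not averaged) perceptron:

```python
def perceptron_train(data, decode, phi, num_epochs=5):
    """Structured perceptron (Collins, 2002): promote gold factors the
    decoder missed, demote predicted factors not in the gold structure."""
    w = {}
    for _ in range(num_epochs):
        for x, gold in data:
            pred = decode(x, w)
            for factor in set(gold) - set(pred):
                for f in phi(factor, x):
                    w[f] = w.get(f, 0.0) + 1.0
            for factor in set(pred) - set(gold):
                for f in phi(factor, x):
                    w[f] = w.get(f, 0.0) - 1.0
    return w

# toy task: one factor per input, decoding = argmax over two labels
LABELS = ["A", "B"]

def phi(factor, x):
    word, label = factor
    return [f"{word}|{label}"]

def decode(x, w):
    best = max(LABELS, key=lambda l: sum(w.get(f, 0.0) for f in phi((x, l), x)))
    return [(x, best)]

data = [("dog", [("dog", "A")]), ("cat", [("cat", "B")])]
w = perceptron_train(data, decode, phi)
print(decode("cat", w))  # [('cat', 'B')]
```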


Architecture

System summary

Core

Averaged perceptron learning + Eisner algorithm inference (Collins, 2002; Eisner, 1996), based on Carreras et al., 2006

Joint inference

Extension of the Eisner algorithm with semantic labels


System overview

System architecture

1 Preprocess

Feature extraction

2 Syntactic parsing

To allow syntactic feature extraction

3 Predicate identification

Heuristic rules + SVM

4 Joint syntactic-semantic parsing

The core of the system

5 Postprocess

Most frequent predicate sense and ILP


Talk Overview

1 Introduction

2 Difficulties

3 Joint model

4 Results and discussion
  Experimentation
  CoNLL-2008 results


Experimentation


The CoNLL-2008 shared task

Datasets

Syntactic parsing

WSJ corpus with ∼1M tokens.

Semantic role labeling

Nominal and verbal predicates

Evaluation

Syntax Labeled attachment score (LAS)

Semantics Precision, Recall, F1

Global Macro averaged F1


Answers to the questions


Learning curve (development)


Discussion

1 Can a joint system outperform the pipeline approach?

Syntactic parsing: similar results
Semantic parsing: improved by ∼5 points

Semantic results are better and improving (and hurting syntax?)

2 Is it feasible to build a joint parsing system?

Syntactic pipeline parser: ∼0.2 s/sentence
Joint parser: ∼0.3 s/sentence
Joint parser memory: <1.5 GB

65% of the time is spent computing the averaged perceptron. One epoch takes less than 4 hours on an AMD64 X2 5000+.


Syntactic-Semantic Overlap

Learning curve (development)


Discussion

About syntactic-semantic overlap

A semantic relation highly dependent on the correct syntactic head:
  will increase the correct dependency's score
  will benefit the final syntax tree

A semantic relation not so dependent on the correct syntactic head:
  will increase the scores of some dependencies
  could hurt the final syntax tree


syntactic and semantic parsing: semantics

Semantic scoring


Discussion

Semantic scoring function

semantic_score(h, m, l_sem_pi, x, w, y′) = φ_sem(h, m, pi, x, y′) · w^(l_sem_pi)

Features are extracted by φsem from:

h head

m modifier

pi predicate

h,m modifier-head

m, pi modifier-predicate

h, pi head-predicate
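A sketch of such a feature extractor (all templates and the path encoding are hypothetical); the path feature consults the pre-parsed heads y′, as proposed earlier for distant arguments.

```python
def syn_path(m, p, heads):
    # shape of the head path from m to predicate p in the pre-parsed tree:
    # up-steps from m to the lowest common ancestor, then down-steps to p
    depth = {}
    node, d = m, 0
    while node != -1:
        depth[node] = d
        node, d = heads[node], d + 1
    node, down = p, 0
    while node not in depth:
        node, down = heads[node], down + 1
    return f"{depth[node]}u{down}d"

def phi_sem(h, m, p, x, heads):
    # x: tokens; heads: pre-parsed syntactic head of each token (-1 = root)
    return [
        f"hw={x[h]}", f"mw={x[m]}", f"pw={x[p]}",   # head, modifier, predicate
        f"hw,mw={x[h]},{x[m]}",                      # head-modifier
        f"mw,pw={x[m]},{x[p]}",                      # modifier-predicate
        f"hw,pw={x[h]},{x[p]}",                      # head-predicate
        f"path={syn_path(m, p, heads)}",             # syntactic path feature
    ]

x = ["<root>", "John", "sleeps", "soundly"]
heads = [-1, 2, 0, 2]
print(phi_sem(2, 1, 2, x, heads))
```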


CoNLL-2008 results

Post-evaluation results

Group          Name            WSJ+Brown   WSJ     Brown
Lund (*)       Johansson (*)   85.49       86.61   76.34
Yahoo! (*)     Ciaramita (*)   82.69       83.83   73.51
HIT-IR         Che             82.66       83.78   73.57
Hong Kong (*)  Zhao (*)        82.24       83.41   72.70
Geneva (*)     Henderson (*)   80.48       81.53   71.93
Koc            Yuret           79.84       80.97   70.55
GSLT ML2       Samuelsson      79.79       80.92   70.49
DFKI 2         Zhang           79.32       80.41   70.48
NAIST          Watanabe        79.10       80.30   69.29
Antwerp        Morante         78.43       79.52   69.55
HIT-ICR        Li              78.35       79.38   70.01
UPC (*)        Lluís (*)       78.11       79.16   69.84
UT Austin      Baldridge       77.49       78.57   68.53
Koc            Yatbaz          77.45       78.43   69.61
USTC           Chen            77.00       77.95   69.23
Korea          Lee             76.90       77.96   68.34
Peking         Sun             76.28       77.10   69.58
Colorado       Choi            71.23       72.22   63.44
UAIC           Trandabat       63.45       64.21   57.41
DFKI 1         Neumann         19.93       20.13   18.14


Future and Ongoing Work

1 Higher degree of joint processing

Joint predicate identification
No previous dependency parsing

2 Higher order dependencies

3 Improvement of the semantic classifier component

4 Projectivization techniques

5 Feature engineering and system tuning

6 Alternative joint models


The end

Thank you


For further reading

James Henderson, Paola Merlo, Gabriele Musillo and Ivan Titov.
A Latent Variable Model of Synchronous Parsing for Syntactic and Semantic Dependencies.
In Proceedings of CoNLL-2008, 2008.

Gabriele Musillo and Paola Merlo.
Robust Parsing for the Proposition Bank.
In Proceedings of the EACL'06 Workshop: Robust Methods in Analysis of Natural Language Data, 2006.

Xavier Lluís and Lluís Màrquez.
A Joint Model for Parsing Syntactic and Semantic Dependencies.
In Proceedings of CoNLL-2008, 2008.