Joint Learning of Syntactic and Semantic Dependencies
Xavier Lluís and Lluís Màrquez
Barcelona, September 10, 2008
Introduction
The CoNLL-2008 shared task
Syntactic dependency parsing
Predicate identification and disambiguation
Semantic role labeling
Talk Overview
1 Introduction
    Syntactic and semantic parsing
    Design a joint model
2 Difficulties
    Syntactic and semantic overlap
    Unavailable features
3 Joint model
    Formalization
    Learning
4 Results and discussion
    Experimentation
    CoNLL-2008 results
Syntactic and semantic parsing
[Figure: a sample sentence, shown first with its syntactic dependencies, then with the semantic dependencies of each predicate in turn (completed, acquisition, announced), and finally with the semantic dependencies of all predicates.]
Mainstream approach
The pipeline approach
1 Syntactic parsing
A parser (Eisner, Shift-reduce)
2 Semantic role labeling
A simpler (non-structured) classifier
Pipeline strategy
The pipeline approach
1 Propagation or amplification of errors
2 Assumes an order of increasing difficulty
3 Dependencies between layers are hard to capture
The joint approach
Joint approach
Design a joint model
One single step for syntax + semantics
Musillo and Merlo (2006) used extended tags (e.g., "NMODA0")
Henderson et al. (2008) extended shift-reduce parsers
Design a joint model
1 Can a joint system outperform the pipeline approach?
2 Is it feasible to build a joint parsing system?
Design a joint model
A joint approach
Extend a syntactic parsing model to jointly parse semantics
1 Syntactic parsing
A parser (Eisner, Shift-reduce)
2 Semantic role labeling
A simpler (non-structured) classifier
A joint approach
Extend the Eisner algorithm to jointly parse semantics
O(n³) algorithm
Based on the CKY algorithm
Bottom-up parser
The Eisner algorithm
Bottom-up dependency parsing
[Figure: step-by-step animation of the Eisner parser combining adjacent spans bottom-up.]
Score of a dependency
A dependency d = ⟨h, m, l⟩ of a sentence x is scored by:
score(d, x) = φ(⟨h, m, l⟩, x) · w
where φ is a feature extraction function and w is a weight vector.
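As a concrete illustration — a minimal sketch, not the system's actual feature set — this score can be computed as a sparse dot product over binary indicator features; all templates and names below are assumptions for illustration.

from typing import Dict, List

def phi(h: int, m: int, l: str, words: List[str]) -> List[str]:
    # Indicator features for the dependency <h, m, l>;
    # words[0] is assumed to be an artificial ROOT token.
    return [
        f"head={words[h]}|label={l}",
        f"mod={words[m]}|label={l}",
        f"head={words[h]}|mod={words[m]}|label={l}",
        f"dist={abs(h - m)}|label={l}",
    ]

def score(h: int, m: int, l: str, words: List[str],
          w: Dict[str, float]) -> float:
    # score(d, x) = phi(<h, m, l>, x) . w as a sparse dot product.
    return sum(w.get(f, 0.0) for f in phi(h, m, l, words))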
Score of a tree
A syntactic tree y for a sentence x is scored by:
score_tree(y, x) = Σ_{⟨h, m, l⟩ ∈ y} score(⟨h, m, l⟩, x)
Arc-factorization
The sum of independent scores for each dependency of the tree is called the first-order factorization.
Best tree
We are interested in the best scoring tree among all trees Y(x):
best_tree(x) = argmax_{y ∈ Y(x)} score_tree(y, x)
Eisner algorithm
The Eisner algorithm is an exact search algorithm that computes the best first-order factorized tree.
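To make the search concrete, below is a compact sketch of the unlabeled Eisner decoder: exact O(n³) dynamic programming over complete and incomplete spans that returns the best projective first-order tree. It assumes arc_score[h][m] is precomputed (for labeled parsing, first maximize over labels per arc) and token 0 is an artificial root; this is a textbook rendering, not the system's code.

import math
from typing import List

def eisner(arc_score: List[List[float]]) -> List[int]:
    # arc_score[h][m]: score of the arc h -> m; token 0 is the root.
    # Returns heads[m] for every token (heads[0] stays -1).
    n = len(arc_score)
    NEG = -math.inf
    # Chart cells indexed [s][t][d]; d = 1: head at s, d = 0: head at t.
    C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]  # complete
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]  # incomplete
    Cb = [[[0, 0] for _ in range(n)] for _ in range(n)]     # split points
    Ib = [[[0, 0] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        C[s][s][0] = C[s][s][1] = 0.0
    for k in range(1, n):                       # span length
        for s in range(n - k):
            t = s + k
            # Incomplete: join two facing complete spans with a new arc.
            for r in range(s, t):
                v = C[s][r][1] + C[r + 1][t][0]
                if v + arc_score[t][s] > I[s][t][0]:
                    I[s][t][0], Ib[s][t][0] = v + arc_score[t][s], r
                if v + arc_score[s][t] > I[s][t][1]:
                    I[s][t][1], Ib[s][t][1] = v + arc_score[s][t], r
            # Complete: extend an incomplete span with a complete one.
            for r in range(s, t):
                if C[s][r][0] + I[r][t][0] > C[s][t][0]:
                    C[s][t][0], Cb[s][t][0] = C[s][r][0] + I[r][t][0], r
            for r in range(s + 1, t + 1):
                if I[s][r][1] + C[r][t][1] > C[s][t][1]:
                    C[s][t][1], Cb[s][t][1] = I[s][r][1] + C[r][t][1], r
    heads = [-1] * n
    def backtrack(s: int, t: int, d: int, complete: bool) -> None:
        if s == t:
            return
        if complete:
            r = Cb[s][t][d]
            if d == 1:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
            else:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
        else:
            r = Ib[s][t][d]
            heads[t if d == 1 else s] = s if d == 1 else t
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)
    backtrack(0, n - 1, 1, True)
    return heads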
[Figure: an example of a non-projective dependency graph.]
Joint Approach Difficulties
Joint parsing point of view
Joint parsing implies predicting the semantic label simultaneously with the syntactic label.
Joint Approach Difficulties: an example
Syntactic and semantic overlap
[Figure: the sample sentence with syntactic and semantic arcs overlaid, shown step by step.]
Syntactic and semantic overlap
1. Do syntax and semantics overlap?
36.4% of argument-predicate relations do not exactly overlap with modifier-head syntactic relations.
Proposed solution
Bind the semantic label to the syntactic dependency
Difficulties: non-overlapping semantics
Solution to overlapping dependencies
[Figure: step-by-step example of binding each semantic label to a syntactic dependency, including the non-overlapping cases.]
Solution
An extended dependency is:
d = ⟨h, m, l_syn, l_sem_p1, …, l_sem_pq⟩
where h is the head, m the modifier, l_syn the syntactic label, and l_sem_pi one semantic label for each sentence predicate p_i.
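One possible encoding of an extended dependency — a minimal sketch with assumed names, not the system's data structures:

from typing import NamedTuple, Optional, Tuple

class ExtendedDep(NamedTuple):
    head: int                               # h
    mod: int                                # m
    syn_label: str                          # l_syn, e.g. "SBJ"
    sem_labels: Tuple[Optional[str], ...]   # l_sem_pi, None where "_"

# E.g., the SBJ arc of the running example, carrying A0 for the
# first and third predicates and no role for the second:
dep = ExtendedDep(head=2, mod=1, syn_label="SBJ",
                  sem_labels=("A0", None, "A0"))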
Proposed solution
[Figure: the example sentence where each dependency carries its syntactic label plus one semantic slot per predicate: SBJ, A0, _, A0; OBJ, A1, A1, _; AMOD, _, AM-TMP, _; NMOD, _, _, _; NMOD, _, _, _; OBJ, _, _, Su.]
A dependency has its syntactic and semantic labels.
Unavailable features
Proposed solution
[Figure: step-by-step illustration over the labeled example of features that are unavailable for distant argument-predicate relations.]
Problems inherited from traditional pipeline design
2. Problems not appearing in pipeline systems
State-of-the-art SRL systems strongly rely on syntactic path features.
There is only partial visibility of the syntax, restricted to the current sentence span.
A distant argument-predicate relation can occur.
Proposed solution
Pre-parse and extract predicate-modifier syntactic paths.
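A sketch of how such a path could be read off the pre-parsed tree y′, assuming a head-indexed encoding (head[t] is t's head, label[t] the label of its incoming arc); the system's actual path features are richer:

from typing import Dict, List

def syntactic_path(m: int, p: int, head: Dict[int, int],
                   label: Dict[int, str]) -> str:
    # Climb from modifier m and predicate p to their lowest common
    # ancestor and join the arc labels with up/down markers.
    chain: List[int] = [m]
    while chain[-1] in head:                 # ancestors of m, up to root
        chain.append(head[chain[-1]])
    climb: List[int] = [p]
    while climb[-1] not in chain:            # climb from p to an ancestor
        climb.append(head[climb[-1]])
    common = climb[-1]
    ups = [label[t] + "^" for t in chain[:chain.index(common)]]
    downs = [label[t] + "v" for t in reversed(climb[:-1])]
    return "".join(ups + downs)

# Toy tree 1 <- 2 -> 3: the path from token 1 to token 3 is "SBJ^OBJv".
print(syntactic_path(1, 3, {1: 2, 3: 2}, {1: "SBJ", 3: "OBJ"}))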
Joint Model
The joint model extends, and is based on, the first-order syntactic model.
Best joint tree
best_tree(x, w, y′) = argmax_{y ∈ Y(x)} score_tree(y, x, w, y′)
The argmax is computed using the Eisner algorithm, where:
x is the input sentence
y is the syntactic-semantic tree
y′ is the pre-parsed syntactic tree
w is the weight vector
First-order factorization
score_tree(y, x, w, y′) = Σ_{⟨h, m, l_syn, l⟩ ∈ y} score(⟨h, m, l_syn, l⟩, x, w, y′)
where x is the input sentence, y the syntactic-semantic tree, y′ the pre-parsed syntactic tree, w the weight vector, and l = l_sem_p1, …, l_sem_pq the semantic labels for the predicates p_i.
Scoring
score(⟨h, m, l_syn, l⟩, x, w, y′) = syntactic_score(h, m, l_syn, x, w) + semantic_score(h, m, l_sem_p1, …, l_sem_pq, x, w, y′)
The score of a dependency is the syntactic score (as usual) plus the semantic score of the assigned semantic label (if any) for each predicate, with l = l_sem_p1, …, l_sem_pq.
Semantic Scoring
Semantic scoring function
semantic_score(h, m, l_sem_p1, …, l_sem_pq, x, w, y′) = Σ_{i=1..q} φ_sem(⟨h, m, l_sem_pi⟩, p_i, x, y′) · w(l_sem_pi)
where y′ is the precomputed syntactic tree used for feature extraction and l_sem_pi is the semantic label of m for predicate p_i.
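Put together, the factored score of the last two slides could look like the sketch below, assuming sparse dict weight blocks and deliberately tiny feature stubs; every name here is an assumption, and the real φ_sem templates (including path features over y′) are far richer.

from typing import Dict, List, Optional, Sequence, Tuple

def phi_syn(h: int, m: int, l_syn: str, x: List[str]) -> List[str]:
    # Illustrative stub for the syntactic feature extractor.
    return [f"syn|{x[h]}|{x[m]}|{l_syn}"]

def phi_sem(h: int, m: int, p: int, x: List[str],
            paths: Dict[Tuple[int, int], str]) -> List[str]:
    # Illustrative stub; `paths` holds modifier-predicate paths from y'.
    return [f"sem|{x[m]}|{x[p]}", f"path|{paths.get((m, p), 'NONE')}"]

def semantic_score(h: int, m: int, sem_labels: Sequence[Optional[str]],
                   predicates: Sequence[int], x: List[str],
                   paths: Dict[Tuple[int, int], str],
                   w: Dict[str, float]) -> float:
    # Sum over predicates p_i of phi_sem(...) . w(l_sem_pi).
    total = 0.0
    for l_sem, p in zip(sem_labels, predicates):
        if l_sem is None:                    # "_": no role for this p_i
            continue
        for f in phi_sem(h, m, p, x, paths):
            total += w.get(f"{l_sem}|{f}", 0.0)
    return total

def joint_score(h: int, m: int, l_syn: str,
                sem_labels: Sequence[Optional[str]],
                predicates: Sequence[int], x: List[str],
                paths: Dict[Tuple[int, int], str],
                w: Dict[str, float]) -> float:
    # Syntactic score (as usual) + semantic score of assigned labels.
    syn = sum(w.get(f, 0.0) for f in phi_syn(h, m, l_syn, x))
    return syn + semantic_score(h, m, sem_labels, predicates, x, paths, w)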
Learning
The structured perceptron algorithm (Collins, 2002)
w ← 0
for t ← 1 to numEpochs do
    for all (x, y) ∈ training set do
        ŷ ← best_tree(x, w)
        for all factor ∈ y \ ŷ do
            w(l) ← w(l) + φ(factor, x, y)
        end for
        for all factor ∈ ŷ \ y do
            w(l) ← w(l) − φ(factor, x, ŷ)
        end for
    end for
end for
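A runnable rendering of the loop above, with weight averaging omitted for brevity (the system uses the averaged variant): promote features of gold factors the prediction missed, demote features of wrongly predicted factors. best_tree and phi_factor are placeholders for the joint decoder and feature extractor.

from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

def perceptron_train(
    data: List[Tuple[object, Set[Hashable]]],   # (sentence x, gold factors)
    best_tree: Callable[[object, Dict[str, float]], Set[Hashable]],
    phi_factor: Callable[[Hashable, object], Iterable[str]],
    num_epochs: int = 10,
) -> Dict[str, float]:
    w: Dict[str, float] = defaultdict(float)
    for _ in range(num_epochs):
        for x, y in data:
            y_hat = best_tree(x, w)             # current best joint tree
            for factor in y - y_hat:            # gold factors missed
                for f in phi_factor(factor, x):
                    w[f] += 1.0
            for factor in y_hat - y:            # wrongly predicted factors
                for f in phi_factor(factor, x):
                    w[f] -= 1.0
    return dict(w)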
Architecture
System summary
Core
Averaged perceptron learning + Eisner algorithm inference (Collins, 2002; Eisner, 1996), based on Carreras et al. (2006)
Joint inference
Extension of the Eisner algorithm with semantic labels
System overview
System architecture
1 Preprocess
Feature extraction
2 Syntactic parsing
To allow syntactic feature extraction
3 Predicate identification
Heuristic rules + SVM
4 Joint syntactic-semantic parsing
The core of the system
5 Postprocess
Most frequent predicate sense and ILP (integer linear programming)
Experimentation
The CoNLL-2008 shared task
Datasets
Syntactic parsing
WSJ corpus with ∼1M tokens.
Semantic role labeling
Nominal and verbal predicates
Evaluation
Syntax: labeled attachment score (LAS)
Semantics: precision, recall, F1
Global: macro-averaged F1
Answers to the questions
Discussion
1 Can a joint system outperform the pipeline approach?
2 Is it feasible to build a joint parsing system?
[Figure: learning curve on the development set.]
Discussion
1 Can a joint system outperform the pipeline approach?
Syntactic parsing: similar results
Semantic parsing: improved by ∼5 points
Semantic results are better and improving (and hurting syntax?)
2 Is it feasible to build a joint parsing system?
Syntactic pipeline parser: ∼0.2 s/sentence
Joint parser: ∼0.3 s/sentence
Joint parser memory: <1.5 GB
65% of the time is spent computing the averaged perceptron; one epoch takes less than 4 hours on an AMD64 X2 5000+.
Syntactic-Semantic Overlap
[Figure: learning curve on the development set.]
Discussion
About the syntactic-semantic overlap
A semantic relation highly dependent on the correct syntactic head:
    will increase the score of the correct dependency
    will benefit the final syntax tree
A semantic relation not so dependent on the correct syntactic head:
    will increase the scores of some dependencies
    could hurt the final syntax tree
Semantic scoring
[Figure: step-by-step example of semantic scoring over the sample sentence.]
Discussion
Semantic scoring function
semantic_score(h, m, l_sem_pi, x, w, y′) = φ_sem(h, m, p_i, x, y′) · w(l_sem_pi)
Features are extracted by φ_sem from:
h: head
m: modifier
p_i: predicate
h, m: head-modifier
m, p_i: modifier-predicate
h, p_i: head-predicate
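For instance, templates over those six sources might look like the following sketch; the concrete templates are assumptions, not the system's:

from typing import List

def phi_sem(h: int, m: int, p: int, words: List[str],
            pos: List[str]) -> List[str]:
    return [
        f"hw={words[h]}",                    # head word
        f"mw={words[m]}",                    # modifier word
        f"pw={words[p]}",                    # predicate word
        f"hw|mw={words[h]}|{words[m]}",      # head-modifier
        f"mw|pw={words[m]}|{words[p]}",      # modifier-predicate
        f"hp|pp={pos[h]}|{pos[p]}",          # head-predicate (POS)
    ]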
CoNLL-2008 results
Post-evaluation results

Group          Name            WSJ+Brown   WSJ     Brown
Lund (*)       Johansson (*)   85.49       86.61   76.34
Yahoo! (*)     Ciaramita (*)   82.69       83.83   73.51
HIT-IR         Che             82.66       83.78   73.57
Hong Kong (*)  Zhao (*)        82.24       83.41   72.70
Geneva (*)     Henderson (*)   80.48       81.53   71.93
Koc            Yuret           79.84       80.97   70.55
GSLT ML2       Samuelsson      79.79       80.92   70.49
DFKI 2         Zhang           79.32       80.41   70.48
NAIST          Watanabe        79.10       80.30   69.29
Antwerp        Morante         78.43       79.52   69.55
HIT-ICR        Li              78.35       79.38   70.01
UPC (*)        Lluís (*)       78.11       79.16   69.84
UT Austin      Baldridge       77.49       78.57   68.53
Koc            Yatbaz          77.45       78.43   69.61
USTC           Chen            77.00       77.95   69.23
Korea          Lee             76.90       77.96   68.34
Peking         Sun             76.28       77.10   69.58
Colorado       Choi            71.23       72.22   63.44
UAIC           Trandabat       63.45       64.21   57.41
DFKI 1         Neumann         19.93       20.13   18.14
Future and Ongoing Work
1 Higher degree of joint processing
Joint predicate identification
No previous dependency parsing
2 Higher order dependencies
3 Improvement of the semantic classifier component
4 Projectivization techniques
5 Feature engineering and system tuning
6 Alternative joint models
The end
Thank you
For further reading
James Henderson, Paola Merlo, Gabriele Musillo and Ivan Titov. A Latent Variable Model of Synchronous Parsing for Syntactic and Semantic Dependencies. In Proceedings of CoNLL-2008, 2008.

Gabriele Musillo and Paola Merlo. Robust Parsing for the Proposition Bank. In Proceedings of the EACL'06 Workshop on Robust Methods in Analysis of Natural Language Data, 2006.

Xavier Lluís and Lluís Màrquez. A Joint Model for Parsing Syntactic and Semantic Dependencies. In Proceedings of CoNLL-2008, 2008.