Universal Dependency Parsing
Joakim Nivre
Uppsala University Department of Linguistics and Philology
The Grand Vision

The problem:
• Why 90% parsing accuracy for English but only 80% for Finnish?
• Are some languages intrinsically harder to parse?
• Not just morphological richness – many typological parameters

An ideal solution:
• A universal parser for all languages
• Linguistic universals are hard-coded
• Typological parameters are learned from data

Gee! What a neat idea!
Pieces of the Puzzle
1. Morphosyntactic disambiguation
2. Universal dependency annotation
3. Parsing with universal dependencies
Part 1
Morphosyntactic Disambiguation
Joint work with Bernd Bohnet, Igor Boguslavsky, Richárd Farkas, Filip Ginter and Jan Hajič
Background
Parsing accuracy for morphologically rich languages tends to be lower than for languages like English (Nivre et al., 2007)
Hypothesized explanations (Tsarfaty et al., 2010, 2013):
• Strict separation of morphology and syntax
• Data sparsity due to high type-token ratio
Suggested remedies:
• Joint morphological and syntactic analysis (Lee et al., 2011)
• Lexical resources (Hajič, 2000; Goldberg and Elhadad, 2013)
This Study

Parsing techniques:
• Transition-based model for joint morphological and syntactic analysis
• Lexical resources integrated as hard or soft constraints
Evaluation on five morphologically rich languages:
• Czech, Finnish, German, Hungarian, Russian
• New state of the art in dependency parsing for all languages
Bernd Bohnet, Joakim Nivre, Igor Boguslavsky, Richárd Farkas, Filip Ginter, and Jan Hajič. 2013. Joint Morphological and Syntactic Analysis for Richly Inflected Languages. Transactions of the Association for Computational Linguistics 1:415–428
Representations

For each word of a sentence w1, …, wn:
• A part-of-speech tag p ∈ P
• A morphological feature bundle m ∈ M
• A lemma l ∈ Σ*
• A syntactic head h ∈ {0, w1, …, wn}
• A dependency label d ∈ D

Example (German, TIGER-style annotation of "Ein Haus hat er in Ulm gebaut."):

Word    Ein          Haus         hat            er             in    Ulm          gebaut  .
PoS     ART          NN           VAFIN          PPER           APPR  NE           VVPP    $.
Morph   acc|sg|neut  acc|sg|neut  sg|3|pres|ind  nom|sg|masc|3  —     dat|sg|neut  —       —
Lemma   ein          haus         haben          er             in    ulm          bauen   —
Deprel  NK           OA           (root)         SB             MO    NK           OC      —
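Concretely, each analyzed word can be pictured as a record with these five attributes. A minimal Python sketch (names are illustrative, not from the paper):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Word:
    """One analyzed word: the form plus the five predicted attributes."""
    form: str
    pos: str                  # part-of-speech tag p ∈ P
    morph: Optional[str]      # feature bundle m ∈ M, e.g. "acc|sg|neut"
    lemma: Optional[str]      # lemma l ∈ Σ*
    head: int                 # index of the syntactic head, 0 = root
    deprel: Optional[str]     # dependency label d ∈ D

# "Haus" in the example above: object (OA) of "gebaut" (word 7)
haus = Word("Haus", "NN", "acc|sg|neut", "haus", head=7, deprel="OA")
```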
Parsing Framework

Transition system (Bohnet and Nivre, 2012):
• Arc-standard system with online reordering (Nivre, 2009)
• Select morphology when shifting words to the stack (see the sketch below)

Beam search and structured learning (Zhang and Clark, 2008)

Preprocessing (at learning and parsing time):
• Tagger assigns the k best tags and feature bundles
• Parser can only select analyses licensed by the preprocessor

Experiment 1

Pipeline:
• Preprocessor assigns a single tag and feature bundle per word
• Beam = 40 distinct trees

Joint:
• Preprocessor assigns up to 2 tags and feature bundles per word
• Beam = 40 distinct trees + 8 tag variants + 8 feature variants

Data (P/M/D = size of the part-of-speech, morphological feature bundle, and dependency label inventories):

Treebank                                  Train   Dev   Test  P   M     D
cs  PDT (Hajič et al., 2001)              1249K   88K   70K   12  1851  49
de  Tiger (Brants et al., 2002)           648K    32K   32K   54  257   43
fi  TDT (Haverinen et al., 2013)          183K    —     21K   12  1917  47
hu  Szeged (Farkas et al., 2012)          1101K   210K  171K  22  1105  33
ru  SynTagRus (Boguslavsky et al., 2000)  575K    73K   72K   14  454   78
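The joint selection of morphology while shifting can be pictured as follows. A minimal sketch under assumed interfaces (stack and buffer hold word indices; k_best[i] lists the analyses licensed by the preprocessor for word i), not the authors' implementation:

```python
# Arc-standard-style transition enumeration where SHIFT is split into one
# variant per licensed morphological analysis, so beam search scores
# tags/features and trees together.

def legal_transitions(stack, buffer, k_best):
    transitions = []
    if len(stack) >= 2:                    # arcs between two topmost stack words
        transitions.append(("LEFT-ARC",))
        transitions.append(("RIGHT-ARC",))
    if buffer:
        i = buffer[0]                      # next word to be shifted
        for analysis in k_best[i]:         # e.g. up to 2 (tag, feats) pairs
            transitions.append(("SHIFT", i, analysis))
    return transitions
```

With a single analysis per word this degenerates to the pipeline setting; with up to 2 analyses per word it corresponds to the joint setting of Experiment 1.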
Morphological Accuracy (Experiment 1, %)

          cs    de    fi    hu    ru
Pipeline  92.6  96.1  88.8  89.1  93.0
Joint     92.8  96.2  89.1  90.0  93.7
Syntactic Accuracy (Experiment 1, %)

          cs    de    fi    hu    ru
Pipeline  87.4  88.4  79.9  91.8  83.1
Joint     87.6  88.9  80.6  92.0  83.3
Experiment 2

Morphological lexicons:
• Lookup of tag, feature bundle and lemma
• Added as hard or soft constraints to preprocessor and parser (sketched in code below)
• Lemma selected deterministically from form + tag + feature bundle
Lexicon
cs PDT (Hajič and Hladká, 1998)
de SMOR (Schmid et al., 2004)
fi OMorFi (Pirinen, 2011)
hu morphdb.hu (Trón et al., 2006)
ru ETAP-3 (Apresian et al., 2003)
[Chart: per-language distribution of the number of lexicon analyses per word form (0, 1, 2, 3, 4+), shown separately for PoS tags (POS) and full morphological analyses (MOR), for cs, fi, de, hu, ru.]
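The hard/soft distinction can be sketched as follows (assumed interfaces, not the actual system): a hard constraint prunes the tagger's candidates to those the lexicon licenses, while a soft constraint keeps all candidates and only exposes lexicon agreement as a feature for the learner.

```python
def candidate_analyses(form, tagger_kbest, lexicon, mode="soft"):
    """tagger_kbest: list of (tag, feats); lexicon: dict form -> set of (tag, feats)."""
    licensed = lexicon.get(form, set())
    if mode == "hard" and licensed:
        pruned = [a for a in tagger_kbest if a in licensed]
        return pruned or tagger_kbest      # back off if nothing survives
    # soft: keep every candidate, flag lexicon agreement as a feature
    return [(a, {"in_lexicon": a in licensed}) for a in tagger_kbest]
```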
Morphological Accuracy (Experiment 2, %)

         cs    de    fi    hu    ru
Joint    92.8  96.2  89.1  90.0  93.7
LexHard  94.5  97.1  91.1  92.9  —
LexSoft  95.1  97.4  91.6  91.2  94.4
Syntactic Accuracy (Experiment 2, %)

         cs    de    fi    hu    ru
Joint    87.6  88.9  80.6  92.0  83.3
LexHard  88.0  89.1  82.5  92.0  83.4
LexSoft  87.7  89.1  82.3  92.1  83.5
Discussion

Joint inference benefits both morphology and syntax:
• Strongest effect on morphology for Czech and German – syncretism?
• Strongest effect on syntax for Finnish and Hungarian – why?

Lexical resources mitigate data sparsity:
• Strongest effect on morphology – soft constraints always best?
• Weaker effect on syntax – relevant errors fixed by joint inference?
• Strong effect on syntax for Finnish – sparse data?

Nice! But why only 82% for Finnish? Can we even compare the numbers?
Part 2
Universal Dependency Annotation
Joint work with Ryan McDonald, Slav Petrov, Chris Manning, Marie de Marneffe, Jinho Choi, Filip Ginter, Yoav Goldberg, Jan Hajič and Reut Tsarfaty
Apples and Oranges

Treebank annotation schemes vary across languages:
• Hard to compare parsing results across languages (Nivre et al., 2007)
• Hard to evaluate cross-lingual learning (McDonald et al., 2013)
• Hard to make progress towards a universal parser?

Recent initiatives:
• HamleDT: conversion of 29 existing treebanks to a PDT-like annotation scheme (Zeman et al., 2012)
• Universal Dependency Treebank Project: new annotation, conversion and harmonization to Google universal PoS tags and Stanford dependencies (McDonald et al., 2013)
Case Study

Delexicalized transfer parsing with universal PoS tags:
• First evaluation of labeled accuracy
• Results make typological sense

Cross-lingual transfer parsing results (McDonald et al., 2013, Table 3); rows = source training language, columns = target test language:

Unlabeled Attachment Score (UAS)
      DE     EN     SV     ES     FR     KO
DE   74.86  55.05  65.89  60.65  62.18  40.59
EN   58.50  83.33  70.56  68.07  70.14  42.37
SV   61.25  61.20  80.01  67.50  67.69  36.95
ES   55.39  58.56  66.84  78.46  75.12  30.25
FR   55.05  59.02  65.05  72.30  81.44  35.79
KO   33.04  32.20  27.62  26.91  29.35  71.22

Labeled Attachment Score (LAS)
      DE     EN     SV     ES     FR     KO
DE   64.84  47.09  53.57  48.14  49.59  27.73
EN   48.11  78.54  57.04  56.86  58.20  26.65
SV   52.19  49.71  70.90  54.72  54.96  19.64
ES   45.52  47.87  53.09  70.29  63.65  16.54
FR   45.96  47.41  52.25  62.56  73.37  20.84
KO   26.36  21.81  18.12  18.63  19.52  55.85

From the paper: for the languages common to both studies (DE, EN, SV and ES), UAS in earlier delexicalized transfer work was in the 38–68% range, as compared to 55–75% here; for Swedish, where the test sets are the same, UAS increases from 58.3% to 70.6%. This suggests that most cross-lingual parsing studies have underestimated accuracies.

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal Dependency Annotation for Multilingual Parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 92–97.
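For reference, both scores are simple to compute from aligned gold and predicted (head, label) pairs. A minimal sketch:

```python
def attachment_scores(gold, pred):
    """gold/pred: lists of (head, label) pairs, one per word, aligned by position."""
    assert len(gold) == len(pred) and gold
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return 100 * uas, 100 * las   # percentages, as in the tables above
```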
Too Many Standards?

• English SD – de Marneffe et al. (2006)
• Google SD – McDonald et al. (2013)
• Hebrew SD – Tsarfaty (2013)
• Finnish SD – Haverinen et al. (2013)
• CLEAR – Choi (2012)
• HamleDT – Zeman et al. (2012)
Universal Dependencies

Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre and Christopher D. Manning. 2014. Universal Stanford Dependencies: A Cross-Linguistic Typology. In Proceedings of the Ninth International Conference on Language Resources and Evaluation.

http://universaldependencies.github.io/docs/

Example (German): Im Restaurant isst Maria den Fisch.
(Token Im maps to words 1.1 In + 1.2 dem.)

Word ID  Form        PoS    Features                  Deprel
1.1      In          ADP    —                         case
1.2      dem         DET    CASE=DAT|NUM=SIN|GEN=NEU  det
2        Restaurant  NOUN   CASE=DAT|NUM=SIN|GEN=NEU  advmod
3        isst        VERB   TENSE=PRES                (root)
4        Maria       PNOUN  CASE=NOM                  nsubj
5        den         DET    CASE=ACC|NUM=SIN|GEN=MAS  det
6        Fisch       NOUN   CASE=ACC|NUM=SIN|GEN=MAS  dobj
7        .           .      —                         punc

The representation combines:
• Stanford Universal Dependencies (the tree and its labels)
• Google Universal Part-of-Speech Tags + Morphological Features
• Explicit Mapping from Tokens to Words
Guiding Principles

Maximize parallelism:
• Don't annotate the same thing in different ways
• Don't make different things look the same

But don't overdo it:
• Don't annotate things that are not there
• Languages select from a universal pool of categories
• Allow language-specific extensions
Dependency Structure

• Keeping content words as heads promotes parallelism
• Function words often correlate with morphology

English:
The  dog        was      chased  by    the  cat    .
DET  NOUN       AUX      VERB    ADP   DET  NOUN   .
det  nsubjpass  auxpass  (root)  case  det  agent  punc

Swedish:
Hunden     jagades    av    katten  .
NOUN       VERB       ADP   NOUN    .
DEF=+      VOICE=PAS  —     DEF=+   —
nsubjpass  (root)     case  agent   punc
Dependency Relations

• Taxonomy of 42 universal grammatical relations, broadly attested across languages in the typological literature
• Language-specific subtypes can be added

Morphology

• The lexicalist hypothesis:
  • Grammatical relations hold between words (including clitics)
  • Morphological categories are properties of words
• Morphological annotation:
  • Revised Google Universal Part-of-Speech Tags (Petrov et al., 2012)
  • Universal inventory of morphological features (under construction)

Universal part-of-speech tags:
ADJ ADP ADV AUX CONJ DET INTJ NOUN NUM PNOUN PRON PRT PUNCT SCONJ SYM VERB X
Tokens and Words

Principle of recoverability:
• Clitics and contractions are split to allow meaningful annotation
• Mapping from basic tokenization is explicitly represented
• Heuristic mapping of annotation to basic tokens is provided

Encoded in a revised CoNLL format (CoNLL-U)

Example (Spanish): Vamonos al mar.
(Token Vamonos maps to words 1.1 Vamos + 1.2 nos; al maps to 2.1 a + 2.2 el.)

Word ID  Form   PoS   Deprel
1.1      Vamos  VERB  (root)
1.2      nos    PRON  expl
2.1      a      ADP   case
2.2      el     DET   det
3        mar    NOUN  nmod
4        .      .     punc
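In the CoNLL-U format as later released, this token-to-word mapping is encoded with multiword token ranges rather than the 1.1/1.2 indices shown above. A simplified sketch for the Spanish example (the full format has ten columns; here only ID, FORM, POS, HEAD, DEPREL):

```text
1-2  Vamonos  _     _  _
1    Vamos    VERB  0  root
2    nos      PRON  1  expl
3-4  al       _     _  _
3    a        ADP   5  case
4    el       DET   5  det
5    mar      NOUN  1  nmod
6    .        .     1  punc
```

The range lines carry the surface tokens, so the original text is always recoverable from the annotated words.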
Work in Progress

Current time plan:
• Stable annotation guidelines by end of September
• First release of data sets before the end of 2014

Follow our progress and give feedback:
• Universal Dependencies: http://universaldependencies.github.io/docs/

Check out old releases:
• Uni-Dep-TB: https://code.google.com/p/uni-dep-tb/

Let's work together!

Hey, wait a minute! How is this going to improve parsing?
Part 3
Parsing with Universal Dependencies
Confessions of a converted dependency parser
Taking Stock

Motivation for universal dependencies:
• Improve comparability of parsing results across languages
• Facilitate development of multilingual systems
• Enable typological studies of syntactic structure

What about parsing?
• Not likely to improve parsing accuracy with existing parsers
• Parsers tend to prefer function words as heads (Schwartz et al., 2012)
• We risk bringing English down to 80% instead of Finnish up to 90%
What’s the Problem?
• Dependency parsers know only one syntactic relation
What’s the Problem?
h d
• Dependency parsers know only one syntactic relation
• They do not interpret dependency labels
What’s the Problem?
h d
?
• Dependency parsers know only one syntactic relation
• They do not interpret dependency labels
• They represent a construction primarily by its head
What’s the Problem?
h d
?h
A Simple Case

The dog was chased by the black cat.

• All criteria point to cat being the head
• Little (syntactic) information is lost by dropping black

[Diagram: black attached to cat by amod; dropping black leaves "The dog was chased by the cat."]
A Tricky Case

The dog was chased by the black cat.

• Some criteria point to was, others to chased as the head
• Neither word can represent the whole
• The same problem arises with by (the black cat)

[Diagram: two competing analyses of "was chased" – chased as head with an aux arc to was, or was as head with a main arc to chased.]
Back to the Roots

• Three fundamental syntactic relations (Tesnière, 1959)
• Tesnière-style dependency treebank (Sangati and Mazza, 2009)

Dependency – single head: black cat
Transfer – split head: was chased
Coordination – multiple heads: cats and mice
Universal Dependencies

• The UD representation is a simple dependency tree
• But labels are universal and can be interpreted
• Therefore, we can put more knowledge into the parser

Example: Fido was chased by Kitty and Tiger.

Fido       was      chased  by    Kitty  and  Tiger  .
PNOUN      AUX      VERB    ADP   PNOUN  CONJ PNOUN  .
nsubjpass  auxpass  (root)  case  agent  cc   conj   punc

[Diagram: the same sentence regrouped around content words – "was chased" as the nucleus, with "Fido" and "by Kitty and Tiger" as its arguments.]
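One concrete reading of the last point: because labels like case, aux and cc explicitly mark function words, a consumer (or an informed parser) can deterministically regroup them with the content word they serve. A sketch of such a regrouping (illustrative, not the authors' algorithm):

```python
# Use the universal labels to regroup function words with their content-word
# heads, recovering units like Tesnière's "was chased" without changing
# the tree itself.

FUNCTION_LABELS = {"case", "aux", "auxpass", "cop", "det", "cc", "mark"}

def content_groups(words, heads, labels):
    """words: forms; heads[i]: index of i's head (-1 for root); labels[i]: deprel."""
    groups = {i: [i] for i, l in enumerate(labels) if l not in FUNCTION_LABELS}
    for i, l in enumerate(labels):
        if l in FUNCTION_LABELS:
            groups.setdefault(heads[i], []).append(i)  # join the head's group
    return [[words[j] for j in sorted(g)] for g in groups.values()]

# "Fido was chased by Kitty and Tiger ." yields
# [Fido] [was chased] [by Kitty and] [Tiger] [.]
```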
Final Remarks

Constituency parsing – largely driven by the PTB:
• Perhaps too much emphasis on English (until recently)
• But deep analysis of categories and representations

Dependency parsing – largely driven by CoNLL data:
• More attention to typological diversity
• But ignorance and blind faith with respect to representations

Can UD give us the best of both worlds?
Thanks for Your Attention! Questions?
http://stp.lingfil.uu.se/~nivre/